Next in my list of OAI-ORE practicalities questions:
As part of our work on the Preserving Virtual Worlds project, I'm
working on how to transfer a packaged-up game between the Univ. of
Illinois and Stanford. For various reasons, Stanford would like to
receive all data/metadata for the package using the BagIt
specification. I've got a bunch of metadata in OAI-ORE that
identifies the various digital assets I want to go in the package
delivered to Stanford (the game itself, representation information for
the game, context information for the game, provenance information for
all of the above) as well as the relationships between the assets (not
only OAI-ORE relationships, but FRBR and OAIS relationships as well,
e.g., this asset is semantic representation information for that
asset). Since this is nicely formed OAI-ORE, all references to assets
are protocol-based URIs.
The problem comes when I want to put this all in BagIt (this being the
digital assets and the OAI-ORE files), tar and gzip the whole caboodle
and ship it to them. I don't want the OAI-ORE referencing the copies
of the assets at my site. In fact, for a couple of reasons (the most
salient being I have to dark archive some of this material), I can't
make it available on the public web server. I want the OAI-ORE
document to reference the copies of the assets in the BagIt package
using file:/// URIs. But that's not a protocol-based URI, is it? And
so, not well-formed OAI-ORE.
My solution space for this at the moment seems to be:
1. ignore the OAI-ORE requirement for protocol-based URIs and use
   file:/// URIs to reference digital assets in the BagIt directory
   hierarchy;
2. go to a certain amount of time and trouble instituting a
   one-time-use authentication mechanism that ensures that only a
   designated archivist at Stanford can get at the restricted assets,
   and use BagIt fetch.txt to reference them; or
3. Base64-encode the digital assets, and treat them as literals in the
   OAI-ORE RDF expressions.
Can't say I'm thrilled about any of those options, but #1 probably has
the most appeal to someone who (A) doesn't want to engage in additional
transformations of the underlying assets (i.e., Base64) and (B) is
congenitally lazy.
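For option 2, the bag would carry a fetch.txt listing where each restricted file can be retrieved, rather than the files themselves. A minimal sketch of those entries, assuming the BagIt layout of one `URL LENGTH FILENAME` line per file (LENGTH in octets, or `-` if unknown). The example.org URLs, file names, and size are hypothetical placeholders, `fetch_line` is a hypothetical helper, and the one-time authentication step is not shown:

```python
from typing import Optional


def fetch_line(url: str, payload_path: str, length: Optional[int] = None) -> str:
    """Format one fetch.txt entry as 'URL LENGTH FILENAME'.

    LENGTH is the file size in octets, or '-' when it is not known in
    advance. FILENAME is relative to the bag's base directory, so payload
    files sit under data/.
    """
    if not payload_path.startswith("data/"):
        raise ValueError(f"payload file must live under data/: {payload_path!r}")
    size = "-" if length is None else str(length)
    return f"{url} {size} {payload_path}"


# Hypothetical restricted locations for two of the dark-archived assets.
entries = [
    fetch_line("https://example.org/restricted/game.img", "data/game.img", 1024),
    fetch_line("https://example.org/restricted/provenance.xml", "data/provenance.xml"),
]
print("\n".join(entries))
```

The receiving site would then fetch each URL into the listed path and check it against the bag's manifest, which is where the access-control work in option 2 would have to live.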
My questions:
1. Am I missing some obvious fourth option in the solution space?
2. Was there any official discussion or recommendation of how to use
   OAI-ORE with something like a tarball of files to ship content
   between repository sites?
On Thu, 2009-08-20 at 13:46 -0700, Jerome wrote:
> Howdy,
I am not an OAI-ORE expert (or even a particularly well-informed amateur), but I do have some knowledge of BagIt and pairpath (mentioned later in the thread).
I think there are two problems here. To lay the groundwork: you are transferring an object from site A to site B. The first problem is that because you are using OAI-ORE, either site A or site B needs to lay claim to the URIs that will describe the object.
The second problem is getting the data from site A to site B.
I don’t think that you can get around the first problem. One site needs to take responsibility for managing the URIs. This doesn’t necessarily involve making them dereferenceable (at least not immediately).
The second is not necessarily related to the first. If you choose to use http://sitea/object, that does not mean that site B needs to use HTTP to transfer that object from site A.
What site A and site B do need to do is agree on a way of mapping http://sitea/object to some bytestream (representation).
I think what Ben was suggesting is that you can use pairpath to provide a mapping between an HTTP URI and a path on a filesystem. For example, http://sitea/object would map to:
ht/tp/+=/=s/it/ea/=o/bj/ec/t
You could then “dereference” the URI http://sitea/object by generating this pairpath from it and looking inside your bag to see if it contains that path. If it does, you have the “dereferenced” representation of that URI.
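A minimal sketch of that mapping and lookup, in Python. The cleaning rules (hex-encode special characters and non-visible bytes as ^hh, then map / to =, : to +, and . to ,) follow my reading of the Pairtree specification, and they reproduce the ht/tp/+=/=s/it/ea/=o/bj/ec/t example. `id_to_pairpath` and `dereference_in_bag` are hypothetical helper names, and keeping the pairtree at data/pairtree_root inside the bag's payload is an assumed layout, not something BagIt requires:

```python
from pathlib import Path
from typing import Optional

# Pairtree cleaning, step 1: these characters (and any byte outside
# visible ASCII 0x21-0x7e) are hex-encoded as ^hh.
HEX_ESCAPED = set('"*+,<=>?\\^|')
# Step 2: single-character substitutions.
SUBSTITUTIONS = {"/": "=", ":": "+", ".": ","}


def id_to_pairpath(identifier: str) -> str:
    """Clean an identifier, then split it into two-character path segments."""
    cleaned = []
    for byte in identifier.encode("utf-8"):
        ch = chr(byte)
        if byte < 0x21 or byte > 0x7E or ch in HEX_ESCAPED:
            cleaned.append(f"^{byte:02x}")
        else:
            cleaned.append(SUBSTITUTIONS.get(ch, ch))
    s = "".join(cleaned)
    return "/".join(s[i:i + 2] for i in range(0, len(s), 2))


def dereference_in_bag(bag_dir: Path, uri: str) -> Optional[Path]:
    """Return the directory for `uri` inside the bag, or None if the bag lacks it.

    Assumes a hypothetical layout: a pairtree rooted at data/pairtree_root.
    """
    candidate = bag_dir / "data" / "pairtree_root" / id_to_pairpath(uri)
    return candidate if candidate.is_dir() else None


print(id_to_pairpath("http://sitea/object"))  # ht/tp/+=/=s/it/ea/=o/bj/ec/t
```

Because "." is always rewritten to ",", no segment can come out as "..", so a URI cannot climb out of the bag directory during the lookup.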
In summary, just because it starts with http:// doesn’t mean you have to use HTTP to get it.
The "pairtree_prefix" contains a string that should be prepended to every identifier inferred from the pairtree rooted at "pairtree_root". This may be used to reduce path lengths when every identifier in a given pairtree shares the same initial substring. In the example above, the pairpath "/aa/cd/" would thus correspond to the identifier "http://n2t.info/ark:/13030/xt2aacd".
-----
Personally, I am quite fond of this mechanism, both for interchange and for on-disk storage: migration of self-contained objects (book page scan collections, for example) is made easier, as you might only need to change the prefix file.
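A minimal sketch of the inverse direction, from a pairpath back to an identifier with the prefix prepended, reusing the n2t.info example from the quoted spec text. It assumes the reverse of the Pairtree cleaning rules (= to /, + to :, , to ., and ^hh decoded as one byte); `pairpath_to_id` is a hypothetical helper name, and reading the prefix from the pairtree_prefix file on disk is left out:

```python
# Reverse of the step-2 substitutions used when cleaning identifiers.
UNSUBSTITUTIONS = {"=": "/", "+": ":", ",": "."}


def pairpath_to_id(pairpath: str, prefix: str = "") -> str:
    """Recover an identifier from a pairpath, then prepend the pairtree prefix."""
    s = pairpath.replace("/", "")
    raw = bytearray()
    i = 0
    while i < len(s):
        if s[i] == "^":  # ^hh encodes one byte as two hex digits
            raw.append(int(s[i + 1:i + 3], 16))
            i += 3
        else:
            raw.extend(UNSUBSTITUTIONS.get(s[i], s[i]).encode("utf-8"))
            i += 1
    return prefix + raw.decode("utf-8")


prefix = "http://n2t.info/ark:/13030/xt2"  # contents of pairtree_prefix
print(pairpath_to_id("/aa/cd/", prefix))  # http://n2t.info/ark:/13030/xt2aacd
```

Passing a different `prefix` over the same directories yields the new identifiers without moving any files, which is the migration case described above.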
On Fri, 2009-08-21 at 17:15 -0700, Erik Hetzner wrote:
> At Thu, 20 Aug 2009 13:46:02 -0700 (PDT),
> Jerome wrote:
> > Howdy,