I attended a session at the Texas Conference on Digital Libraries this
past week. Included in the presentation was a project by the Texas
Digital Library to build a DSpace OAI-PMH harvester that would ingest
OAI-ORE documents [1]. Seems like a very neat project -- I believe TDL
plans to put this into production very soon. The harvester can be
configured to save either links to aggregated resources included in
the ReM (in which case the DSpace end user will be given a link to the
aggregated resource at its original URL) OR to actually ingest that
aggregated resource, creating a new record for it in DSpace. This
project was built using the Atom serialization format of OAI-ORE.
I would very much like to see this functionality extended just a bit
to allow for AtomPub-based posting of ReMs into a DSpace
installation. All that would be required would be a servlet that
would accept a "POST" with mime-type
"application/atom+xml;type=entry" and could then pass along the Atom Entry ReM to
the same code that knows how to ingest a ReM as part of the harvesting
process (with, of course, whatever validation needs to occur). In
addition, such functionality could be exposed and made discoverable by
an Atom Service document that listed the correct end-point for the
AtomPub OAI-ORE interface, and even a link@rel=service in the main
collection web page for easy machine discoverability.
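A minimal sketch of what the client side of such a POST might look like, using only the Python standard library. The endpoint URL and the sample entry are made up for illustration; nothing here reflects an existing DSpace or TDL interface. The request is only built, not sent.

```python
import urllib.request

# Hypothetical endpoint: a real one would be advertised in the Atom Service document.
ENDPOINT = "http://dspace.example.org/atompub/ore"

# A deliberately tiny Atom entry standing in for a real ReM (illustrative only).
ENTRY = """<entry xmlns="http://www.w3.org/2005/Atom">
  <id>tag:example.org,2009:paper-1</id>
  <title>Sample paper</title>
  <updated>2009-05-30T17:43:00Z</updated>
  <author><name>A. Author</name></author>
  <link rel="http://www.openarchives.org/ore/terms/aggregates"
        href="http://example.org/papers/1.pdf" type="application/pdf"/>
</entry>"""


def build_rem_post(entry_xml, endpoint=ENDPOINT):
    """Build (but do not send) an AtomPub POST carrying an Atom entry ReM."""
    return urllib.request.Request(
        endpoint,
        data=entry_xml.encode("utf-8"),
        method="POST",
        headers={"Content-Type": "application/atom+xml;type=entry"},
    )


request = build_rem_post(ENTRY)
print(request.get_method(), request.full_url)
```

Passing the request to urllib.request.urlopen() would actually send it. Under AtomPub (RFC 5023) a successful create is answered with 201 Created and a Location header for the new member.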
I've built a number of such AtomPub interfaces in PHP and Python, and
the hardest part is parsing the Atom entry and creating the
appropriate mappings to local metadata (all of which is already done
in the TDL project). Other than that, it's quite easy. Does anyone have
any thoughts on the usefulness of such a thing and/or the difficulty
of creating the code for DSpace? (My Java chops are too rusty to have a
good sense of that.)
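To make the "parse and map" step concrete, here is a rough Python sketch. It assumes aggregated resources appear as atom:link elements whose rel is the ore:aggregates URI; the local field names (identifier, title, authors, files) are invented for illustration and are not the TDL mapping.

```python
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"
ORE_AGGREGATES = "http://www.openarchives.org/ore/terms/aggregates"

# Illustrative entry only; a real ReM would carry more metadata.
SAMPLE_ENTRY = """<entry xmlns="http://www.w3.org/2005/Atom">
  <id>tag:example.org,2009:paper-1</id>
  <title>Sample paper</title>
  <author><name>A. Author</name></author>
  <link rel="alternate" href="http://example.org/papers/1"/>
  <link rel="http://www.openarchives.org/ore/terms/aggregates"
        href="http://example.org/papers/1.pdf" type="application/pdf"/>
  <link rel="http://www.openarchives.org/ore/terms/aggregates"
        href="http://example.org/papers/1-data.csv" type="text/csv"/>
</entry>"""


def rem_to_local_metadata(entry_xml):
    """Map an Atom entry ReM onto a (hypothetical) local metadata record."""
    entry = ET.fromstring(entry_xml)
    return {
        "identifier": entry.findtext(ATOM + "id"),
        "title": entry.findtext(ATOM + "title"),
        "authors": [a.findtext(ATOM + "name") for a in entry.findall(ATOM + "author")],
        # Only links marked as ore:aggregates count as aggregated resources.
        "files": [
            {"url": link.get("href"), "mime_type": link.get("type")}
            for link in entry.findall(ATOM + "link")
            if link.get("rel") == ORE_AGGREGATES
        ],
    }


record = rem_to_local_metadata(SAMPLE_ENTRY)
print(record["title"], len(record["files"]))
```

Whether each entry in "files" is stored as a link or fetched and ingested would then be a configuration choice, as in the harvester described above.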
I realize that the SWORD protocol has some of the same functionality
and is built on AtomPub. But SWORD (as far as I know) knows nothing
of the Atom serialization of OAI-ORE. Of course I am sure that the
SWORD code has most of the spare parts that would be needed to get
this AtomPub OAI-ORE interface working.
I'd love to see such a thing happen, since my department has tens
of thousands of papers we'd like to put in our library's DSpace
installation. Posting Atom-based ORE resource maps would be a clean
and simple solution.
On Sat, May 30, 2009 at 5:43 PM, pkeane <pjke...@gmail.com> wrote:
> I attended a session at the Texas Conference on Digital Libraries this
> past week. Included in the presentation was a project by the Texas
> Digital Library to build a DSpace OAI-PMH harvester that would ingest
> OAI-ORE documents [1].
I'm pretty sure that the TDL project included a piece to create
resource maps and publish them in OAI-PMH (I'm not 100% certain on
that, but I do know the harvester assumes resource maps are placed in
OAI-PMH). I'll also note that they have used the Atom serialization
for the resource maps (w/ the assumption they can move to RDF/XML at
some point).
On Wed, Jun 3, 2009 at 7:09 AM, Robert Sanderson <azarot...@gmail.com> wrote:
> Do they also expose their content via ORE? I can't find it anywhere if they
> do. :(
> --Rob
This work does involve mapping DSpace collections/objects to OAI-ORE
for exchange of the data. It was one of my favorite presentations at
OR09.
The abstract is here:
On Wed, Jun 03, 2009 at 07:15:33AM -0500, Peter Keane wrote:
> Hi Rob-
> I'm pretty sure that the TDL project included a piece to create
> resource maps and publish them in OAI-PMH (I'm not 100% certain on
> that, but I do know the harvester assumes resource maps are placed in
> OAI-PMH). I'll also note that they have used the Atom serialization
> for the resource maps (w/ the assumption they can move to RDF-XML at
> some point).
> --peter
I think Alexey can comment on whether there is a Packager exposed for SWORD/LNI that can take the next step of exposing that capability to external agents. I took Alexey's presentation to mean they avoided the SWORD/LNI ingest issue by having the target repository run the agent internally. I actually think that's an important point, because the big question is who's in charge of deciding what gets put into the repository other than its maintainers. In their case it's the maintainer, not a 3rd party as in the SWORD case.
I think it would be good to direct this to Alexey as well: he's the developer of the Harvester, so he can comment on it further. Thus I've CC'd him.
----
For me, your question challenges me to think about how this will relate in DSpace 2.0 where our data model becomes more flexible and we stop thinking about content in DSpace as "Items" in "Collections". In fact, I've left behind my original work to map ORE to DSpace Items/Collections/Communities explicitly because DSpace 2.0 drops the entire rigid model in favor of a simplified entity-relationship modeling approach where "type" is just a property and "containership" is just a relation. In this case, any URI (URI-R, URI-A, URI-AR, URI-P) expressed in the ReM becomes an Entity in DSpace and its properties that are NonLiteral references become "relations".
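A toy Python sketch of that idea, purely to illustrate the shape of the mapping. It assumes the ReM has already been reduced to (subject, predicate, object, object_is_uri) tuples; the dict layout and the function name are hypothetical and are not the DSpace 2.0 API.

```python
def triples_to_entities(triples):
    """Turn ReM statements into entities with literal properties and URI relations."""
    entities = {}

    def entity(uri):
        # Every URI mentioned in the ReM gets an entity, created on first sight.
        return entities.setdefault(uri, {"properties": {}, "relations": []})

    for subject, predicate, obj, obj_is_uri in triples:
        if obj_is_uri:
            # A non-literal object becomes a relation to another entity.
            entity(subject)["relations"].append((predicate, obj))
            entity(obj)
        else:
            # A literal object is stored as a plain property.
            entity(subject)["properties"][predicate] = obj
    return entities


# Made-up URIs standing in for the ReM, the aggregation and one aggregated resource.
REM = "http://example.org/rem/1"
AGG = "http://example.org/aggregation/1"
PDF = "http://example.org/papers/1.pdf"

entities = triples_to_entities([
    (REM, "ore:describes", AGG, True),
    (AGG, "dc:title", "Sample paper", False),
    (AGG, "ore:aggregates", PDF, True),
])
print(sorted(entities))
```

In this sketch "containership" is nothing more than the ore:aggregates relation on the aggregation entity; no Item or Collection type is involved.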
But... this still comes back to my original debate about "Content Types" vs "ORE". In DSpace 2.0, we are trying to define "profiles" for Entities in the DCMI Description Set Profile sense of the term. Thus, DCMI Application Profiles can be encoded in the DSpace 2.0 Metadata Registry and used as templates to validate and build different "Profiles" of Composite Digital Objects. My intention is that it's the choice of the repository designer how rigid the influence of these profiles will be on the content expressed in the repository.
So, how does this relate to what you're asking? Alexey's approach of "making the repository the agent" still puts the job of creating that mapping on the repository maintainer, an already overtaxed and struggling group of archivists and librarians who want tools to make their lives easier... not harder. However, in the DSpace community, Aaron Zeckoski and a GSoC student are actually working on an interface for interacting with the DSpace Entities via the traditional REST approaches (not APP-specific package submission, i.e. not constrained by SWORD or APP or even Atom). I dare say the OAI-ORE community should be considering how a simple protocol like REST applies to OAI-ORE: how might an agent interact with the repository on an atomic level to construct the composite digital object by "playing" PUT/POST commands containing fragments of ORE ReMs or any other simple REST fragments? This is much different than making the repository responsible for providing such a mapping (forcing DSpace core developers to supply ingest packager support over and over again as the next greatest "standard" becomes popular). This vaccinates the DSpace repository maintainers against the disease of YAMMOSES (Yet Another Manifest Mapping Of Someone Else's Standard) rampant in our community.
On Sat, May 30, 2009 at 9:43 AM, pkeane <pjke...@gmail.com> wrote:
> Hi All-
> --Peter Keane
> The University of Texas at Austin
They wrote a metadata crosswalk that creates the ReM and exposes it via
OAI-PMH. It was an awesome presentation, so could Alexey post it somewhere? The
DSpace wiki perhaps?
On Wed, Jun 3, 2009 at 9:13 AM, Simeon Warner <arxivsim...@gmail.com> wrote:
> This work does involve mapping DSpace collections/objects to OAI-ORE
> for exchange of the data. It was one of my favorite presentations at
> OR09.
> The abstract is here:
> but it seems that slides from OR09 presentations aren't up yet. I cc
> Alexey who can perhaps point to slides?
> Cheers,
> Simeon
You're saying that we should consider what happens if you PUT/POST a Resource Map? Isn't that up to the recipient of the operation to determine? ORE is a data model, not a protocol with expected server behaviour, and especially not for create rather than retrieve.
On Wed, Jun 3, 2009 at 9:20 AM, Robert Sanderson <azarot...@gmail.com> wrote:
> Mark,
> You're saying that we should consider what happens if you PUT/POST a
> Resource Map? Isn't that up to the recipient of the operation to
> determine? ORE is a data model, not a protocol with expected server
> behaviour, and especially not for create rather than retrieve.
That's exactly the reason I've been so interested in the Atom
serialization of OAI-ORE -- since AtomPub IS a protocol, if you have
an atom:entry ReM you have a protocol on which to base the function of
PUT/POST operations. Of course it leaves the burden of a mapping (as
Mark said) on the repository owner, but that's been addressed in the
TDL PMH/ORE harvester.
Certainly there would need to be HTTP-based authentication for
posting, but the AtomPub functionality would need only an AtomPub
service doc describing the endpoint, and a simple AtomPub handler for
POST/PUT.
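As a rough illustration of the discovery side, the sketch below reads an AtomPub service document and picks out the collections that accept Atom entries, then builds an HTTP Basic Authorization header. The service document, URLs and credentials are invented; Basic auth is just one possible HTTP-based scheme, not something the TDL code is known to use.

```python
import base64
import xml.etree.ElementTree as ET

APP = "{http://www.w3.org/2007/app}"

# Hypothetical service document for a repository exposing an ORE endpoint.
SERVICE_DOC = """<service xmlns="http://www.w3.org/2007/app"
         xmlns:atom="http://www.w3.org/2005/Atom">
  <workspace>
    <atom:title>Hypothetical DSpace repository</atom:title>
    <collection href="http://dspace.example.org/atompub/ore">
      <atom:title>Faculty papers</atom:title>
      <accept>application/atom+xml;type=entry</accept>
    </collection>
    <collection href="http://dspace.example.org/atompub/images">
      <atom:title>Images</atom:title>
      <accept>image/png</accept>
    </collection>
  </workspace>
</service>"""


def entry_endpoints(service_xml):
    """Return hrefs of collections that accept Atom entries."""
    endpoints = []
    for coll in ET.fromstring(service_xml).iter(APP + "collection"):
        accept_elems = coll.findall(APP + "accept")
        accepts = [(a.text or "").strip() for a in accept_elems]
        # RFC 5023: no app:accept at all means Atom entries are accepted;
        # an empty app:accept means nothing can be created.
        if not accept_elems or any(a.startswith("application/atom+xml") for a in accepts):
            endpoints.append(coll.get("href"))
    return endpoints


def basic_auth_header(user, password):
    """Build the value of an HTTP Basic Authorization header."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return "Basic " + token


print(entry_endpoints(SERVICE_DOC))
```

A client would POST the Atom entry ReM to one of the returned URLs, sending the header value from basic_auth_header() as Authorization (over HTTPS, since Basic credentials are only encoded, not encrypted).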
I suspect our use case is not atypical: UT College of Liberal Arts is
providing faculty with an interface to upload papers into a "Faculty
sandbox" for departmental websites. We'd like a way to programmatically
place a copy in the library's IR (DSpace). ORE/AtomPub seems natural.