[Esip-preserve] Some Thoughts on OPM
Curt Tilmes
Curt.Tilmes at nasa.gov
Fri Dec 10 12:07:58 EST 2010
On 12/10/2010 11:51 AM, Bruce Barkstrom wrote:
> Eventually, we're going to have to do some thinking about the
> scaling that goes with this approach. As far as I can tell, the
> scaling for traversing the graph is still linear with the number of
> nodes. If all the granules in ESDIS get included, we're going to
> have several hundred million items, including files and jobs - not
> to mention the possibility of subsets (fragments) of files.
You're right, of course. This will be a big challenge.
Sometimes I think it would be great to have a huge triple store that
just pulls in everything we care about and that we could query
directly with SPARQL, but I don't think that's feasible (or at least
it won't be for some time).
I think we can partition nicely along Dataset boundaries, though, where
Dataset = { Collection, ESDT }
(Collection still bothers me; it isn't as concrete as the ArchiveSet
model we use internally.)
Each Dataset has a "home": an Archive responsible for its curation
and stewardship. That archive could offer a URL to which the Dataset's
persistent identifier (DOI) resolves, and it could also offer (or
point to) a SPARQL endpoint serving the graph of related nodes. When
a query reaches a node that refers to a dataset owned by another
archive, you hop over to that archive's SPARQL endpoint and continue
the query there.
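A minimal sketch of the hop-between-endpoints traversal, in Python.
Everything here is hypothetical: each archive's SPARQL endpoint is
simulated by an in-memory edge list (a real one would be queried over
HTTP), and the node ids, Dataset names, and archive names are invented
for illustration. Each node is resolved at the archive that is home to
its Dataset, so the walk crosses archives only at Dataset boundaries;
the total work still grows linearly with the nodes and edges visited.

```python
from collections import Counter, deque
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Dataset:
    """Partition key: Dataset = {Collection, ESDT}."""
    collection: str
    esdt: str


@dataclass
class Endpoint:
    """In-memory stand-in for one archive's SPARQL endpoint (hypothetical).
    It only holds edges for nodes in the Datasets that archive curates."""
    archive: str
    edges: dict = field(default_factory=dict)  # node id -> related node ids

    def related(self, node):
        # A real endpoint would answer a SPARQL query here.
        return self.edges.get(node, [])


def trace(start, dataset_of, home, endpoints):
    """Breadth-first walk of the provenance graph. Each node is looked up at
    the endpoint of the archive that is home to the node's Dataset, so the
    walk hops between archives whenever it crosses a Dataset boundary.
    Returns the set of visited nodes and a count of queries per archive."""
    seen = {start}
    queue = deque([start])
    queries = Counter()
    while queue:
        node = queue.popleft()
        archive = home[dataset_of[node]]  # which archive owns this node
        queries[archive] += 1
        for nxt in endpoints[archive].related(node):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen, queries


# Hypothetical example: an L2 granule and its job live at ArchiveA,
# its L1/L0 inputs live at ArchiveB.
l2 = Dataset("C1", "ESDT_L2")
l1 = Dataset("C1", "ESDT_L1")
l0 = Dataset("C1", "ESDT_L0")
dataset_of = {"g-L2": l2, "job-7": l2, "g-L1": l1, "g-L0": l0}
home = {l2: "ArchiveA", l1: "ArchiveB", l0: "ArchiveB"}
endpoints = {
    "ArchiveA": Endpoint("ArchiveA", {"g-L2": ["job-7"], "job-7": ["g-L1"]}),
    "ArchiveB": Endpoint("ArchiveB", {"g-L1": ["g-L0"]}),
}
seen, queries = trace("g-L2", dataset_of, home, endpoints)
```

Here the walk makes two lookups at ArchiveA and two at ArchiveB before
running out of related nodes.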
As a single archive grows, it can simply partition internally along
Dataset boundaries as much as needed, offering multiple databases.
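One way an archive might do that internal partitioning, sketched in
Python with the standard-library sqlite3 module. The table layout, the
number of databases, and the class name are assumptions; the only idea
taken from the text is that each Dataset (Collection, ESDT) lives
entirely in one database, so a lookup for a node touches a single
store. The edge labels are OPM's "wasGeneratedBy" and "used".

```python
import sqlite3
import zlib


class PartitionedArchive:
    """Hypothetical sketch of one archive splitting its provenance store along
    Dataset boundaries: every edge for a given (collection, esdt) lives in
    exactly one of `n` SQLite databases (in-memory here)."""

    def __init__(self, n):
        self.dbs = [sqlite3.connect(":memory:") for _ in range(n)]
        for db in self.dbs:
            db.execute("CREATE TABLE edge (collection TEXT, esdt TEXT, "
                       "subject TEXT, relation TEXT, object TEXT)")

    def _db_for(self, collection, esdt):
        # Stable hash of the Dataset key, so a Dataset always maps to the same db.
        key = f"{collection}/{esdt}".encode()
        return self.dbs[zlib.crc32(key) % len(self.dbs)]

    def add(self, collection, esdt, subject, relation, obj):
        db = self._db_for(collection, esdt)
        db.execute("INSERT INTO edge VALUES (?, ?, ?, ?, ?)",
                   (collection, esdt, subject, relation, obj))

    def related(self, collection, esdt, subject):
        # Only the one database holding this Dataset is consulted.
        db = self._db_for(collection, esdt)
        rows = db.execute(
            "SELECT relation, object FROM edge "
            "WHERE collection = ? AND esdt = ? AND subject = ?",
            (collection, esdt, subject))
        return rows.fetchall()


# Hypothetical usage: two OPM edges belonging to the same Dataset.
archive = PartitionedArchive(4)
archive.add("C1", "ESDT_L2", "g-L2", "wasGeneratedBy", "job-7")
archive.add("C1", "ESDT_L2", "job-7", "used", "g-L1")
r = archive.related("C1", "ESDT_L2", "g-L2")
```

The Dataset key is hashed with zlib.crc32 rather than Python's built-in
hash(), which is randomized per process for strings, so the mapping
stays stable across runs. Changing the number of databases reshuffles
Datasets under this modulo scheme; an explicit Dataset-to-database
directory would avoid that if the archive expects to keep splitting.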
Getting back to "Collection", we need the ability to broaden it beyond
a single archive. Currently every collection of a specific ESDT is
always owned by the same archive (if the old ones are even kept at
all, which is another issue). For this scheme to scale, other archives
need to be able to handle the same types of data, whether that means
changing the ESDT, introducing a controlled, extended namespace for
Collection, or something else.
Curt