[Esip-preserve] A Minor Note on Nomenclature

Tue Sep 21 15:27:52 EDT 2010

I'd suggest you put this material together as a draft
paper and aim to have a final version ready for
submission about a week after the January
ESIP meeting.

I'm currently working on a paper to deal with
how to identify whether two data collections
contain the same data, using Annex E of the
OAIS RM as a starting point.  There are at least
three complications:
1.  Numeric data (as well as other kinds) have
a variety of different internal representations
[a particularly unpleasant example is a "date and
time stamp", although one could also make a case
for character strings as the unpleasantness champion,
including whether upper case differs from lower
case when dealing with scientific content, e.g. is
CO2 different from Carbon Dioxide or CARBON DIOXIDE?]
2.  Numeric data can be stored in different orders
(e.g. FORTRAN array order vs C array order), as well
as grouped in different aggregations (MOD02 holding
lat-long values, while the radiometric data are
scattered across several products)
3.  Tacit information that determines the order of
arrays but isn't written down.
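A minimal sketch of the normalization the first two complications imply, assuming Python and a two-dimensional array whose shape and storage order are known. Those two facts are exactly the tacit information the third complication says may never be written down, so the code has to take them as explicit arguments. All function names and the one-entry synonym table are hypothetical; whether case or synonyms should be folded together is a policy decision the code does not settle.

```python
from datetime import datetime, timezone

def column_major_to_row_major(values, rows, cols):
    """Reorder a flat array stored in FORTRAN (column-major) order into
    C (row-major) order so two copies can be compared element by element.
    rows/cols must be supplied: they are the 'tacit' data."""
    if len(values) != rows * cols:
        raise ValueError("length does not match the stated shape")
    # In column-major storage, element (i, j) sits at index j * rows + i.
    return [values[j * rows + i] for i in range(rows) for j in range(cols)]

def normalize_timestamp(text, fmt):
    """Parse a date-and-time stamp in the given strptime format and return
    a canonical ISO 8601 string in UTC. Assumes stamps without an explicit
    offset are already UTC (a hypothetical policy)."""
    stamp = datetime.strptime(text, fmt)
    if stamp.tzinfo is None:
        stamp = stamp.replace(tzinfo=timezone.utc)
    return stamp.astimezone(timezone.utc).isoformat()

def normalize_name(text):
    """Collapse whitespace and case-fold a character string."""
    return " ".join(text.split()).casefold()

# Hypothetical synonym table: treating "CO2" as "carbon dioxide" is a
# domain decision, not something string handling can infer.
SYNONYMS = {"co2": "carbon dioxide"}

def canonical_name(text):
    key = normalize_name(text)
    return SYNONYMS.get(key, key)
```

For example, the 2x3 matrix [[1, 2, 3], [4, 5, 6]] is stored as [1, 4, 2, 5, 3, 6] in FORTRAN order, and column_major_to_row_major(that list, 2, 3) returns the C-order [1, 2, 3, 4, 5, 6]. Likewise "2010-09-21T15:27:52-0400" (the EDT time in this message's header) and "09/21/2010 19:27:52" both normalize to "2010-09-21T19:27:52+00:00".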

We've also got some pleasantries related to data collections
that are "in production".  The NOAA GHCN files, for example,
apparently have new data appended to them on a routine
basis, perhaps as often as once a month.
Likewise, when we're dealing with scientific identity, we
need to decide whether a copy in which apparently
erroneous character strings from the archived form have
been corrected is "identical" to the original.
We may also have to decide whether one should refer only
to the whole MODIS "data set" or whether the "snow and ice
product" is itself a "data set".
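For collections like GHCN that grow by appending, one possible check that a newer copy still contains an archived version unchanged is to record the length and a SHA-256 digest of each archived version, then verify that the newer file begins with exactly those bytes. This is a sketch assuming byte-level appends only; the function names are hypothetical. A copy in which erroneous strings were corrected would fail this check, which is exactly the policy question about "identical" copies raised above.

```python
import hashlib

def record_version(data: bytes) -> tuple[int, str]:
    """What an archive might store about one version: length and SHA-256 digest."""
    return len(data), hashlib.sha256(data).hexdigest()

def extends_recorded_version(new_data: bytes, old_length: int, old_digest: str) -> bool:
    """True if new_data starts with exactly the bytes of the recorded version,
    i.e. the only change is material appended at the end."""
    if len(new_data) < old_length:
        return False  # the newer copy is shorter, so something was removed
    prefix = new_data[:old_length]
    return hashlib.sha256(prefix).hexdigest() == old_digest
```

Usage: store record_version(old_bytes) at archive time; later, extends_recorded_version(new_bytes, length, digest) is True for a pure append and False if any earlier byte was edited.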

More after I get a new router at home.

Bruce B.
----- Original Message -----
From: "Curt Tilmes" <Curt.Tilmes at nasa.gov>
To: esip-preserve at lists.esipfed.org
Sent: Monday, September 20, 2010 11:49:53 AM
Subject: Re: [Esip-preserve] A Minor Note on Nomenclature

On 08/25/10 08:14, alicebarkstrom at verizon.net wrote:

> It strikes me that this discussion can be made quite useful and
> relevant if we move away from a discussion of "verbal categories"
> and convert it into a collection of lists or "database table
> contents".

Ok.

> For example, it seems like we've got a pretty concrete idea of what
> we mean by production history. That's sort of represented by four
> tables:
> 1: Table of Files
> 2: Table of Jobs
> 3: Table of which Files were input into which jobs
> 4: Table of which Files were output from which jobs
> You could attach the time and circumstances of production
> to the table of jobs.
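The four production-history tables can be sketched with Python's built-in sqlite3 module. Table names, column names, and the toy data are assumptions for illustration only; the recursive query shows one use of the input/output tables, tracing every file that contributed to a given product.

```python
import sqlite3

# Hypothetical names for the four production-history tables.
SCHEMA = """
CREATE TABLE files (
    file_id INTEGER PRIMARY KEY,
    name    TEXT NOT NULL UNIQUE
);
CREATE TABLE jobs (
    job_id        INTEGER PRIMARY KEY,
    run_time      TEXT,   -- when the job ran
    circumstances TEXT    -- free-text notes on how it was run
);
CREATE TABLE job_inputs (
    job_id  INTEGER NOT NULL REFERENCES jobs(job_id),
    file_id INTEGER NOT NULL REFERENCES files(file_id),
    PRIMARY KEY (job_id, file_id)
);
CREATE TABLE job_outputs (
    job_id  INTEGER NOT NULL REFERENCES jobs(job_id),
    file_id INTEGER NOT NULL REFERENCES files(file_id),
    PRIMARY KEY (job_id, file_id)
);
"""

# Walk from a file to the job that produced it, then to that job's inputs,
# repeating until no more producing jobs are found. UNION removes duplicates.
UPSTREAM_QUERY = """
WITH RECURSIVE ancestors(file_id) AS (
    SELECT i.file_id
      FROM files f
      JOIN job_outputs o ON o.file_id = f.file_id
      JOIN job_inputs  i ON i.job_id  = o.job_id
     WHERE f.name = ?
    UNION
    SELECT i.file_id
      FROM ancestors a
      JOIN job_outputs o ON o.file_id = a.file_id
      JOIN job_inputs  i ON i.job_id  = o.job_id
)
SELECT f.name FROM ancestors a JOIN files f ON f.file_id = a.file_id
 ORDER BY f.name
"""

def build_example_db():
    """Toy two-step production: raw_a + raw_b -> calibrated -> gridded."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO files VALUES (?, ?)",
                     [(1, "raw_a.dat"), (2, "raw_b.dat"),
                      (3, "calibrated.dat"), (4, "gridded.dat")])
    conn.executemany("INSERT INTO jobs VALUES (?, ?, ?)",
                     [(1, None, "toy calibration job"),
                      (2, None, "toy gridding job")])
    conn.executemany("INSERT INTO job_inputs VALUES (?, ?)", [(1, 1), (1, 2), (2, 3)])
    conn.executemany("INSERT INTO job_outputs VALUES (?, ?)", [(1, 3), (2, 4)])
    return conn

def upstream_files(conn, file_name):
    """Names of every file that fed, directly or indirectly, into file_name."""
    return [row[0] for row in conn.execute(UPSTREAM_QUERY, (file_name,))]
```

With the toy data, upstream_files(conn, "gridded.dat") returns ["calibrated.dat", "raw_a.dat", "raw_b.dat"]. The time and circumstances of production hang off the jobs table, as suggested above.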

That's the core of course, but I think even our general model needs a
little more structure than that.  Especially for the "industrial
production" model, documenting the regularity will help organize the
vast quantity of information a bit.

I'll try to take a stab at generalizing a bit from our model and send
it out.

> Second, it looks like we've identified a "Custodianship Table" that
> contains such information as which facility ran the job, who was on
> duty or authorized the job, and so on. This needs some work, but
> should be doable in a fairly short period of time. It would also be
> helpful to have an informal use case that says something about the
> circumstances under which people might access this table, how they'd
> use it, how often we think they might use it and so on.
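A possible shape for the Custodianship Table, again as a hypothetical sqlite3 sketch. The columns follow the items named above (facility, who was on duty, who authorized the job); the two query functions stand in for the kind of informal use case the paragraph asks for, such as an auditor asking who authorized which jobs. All names and sample values are placeholders.

```python
import sqlite3

# Hypothetical schema: one row per job, recording where it ran and who was responsible.
SCHEMA = """
CREATE TABLE custodianship (
    job_id        INTEGER PRIMARY KEY,
    facility      TEXT NOT NULL,
    on_duty       TEXT,
    authorized_by TEXT,
    recorded_at   TEXT NOT NULL  -- ISO 8601 time the record was made
);
"""

def open_custodianship_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn

def record_custody(conn, job_id, facility, on_duty, authorized_by, recorded_at):
    conn.execute("INSERT INTO custodianship VALUES (?, ?, ?, ?, ?)",
                 (job_id, facility, on_duty, authorized_by, recorded_at))

def jobs_authorized_by(conn, person):
    """Use case: which jobs did a given person authorize?"""
    rows = conn.execute(
        "SELECT job_id FROM custodianship WHERE authorized_by = ? ORDER BY job_id",
        (person,))
    return [row[0] for row in rows]

def custody_of_job(conn, job_id):
    """Use case: where did a job run, who was on duty, and who authorized it?"""
    return conn.execute(
        "SELECT facility, on_duty, authorized_by FROM custodianship WHERE job_id = ?",
        (job_id,)).fetchone()
```

Keying the table on job_id lets it join directly to a jobs table like the one in the production-history tables.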

> Third, it looks like we're headed toward defining the contents of an
> audit. This begins to impinge on what we might call "authenticity
> verification". That could broaden the discussion considerably, but
> if we could sort of identify what we think we'd want to see in an
> audit report, it could be quite useful, particularly if we could put
> that information into a table that could serve as a checklist for
> project submission agreement completeness.
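If the audit contents end up as a table, a completeness check for a project submission agreement could be as simple as the sketch below. The checklist entries are placeholders drawn from the tables discussed in this thread, not an agreed list, and representing an agreement as a Python dict is an assumption.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ChecklistItem:
    key: str          # field expected in a submission agreement
    description: str  # what an auditor would look for

# Placeholder entries only; the real list is what the group is trying to define.
AUDIT_CHECKLIST = (
    ChecklistItem("production_history", "Files, jobs, and job input/output tables"),
    ChecklistItem("custodianship", "Facility, operator on duty, and authorizer for each job"),
    ChecklistItem("fixity", "Checksums recorded for each archived file"),
    ChecklistItem("context_documents", "Context documents, their authors, and intended use"),
)

def missing_items(agreement: dict) -> list[str]:
    """Keys of checklist items that are absent or empty in a submission agreement."""
    return [item.key for item in AUDIT_CHECKLIST if not agreement.get(item.key)]
```

For example, an agreement that covers only production history and fixity would report ["custodianship", "context_documents"] as missing.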

> Fourth, I think it would be very useful to create a list of the
> documents or reports of context information, who is expected to
> create these documents, and the circumstances of their use. Indeed,
> the sooner we could get such a list published, the better off the
> community would be.

> Rather than spending time arguing about verbal categories, it seems
> to me compiling these kinds of tables and making sure we have a
> clear identification of their contents and use would be far more
> productive. In slightly different language, it would be useful to
> think of our role as "information architects".

You are right, of course.  I'll try to get some bits of my model on
the Wiki as a strawman you can poke at.

Curt
_______________________________________________
Esip-preserve mailing list
Esip-preserve at lists.esipfed.org
http://www.lists.esipfed.org/mailman/listinfo/esip-preserve