[Esip-preserve] Citations guideline revisions

Tue Jul 26 09:04:01 EDT 2011

OK - again, we stumble on mental models. Mark assumes citations only need to point
to file collections to provide credit for the work done in creating a collection.
I assume citations are necessary for replication of results and therefore need to
be done for individual files. The collection citation approach could induce its
own pleasantries, such as GSFC wanting the citation to point to all MODIS files in
MODAPS or GES, whereas NSIDC only wants the MODIS citation to go to the NSIDC
collection of MODIS files. These might well be the same files (replicated). Of
course, NSIDC might keep some files that GSFC got rid of, which would be useful.
In the long run, if NSIDC decided they needed to reformat their collection to
avoid obsolescence, whereas GSFC didn't or used a different reformatting, citation
could get interesting. Both citations could still point to scientifically
equivalent data. [An equivalent case could be made for a Word document where one
copy is in .doc format and another copy is in .docx - or .pdf. Are they really
different, and do they deserve different citations?]

As a minor correction to this note: in my mental model, a "version" of a
collection is usually a collection whose instances of time sampling are the same
as those of a previous version, but with different errors. An analogy is a new
edition of a printed encyclopedia, except that there are no new articles in the
new edition: it contains exactly the same articles, but with revised contents.
I'm not sure what Greg's original comment assumed about how many files there
might be in a "version" or how a new "version" relates to the old one.

Bruce B.

On Tue, Jul 26, 2011 at 8:41 AM, Mark A. Parsons <parsonsm at nsidc.org> wrote:
> I just meant that you may need multiple identifiers for different purposes. I thought that was a central conclusion of the Cluster's identifier paper that was just published. Meanwhile, you need to pick one collection-level identifier for citation. That identifier can support multiple resolution, but typically it wouldn't be pointing to individual files but rather a data set home page or some such. There are multiple considerations to address when choosing your citation identifier. One is whether the publisher will use it. Another is whether you are allowing the publishers too much control as Bruce fears. Archives need to weigh those considerations and then recommend an approach to their users.
>
> All said, I don't think any of this conversation changes the guideline which says use a locator in your citation and then further notes that publishers like DOIs, but please correct me if I am wrong and modify the guidelines accordingly.
>
> -m.
>
> On Jul 26, 2011, at 6:27 AM, Bruce Barkstrom <brbarkstrom at gmail.com> wrote:
>
>> "Multiple resolution" is what's needed. A particular archive can actually have
>> multiple copies of a file (one in "deep storage", another on tape, a third on
>> disk for rapid access, and a fourth being staged for production). More
>> importantly, data files with different formats can be scientifically identical
>> and stored in different locations. One example of replicating files to
>> different storage locations is the NOAA CLASS archive, which (last time I
>> checked) puts duplicate copies in separate locations - and the NOAA data
>> centers might also choose to put copies in offsite locations.
>>
>> Bruce B.
>>
>> On Tue, Jul 26, 2011 at 2:27 AM, Greg Janée <gjanee at icess.ucsb.edu> wrote:
>>> The Handle and DOI systems support "multiple resolution" which can be used
>>> for, among other things, describing the multiple locations at which the
>>> object may be found.
>>>
>>> I don't know how often this capability is used in practice, but multiple
>>> resolution would seem to be a great help in thinking of an identifier as
>>> identifying an abstract object (e.g., a version of a dataset) for which
>>> there may be varying numbers of copies in existence at any given time.
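A minimal sketch of the idea of multiple resolution, in Python purely for illustration. It assumes a hypothetical in-memory `IdentifierRecord` class with a made-up identifier string and `example.org` URLs; it does not use or imitate the actual Handle or DOI resolver APIs. One persistent identifier names the abstract object (here, a version of a dataset), while the set of copies it resolves to can grow or shrink over time without the identifier itself changing.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class IdentifierRecord:
    """Hypothetical record: one persistent identifier for an abstract object
    (e.g., a version of a dataset) with any number of current copies."""

    identifier: str
    locations: list[str] = field(default_factory=list)

    def add_copy(self, url: str) -> None:
        # Register a new replica; the identifier does not change.
        if url not in self.locations:
            self.locations.append(url)

    def remove_copy(self, url: str) -> None:
        # Retire a replica; the identifier stays valid for the remaining copies.
        if url in self.locations:
            self.locations.remove(url)

    def resolve(self) -> list[str]:
        # Multiple resolution: return every location currently on record.
        return list(self.locations)


# Hypothetical identifier and URLs, for illustration only.
record = IdentifierRecord("hdl:0000/example-dataset-v2")
record.add_copy("https://archive-a.example.org/dataset-v2")
record.add_copy("https://archive-b.example.org/mirror/dataset-v2")
record.remove_copy("https://archive-a.example.org/dataset-v2")
print(record.resolve())
```

The design choice in this sketch is that a citation would carry only the identifier; which copies exist at any moment is bookkeeping kept on the resolver side.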
>>>
>>> Regarding Mark's comment, is it ever desirable for an object to have more
>>> than one persistent identifier?  If it takes some amount of awareness and
>>> responsibility and effort to maintain one identifier over time, doesn't that
>>> burden get multiplied N times if there are N identifiers?  And then there's
>>> the diluting effect of having more than one identifier, which causes
>>> confusion (which identifier should I use?), plays havoc with citation
>>> counting and search system ranking, etc.
>>>
>>> -Greg
>>>
>>> On Jul 25, 2011, at 12:02 PM, Mark A. Parsons wrote:
>>>>
>>>> Yes, use as many identifiers as you like, but you should probably only use
>>>> one in a citation. The publishers would probably prefer that be a DOI (at
>>>> the moment at least).
>>>>
>>>> Cheers,
>>>>
>>>> -m.
>>>>
>>>> On 25 Jul 2011, at 12:22 PM, Bruce Barkstrom wrote:
>>>>>
>>>>> One question that I don't think we've addressed is whether having a single
>>>>> source of redirection will decrease the probability of losing information
>>>>> due to the loss of multi-site replication. Going to the multi-identifier
>>>>> approach would be more consistent with multi-site distribution of locators.
>>>>>
>>>>> Bruce b.
>>>
>>> _______________________________________________
>>> Esip-preserve mailing list
>>> Esip-preserve at lists.esipfed.org
>>> http://www.lists.esipfed.org/mailman/listinfo/esip-preserve
>>>
>