[Esip-preserve] FYI: The Vast Majority of Raw Data From Old Scientific Studies May Now Be Missing

Lynnes, Christopher S. (GSFC-6102) christopher.s.lynnes at nasa.gov
Fri Dec 20 11:25:44 EST 2013


<data_preservation_sacrilege>
1) Data preservation is expensive.
2) Ergo, not all data can be preserved indefinitely.
3) Ergo, there should be a prioritization step.
4) Priority should be based on:
  (a) Whether the data can be remeasured or reconstructed at a later date
  (b) Whether the data has been supplanted by better data
  (c) The cost of remeasuring/reconstructing data
  (d) How interesting/useful the data are

Examples:
Based on (a), observations of the state of the Earth at a given time would go to high priority.
From (b), recent versions of the data should enable us to discard (most) previous versions.
From (c), Level 1A data are typically discarded, as they are easy to reconstruct from the Level 0.
From (d), my Master's thesis paleomagnetism results are likely not worth tossing the UM paleomagnetism lab for my hand samples, because the results were...less interesting and useful than hoped.
</data_preservation_sacrilege>
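
As a rough illustration of how criteria (a) through (d) could feed a prioritization step, here is a minimal sketch. The field names, the weights, and the example values are all invented for illustration and are not from any real archive policy.

```python
from dataclasses import dataclass


@dataclass
class Dataset:
    name: str
    reproducible: bool          # (a) can it be remeasured or reconstructed later?
    superseded: bool            # (b) has better data supplanted it?
    reconstruction_cost: float  # (c) relative cost to remeasure/reconstruct, 0..1
    usefulness: float           # (d) how interesting/useful, 0..1


def priority(d: Dataset) -> float:
    """Return a preservation score; higher means keep longer.

    The weights below are illustrative assumptions only.
    """
    score = 0.0
    if not d.reproducible:
        score += 3.0                     # (a) irreplaceable data dominate
    else:
        score += d.reconstruction_cost   # (c) only matters if reconstruction is possible
    if d.superseded:
        score -= 1.0                     # (b) better data exist
    score += d.usefulness                # (d)
    return score


# Hypothetical candidates with made-up attribute values.
candidates = [
    Dataset("earth_state_observations", False, False, 1.0, 0.8),
    Dataset("old_product_version", True, True, 0.3, 0.4),
    Dataset("level_1a", True, False, 0.1, 0.5),
    Dataset("thesis_hand_samples", True, False, 0.6, 0.1),
]
ranked = sorted(candidates, key=priority, reverse=True)
for d in ranked:
    print(f"{d.name}: {priority(d):.1f}")
```

With these made-up numbers the unrepeatable Earth observations rank first and the superseded product version ranks last; a real scheme would need agreed weights and cost estimates.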

On Dec 20, 2013, at 11:03 AM, Bruce Barkstrom <brbarkstrom at gmail.com> wrote:

To carry this further, before embarking on what will likely
be a time-expensive project, it would be useful to at least
get some idea of the work involved - and the potential
benefit.

The key is the PLOS paper on bioinformatics that estimated
it would take 280 or so hours for reconstructing a fairly simple
result.  100 hours was the amount of time required to become
familiar with the original material.  In software cost estimates,
163 person-hours is the amount of work force available in
one person-month.  So that simple project had about two-thirds
of a person-month devoted to understanding what was done
before.  I expect most satellite projects will require much more
work.
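
The person-month arithmetic above can be checked in a few lines. This is only a sketch using the three figures quoted in this message (280 hours, 100 hours, 163 hours per person-month); the variable names are mine.

```python
HOURS_PER_PERSON_MONTH = 163   # software cost-estimate figure quoted above
familiarization_hours = 100    # time to become familiar with the original material
total_hours = 280              # approximate total quoted for the reconstruction

familiarization_pm = familiarization_hours / HOURS_PER_PERSON_MONTH
total_pm = total_hours / HOURS_PER_PERSON_MONTH

print(f"familiarization: {familiarization_pm:.2f} person-months")  # 0.61
print(f"total: {total_pm:.2f} person-months")                      # 1.72
```

So familiarization alone is about 0.6 of a person-month, out of roughly 1.7 person-months overall.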

So - we first need to say what we're expecting to accomplish
with a replication of an old data collection.  Which of the following
are we going to try to do:
1.  Find and inventory old data.
2.  1 plus documentation and source code that would allow
   for calibration (Level 1) of the original data
3.  2 plus emulating or working around gaps caused by hardware
   and software obsolescence [which might also need an estimate
   of the time and expertise required to supply missing data,
   physics involved in the instrument, as well as locating such
   items as calibration coefficients or spectral responses]
4.  3 plus reconstructing the complete data production environment
   for higher level products
5.  4 plus intercomparison between reconstructed data and any
   residue of previously created data collections

[This list could probably be revised to become an attribute
attached to the intent of a reconstruction project.]
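
One possible way to turn the list into such an attribute is an ordered enumeration, since each option includes the ones before it. The class and member names here are hypothetical, chosen only to mirror the five options above.

```python
from enum import IntEnum


class ReconstructionScope(IntEnum):
    """Cumulative scope levels; each level includes all lower ones."""
    INVENTORY = 1        # find and inventory old data
    CALIBRATION = 2      # plus documentation and source code for Level 1 calibration
    OBSOLESCENCE = 3     # plus emulating or working around obsolete hardware/software
    PRODUCTION = 4       # plus the complete production environment for higher-level products
    INTERCOMPARISON = 5  # plus comparison with any residue of earlier data collections


def included_scopes(intent: ReconstructionScope) -> list[ReconstructionScope]:
    """List every scope level implied by the stated intent."""
    return [s for s in ReconstructionScope if s <= intent]


for scope in included_scopes(ReconstructionScope.OBSOLESCENCE):
    print(scope.value, scope.name)
```

Because the levels are ordered integers, a work breakdown structure could be keyed on the highest level a project commits to.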

Next, it should be moderately straightforward to lay out a
work breakdown structure for whichever option we might
select.

After we'd done that, it might be appropriate to have a formal
review of such a proposed project.

It seems to me that's sort of the minimum required for good
project management.

Bruce B.


On Fri, Dec 20, 2013 at 10:41 AM, Wei, Jennifer C. (GSFC-610.2)[ADNET SYSTEMS INC] <jennifer.c.wei at nasa.gov> wrote:
Hi Curt,

We (GES DISC) are currently working on satellite data preservation, especially for decommissioned satellites such as UARS, TOMS, HIRDLS, etc.  I have recently become involved with the task.  What we have encountered is not only that the data were saved on old magnetic tapes or even on floppy discs, but also that those old data were written in old machine-specific binary formats, and we no longer have the machines to read them, so we cannot transform them into a modern format.

Maybe one of the data preservation tasks is to come up with a way to add metadata (XML or ancillary information) to the old observation data, so they can be machine-readable for future use.  I have seen this need not only in the binary raw data, but also in current in-situ measurements saved in simple text files.  Another question is what the “supporting documentation” should be for future users.
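
As a small sketch of what such added metadata might look like, the snippet below writes an XML description for a legacy binary file using Python's standard library. The element names, the file name, and the field values are all hypothetical; a real effort would follow whatever metadata standard the archive adopts.

```python
import xml.etree.ElementTree as ET


def build_sidecar(filename: str, byte_order: str, record_bytes: int, source_machine: str) -> str:
    """Return an XML string describing how to read a legacy binary file.

    All element names are illustrative assumptions, not an existing schema.
    """
    root = ET.Element("legacy_file")
    ET.SubElement(root, "filename").text = filename
    ET.SubElement(root, "byte_order").text = byte_order
    ET.SubElement(root, "record_bytes").text = str(record_bytes)
    ET.SubElement(root, "source_machine").text = source_machine
    return ET.tostring(root, encoding="unicode")


# Hypothetical example values.
xml_text = build_sidecar("example_l0.dat", "big-endian", 512, "unknown")
print(xml_text)
```

The point is only that byte order, record layout, and the originating machine are recorded next to the data, so a future reader does not need the original hardware to interpret the bytes.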

Earlier this year at one of the NSF EarthCube workshops, a lot of earth scientists also raised this issue/concern.  I think it would be nice to see ESIP take the lead on this.

Thanks
Jennifer
--
Dr. Jennifer Wei
ADNET Systems, Inc.

GES DISC Code 610.2
NASA Goddard Space Flight Center
Greenbelt, MD. 20771

Phone: (301) 614-6558
Email: jennifer.c.wei at nasa.gov




On 12/20/13, 9:20 AM, "Tilmes, Curt (GSFC-6190)" <curt.tilmes at nasa.gov> wrote:

Shocking News!

The Vast Majority of Raw Data From Old Scientific Studies May Now Be Missing

"One of the foundations of the scientific method is the reproducibility of results. In a lab anywhere around the world, a researcher should be able to study the same subject as another scientist and reproduce the same data, or analyze the same data and notice the same patterns.

This is why the findings of a study published today in Current Biology are so concerning. When a group of researchers tried to email the authors of 516 biological studies published between 1991 and 2011 and ask for the raw data, they were dismayed to find that more than 90 percent of the oldest data (from papers written more than 20 years ago) were inaccessible. In total, even including papers published as recently as 2011, they were only able to track down the data for 23 percent."

http://blogs.smithsonianmag.com/science/2013/12/the-vast-majority-of-raw-data-from-old-scientific-studies-may-now-be-missing/

We've talked about doing such a study for the Earth Sciences -- I think such a study would shine a light on our problems.  Who's up for it?

Curt

_______________________________________________
Esip-preserve mailing list
Esip-preserve at lists.esipfed.org
http://www.lists.esipfed.org/mailman/listinfo/esip-preserve


--
Dr. Christopher Lynnes     NASA/GSFC, Code 610.2    phone: 301-614-5185
"Perfection is achieved, not when there is nothing left to add, but when there is nothing left to take away" -- A. de Saint-Exupery





