[Esip-preserve] [FOO] Bob reproduces Alice's research

alicebarkstrom at frontier.com
Fri Oct 8 15:15:30 EDT 2010


This will get interesting when Bob and Alice discover
they have been using different versions of the commercial
software used to run the algorithms - and that the software
has new versions of the key algorithms that produce slightly
different results.  Lots of useful variants on this theme
are possible.

Bruce B.
----- Original Message -----
From: "Curt Tilmes" <Curt.Tilmes at nasa.gov>
To: "ESIP Preservation cluster" <esip-preserve at rtpnet.org>
Sent: Friday, October 8, 2010 12:11:18 PM
Subject: [Esip-preserve] [FOO] Bob reproduces Alice's research

Bob has been reading Alice's paper, and notices a weird bump in the
graphs around month 10 (due to the corrupt data).

He tries to reproduce her process, following the methodology she
documented with her paper.

Following the data DOIs cited in her paper, he downloads
     FOOL3.v1.01.d08925b3-3eb3-407d-8db1-f5e0d101a0a4 and
     FOOL3.v2.01.2a365058-fb52-4559-ab4b-085cb5ac0b73

and performs the analysis.  Unsurprisingly, since he is using the
revised data, he gets different answers from Alice.

He gives her a call, and they work out that she actually used

     FOOL3.v1.01.d08925b3-3eb3-407d-8db1-f5e0d101a0a4
     FOOL3.v2.01.07aa9ae3-9c3e-4508-b027-890dae11b768
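
The two v2.01 granules differ only in the trailing UUID, so the name
alone does not pin down which file was used.  A minimal Python sketch of
splitting such an identifier, assuming (hypothetically) that the UUID is
always the last dot-separated field and treating everything before it as
an opaque product/version/period prefix:

```python
import uuid


def split_granule_id(granule_id: str) -> tuple[str, uuid.UUID]:
    """Split an identifier like 'FOOL3.v2.01.<uuid>' into (prefix, UUID).

    Assumption: the UUID is the last dot-separated field; the prefix is
    not interpreted further.
    """
    prefix, _, tail = granule_id.rpartition(".")
    return prefix, uuid.UUID(tail)


cited = "FOOL3.v2.01.07aa9ae3-9c3e-4508-b027-890dae11b768"    # what Alice used
fetched = "FOOL3.v2.01.2a365058-fb52-4559-ab4b-085cb5ac0b73"  # what Bob downloaded

cited_prefix, cited_uuid = split_granule_id(cited)
fetched_prefix, fetched_uuid = split_granule_id(fetched)
print(cited_prefix == fetched_prefix)  # True: same product, version, period
print(cited_uuid == fetched_uuid)      # False: two different files
```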

OK, says Bob, I'll just go back to the archive and get those files.
They are no longer available, but since the metadata and provenance
were preserved, he sees that he can get the software and reproduce
those granules.

He gets the old L0 file and calibration file and produces some new
files:

FOOL1B.v2.10.c911b994-91fb-4d5c-b9e1-642c0a9c46a3
   Used: FOOL0.10.0b337185-82af-4662-89b0-419bfd3e5db7 (old, corrupt file)
         FOOCAL.2.d2a14052-f426-4d2e-a506-ec052fdb69d4

FOOL2.v2.10.2c09ed89-57cf-40ed-910b-16c1aafcd947
   Used: FOOL1B.v2.10.c911b994-91fb-4d5c-b9e1-642c0a9c46a3

FOOL3.v2.01.52562fbd-5969-4572-a757-47ff3f92dda4
   Used:
     FOOL2.v2.01.bba34792-f256-4c54-81dd-9977e432c204
     FOOL2.v2.02.2fd12da6-a3e2-4e50-8140-3ac645882419
     FOOL2.v2.03.29bda893-765d-476d-851b-8b9acd7f140e
     FOOL2.v2.04.57509ddb-3d40-4d60-8204-da4b99867fc7
     FOOL2.v2.05.0e8604fa-fb4e-4cfb-b412-5364ca12cf14
     FOOL2.v2.06.0eb26b4e-b718-41c5-bbf8-c83d3d79c233
     FOOL2.v2.07.43079ea6-43b5-4622-b492-bcdb824a818e
     FOOL2.v2.08.590fd64c-ec12-44a5-9b14-0042d19ed3dc
     FOOL2.v2.09.226173b9-4ef7-49e8-8b9e-701b892a8f57
     FOOL2.v2.10.2c09ed89-57cf-40ed-910b-16c1aafcd947
     FOOL2.v2.11.af235d11-777c-4bf1-a5e6-15273a5e5d80
     FOOL2.v2.12.bdc9dc33-38bd-403c-991e-48dcd4762ca7
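
The "Used:" lines above form a provenance graph, and walking it backward
shows which inputs a granule ultimately depends on, for example that
Bob's regenerated FOOL3 granule descends from the old, corrupt L0 file.
A minimal sketch, assuming the relationships are held in a plain Python
dict (a hypothetical stand-in for however the archive actually stores
provenance); granules with no entry are treated as leaves:

```python
# "Used" relationships copied from Bob's listing. The dict layout is an
# illustrative assumption, not the archive's provenance format.
USED: dict[str, list[str]] = {
    "FOOL1B.v2.10.c911b994-91fb-4d5c-b9e1-642c0a9c46a3": [
        "FOOL0.10.0b337185-82af-4662-89b0-419bfd3e5db7",
        "FOOCAL.2.d2a14052-f426-4d2e-a506-ec052fdb69d4",
    ],
    "FOOL2.v2.10.2c09ed89-57cf-40ed-910b-16c1aafcd947": [
        "FOOL1B.v2.10.c911b994-91fb-4d5c-b9e1-642c0a9c46a3",
    ],
    "FOOL3.v2.01.52562fbd-5969-4572-a757-47ff3f92dda4": [
        "FOOL2.v2.01.bba34792-f256-4c54-81dd-9977e432c204",
        "FOOL2.v2.02.2fd12da6-a3e2-4e50-8140-3ac645882419",
        "FOOL2.v2.03.29bda893-765d-476d-851b-8b9acd7f140e",
        "FOOL2.v2.04.57509ddb-3d40-4d60-8204-da4b99867fc7",
        "FOOL2.v2.05.0e8604fa-fb4e-4cfb-b412-5364ca12cf14",
        "FOOL2.v2.06.0eb26b4e-b718-41c5-bbf8-c83d3d79c233",
        "FOOL2.v2.07.43079ea6-43b5-4622-b492-bcdb824a818e",
        "FOOL2.v2.08.590fd64c-ec12-44a5-9b14-0042d19ed3dc",
        "FOOL2.v2.09.226173b9-4ef7-49e8-8b9e-701b892a8f57",
        "FOOL2.v2.10.2c09ed89-57cf-40ed-910b-16c1aafcd947",
        "FOOL2.v2.11.af235d11-777c-4bf1-a5e6-15273a5e5d80",
        "FOOL2.v2.12.bdc9dc33-38bd-403c-991e-48dcd4762ca7",
    ],
}


def ancestors(granule_id: str, used: dict[str, list[str]]) -> set[str]:
    """Return every granule that granule_id was derived from, directly or not."""
    seen: set[str] = set()
    stack = list(used.get(granule_id, []))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            # Follow this granule's own inputs, if its provenance is recorded.
            stack.extend(used.get(current, []))
    return seen


bob_l3 = "FOOL3.v2.01.52562fbd-5969-4572-a757-47ff3f92dda4"
corrupt_l0 = "FOOL0.10.0b337185-82af-4662-89b0-419bfd3e5db7"
lineage = ancestors(bob_l3, USED)
print(corrupt_l0 in lineage)  # True
print(len(lineage))           # 15: twelve L2, one L1B, the L0 and the calibration file
```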


(For now, let's assume he can perfectly reproduce the environment and
run the processing in an identical manner.  Big assumption, but a
different issue from the one I'm trying to highlight here.  We'll revisit
that in a later episode.)


Note that each of the files Bob produces has a different UUID, and
different provenance, from the corresponding file in the archive.

If, for example, you asked about

     FOOL1B.v2.10.2f269e5e-cce7-41e4-8a83-baad1e087c8e

the provenance would say that the main data processing system produced
that file on a certain date/time on a certain host, while the file
Bob made:

     FOOL1B.v2.10.c911b994-91fb-4d5c-b9e1-642c0a9c46a3

was made by Bob, at an entirely different time on a different host.

The key point, though, is that both files were produced by the same
software, from the same input files, and run in the same way.

In this world, let's accept that
     FOOL1B.v2.10.2f269e5e-cce7-41e4-8a83-baad1e087c8e
is scientifically equivalent to
     FOOL1B.v2.10.c911b994-91fb-4d5c-b9e1-642c0a9c46a3

Similarly:
     FOOL2.v2.10.2c09ed89-57cf-40ed-910b-16c1aafcd947
is scientifically equivalent to
     FOOL2.v2.10.533b2a95-d57f-4f75-9b7d-914d3d220310
and
     FOOL3.v2.01.52562fbd-5969-4572-a757-47ff3f92dda4
is scientifically equivalent to
     FOOL3.v2.01.07aa9ae3-9c3e-4508-b027-890dae11b768
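
The rule behind these equivalences can be made explicit: two granules
are scientifically equivalent if they were made by the same software
from equivalent inputs, regardless of who ran the processing, where, or
when.  A minimal recursive sketch follows.  The Provenance record and
its software labels, hosts and times are hypothetical placeholders, and
it assumes the archive's FOOL2.v2.10 granule was made from the archive's
FOOL1B.v2.10 granule:

```python
from dataclasses import dataclass

L0 = "FOOL0.10.0b337185-82af-4662-89b0-419bfd3e5db7"
CAL = "FOOCAL.2.d2a14052-f426-4d2e-a506-ec052fdb69d4"
ARCHIVE_L1B = "FOOL1B.v2.10.2f269e5e-cce7-41e4-8a83-baad1e087c8e"
BOB_L1B = "FOOL1B.v2.10.c911b994-91fb-4d5c-b9e1-642c0a9c46a3"
ARCHIVE_L2 = "FOOL2.v2.10.533b2a95-d57f-4f75-9b7d-914d3d220310"
BOB_L2 = "FOOL2.v2.10.2c09ed89-57cf-40ed-910b-16c1aafcd947"


@dataclass(frozen=True)
class Provenance:
    software: str            # hypothetical label for the processing code and version
    inputs: tuple[str, ...]  # granule IDs the file was made from
    producer: str            # who ran it
    host: str                # where it ran
    when: str                # when it ran


# Hypothetical records: software labels, hosts and times are placeholders.
RECORDS = {
    ARCHIVE_L1B: Provenance("FOOL1B v2", (L0, CAL), "main processing system", "host-a", "t1"),
    BOB_L1B: Provenance("FOOL1B v2", (L0, CAL), "Bob", "host-b", "t2"),
    # Assumed: the archive's FOOL2 granule was made from the archive's FOOL1B.
    ARCHIVE_L2: Provenance("FOOL2 v2", (ARCHIVE_L1B,), "main processing system", "host-a", "t3"),
    BOB_L2: Provenance("FOOL2 v2", (BOB_L1B,), "Bob", "host-b", "t4"),
}


def prefix(granule_id: str) -> str:
    """Everything before the trailing UUID, e.g. 'FOOL1B.v2.10'."""
    return granule_id.rpartition(".")[0]


def equivalent(a: str, b: str, records: dict[str, Provenance]) -> bool:
    """Same software and equivalent inputs; producer, host and time are ignored."""
    if a == b:
        return True  # the very same file
    ra, rb = records.get(a), records.get(b)
    if ra is None or rb is None:
        return False  # without provenance, equivalence cannot be established
    if ra.software != rb.software:
        return False
    # Pair inputs by prefix (assumes at most one input per prefix).
    ins_a = {prefix(g): g for g in ra.inputs}
    ins_b = {prefix(g): g for g in rb.inputs}
    if ins_a.keys() != ins_b.keys():
        return False
    return all(equivalent(ins_a[k], ins_b[k], records) for k in ins_a)


print(equivalent(ARCHIVE_L1B, BOB_L1B, RECORDS))  # True
print(equivalent(ARCHIVE_L2, BOB_L2, RECORDS))    # True, via the equivalent L1B inputs
print(equivalent(BOB_L1B, BOB_L2, RECORDS))       # False: different software
```

Comparing the software label is what catches the case Bruce raises
above: a new version of the software can give different results from
identical inputs.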

So, Bob can now use these two granules:
     FOOL3.v1.01.d08925b3-3eb3-407d-8db1-f5e0d101a0a4
     FOOL3.v2.01.52562fbd-5969-4572-a757-47ff3f92dda4

and follow Alice's methodology and reproduce the research in her
paper.
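
Bob's substitution can be sketched the same way: use a cited granule if
the archive still serves it, otherwise fall back to a granule that has
been established as scientifically equivalent.  The set-and-dict
representation and the resolve function are hypothetical illustrations,
not an archive API:

```python
# Alice's cited granules.
CITED = [
    "FOOL3.v1.01.d08925b3-3eb3-407d-8db1-f5e0d101a0a4",
    "FOOL3.v2.01.07aa9ae3-9c3e-4508-b027-890dae11b768",
]
# Granules the archive can still serve (illustrative: only the v1.01 file).
ARCHIVED = {"FOOL3.v1.01.d08925b3-3eb3-407d-8db1-f5e0d101a0a4"}
# Equivalences established from provenance, as accepted above.
EQUIVALENTS = {
    "FOOL3.v2.01.07aa9ae3-9c3e-4508-b027-890dae11b768":
        "FOOL3.v2.01.52562fbd-5969-4572-a757-47ff3f92dda4",
}


def resolve(cited: list[str], archived: set[str], equivalents: dict[str, str]) -> list[str]:
    """Map each cited granule to something Bob can actually use."""
    resolved = []
    for gid in cited:
        if gid in archived:
            resolved.append(gid)               # original file still available
        elif gid in equivalents:
            resolved.append(equivalents[gid])  # scientifically equivalent copy
        else:
            raise LookupError(f"no archived copy or known equivalent for {gid}")
    return resolved


print(resolve(CITED, ARCHIVED, EQUIVALENTS))
```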

Curt
_______________________________________________
Esip-preserve mailing list
Esip-preserve at lists.esipfed.org
http://www.lists.esipfed.org/mailman/listinfo/esip-preserve

