[Esip-preserve] Provenance Issues

Curt Tilmes Curt.Tilmes at nasa.gov
Thu Oct 18 14:29:43 EDT 2012


On 10/18/2012 11:21 AM, Bruce Barkstrom wrote:
> 1.  Do we have a recommended practice on how to document choices of
> algorithms made automatically by computer programs during data
> reduction?

I wouldn't go so far as "recommended" yet, but here is one possible
way: represent the chosen algorithm as an input to the activity, and
assign that usage a "prov:type" of "algorithm".

(Representing the method of choosing may also be possible with the new
version of PML.)

Activity "myreduction1" uses "algorithm1" to transform "mydata" into
"myreduceddata1", and activity "myreduction2" uses "algorithm2" to
transform "mydata" into "myreduceddata2".

(Hand-generated, untested, may have typos...)

In W3C PROV-N:

entity(algorithm1)
entity(algorithm2)
entity(mydata)
entity(myreduceddata1)
entity(myreduceddata2)

activity(myreduction1)
used(myreduction1, algorithm1, [prov:type="algorithm"])
used(myreduction1, mydata)
wasGeneratedBy(myreduceddata1, myreduction1)

activity(myreduction2)
used(myreduction2, algorithm2, [prov:type="algorithm"])
used(myreduction2, mydata)
wasGeneratedBy(myreduceddata2, myreduction2)

In PROV-O:

@prefix prov:    <http://www.w3.org/ns/prov#> .
@prefix :        <http://example.com/> .

:algorithm1 a prov:Entity .
:algorithm2 a prov:Entity .
:mydata a prov:Entity .
:myreduceddata1 a prov:Entity.
:myreduceddata2 a prov:Entity.

:myreduction1
   a prov:Activity;
   prov:used :algorithm1, :mydata;
   prov:qualifiedUsage [
     a prov:Usage;
     prov:entity :algorithm1;
     prov:type "algorithm"
   ];
   prov:generated :myreduceddata1;
.

:myreduction2
   a prov:Activity;
   prov:used :algorithm2, :mydata;
   prov:qualifiedUsage [
     a prov:Usage;
     prov:entity :algorithm2;
     prov:type "algorithm"
   ];
   prov:generated :myreduceddata2;
.

In PROV-XML:

<?xml version="1.0" encoding="UTF-8"?>

<prov:document
     xmlns:prov="http://www.w3.org/ns/prov#"
     xmlns:ex="http://example.com/ns/ex#">

   <prov:entity prov:id="ex:algorithm1" />
   <prov:entity prov:id="ex:algorithm2" />
   <prov:entity prov:id="ex:mydata" />
   <prov:entity prov:id="ex:myreduceddata1" />
   <prov:entity prov:id="ex:myreduceddata2" />

   <prov:activity prov:id="ex:myreduction1" />
   <prov:used>
     <prov:activity prov:ref="ex:myreduction1" />
     <prov:entity prov:ref="ex:algorithm1" />
     <prov:type>algorithm</prov:type>
   </prov:used>
   <prov:used>
     <prov:activity prov:ref="ex:myreduction1" />
     <prov:entity prov:ref="ex:mydata" />
   </prov:used>
   <prov:wasGeneratedBy>
     <prov:entity prov:ref="ex:myreduceddata1" />
     <prov:activity prov:ref="ex:myreduction1" />
   </prov:wasGeneratedBy>

   <prov:activity prov:id="ex:myreduction2" />
   <prov:used>
     <prov:activity prov:ref="ex:myreduction2" />
     <prov:entity prov:ref="ex:algorithm2" />
     <prov:type>algorithm</prov:type>
   </prov:used>
   <prov:used>
     <prov:activity prov:ref="ex:myreduction2" />
     <prov:entity prov:ref="ex:mydata" />
   </prov:used>
   <prov:wasGeneratedBy>
     <prov:entity prov:ref="ex:myreduceddata2" />
     <prov:activity prov:ref="ex:myreduction2" />
   </prov:wasGeneratedBy>

</prov:document>

> 2.  How should we deal with documenting changing histories of data
> reduction, including recommended formula and coefficient changes?

It would be easier with a more concrete example, but I'll give it a
shot.  I think you may have a case for Specialization.  That allows
you to refer to the data in general, and also to specific instances of
that data over time.

entity(somedata)
entity(specificdata1)
entity(specificdata2)

specializationOf(specificdata1, somedata)
specializationOf(specificdata2, somedata)

Then you can document the formula and coefficient changes for
specificdata1:

entity(someformula1, [ prov:value="f(x) = x * coefficientoffoobar" ])
entity(somecoefficient1, [ prov:value=27 ])
activity(makedata1)
used(makedata1, someformula1, [ prov:type="formula" ])
used(makedata1, somecoefficient1, [ prov:type="coefficientoffoobar" ])
wasGeneratedBy(specificdata1, makedata1)

or specificdata2:

entity(someformula2, [ prov:value="f(x) = x^2 * coefficientoffoobar" ])
entity(somecoefficient2, [ prov:value=53 ])
activity(makedata2)
used(makedata2, someformula2, [ prov:type="formula" ])
used(makedata2, somecoefficient2, [ prov:type="coefficientoffoobar" ])
wasGeneratedBy(specificdata2, makedata2)

(PROV-O and PROV-XML representations are left as an exercise for the
reader)

If you need to know the value of the "coefficientoffoobar" for the
data, you look backwards and pose the question "what was the value of
the entity of type 'coefficientoffoobar' used by the activity that
generated the data?"
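
A minimal sketch of that backwards lookup in Python, assuming the
formula and coefficient statements for specificdata1 and specificdata2
have been copied by hand into plain dictionaries and tuples.  The names
"values", "used", "generated_by" and "value_used_for" are made up for
illustration and are not part of any PROV library:

```python
# Hypothetical in-memory copy of the PROV-N statements (not a PROV API).
# entity id -> prov:value
values = {
    "someformula1": "f(x) = x * coefficientoffoobar",
    "somecoefficient1": 27,
    "someformula2": "f(x) = x^2 * coefficientoffoobar",
    "somecoefficient2": 53,
}

# used(activity, entity, [prov:type=...])
used = [
    ("makedata1", "someformula1", "formula"),
    ("makedata1", "somecoefficient1", "coefficientoffoobar"),
    ("makedata2", "someformula2", "formula"),
    ("makedata2", "somecoefficient2", "coefficientoffoobar"),
]

# wasGeneratedBy(entity, activity)
generated_by = {
    "specificdata1": "makedata1",
    "specificdata2": "makedata2",
}


def value_used_for(data, usage_type):
    """Return the prov:value of the entity of the given usage type that
    was used by the activity that generated `data`, or None."""
    activity = generated_by.get(data)
    for act, entity, typ in used:
        if act == activity and typ == usage_type:
            return values.get(entity)
    return None


print(value_used_for("specificdata1", "coefficientoffoobar"))
print(value_used_for("specificdata2", "coefficientoffoobar"))
```

With a real triple store the same question would more likely be a
query over the PROV-O form, but the shape of the lookup is the same:
data, then generating activity, then the usage with the type you want.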

> 3.  The new Computer journal from the IEEE has several articles on
> Dynamic Software Product Lines.  In systems that use this kind of
> technology, the designers plan on variation points and allow runtime
> reconfiguration of the modules.  Have we had any thoughts on what
> this model of computation does to provenance tracking?

You just have to define entities for each individual instantiation and
document whatever aspects of the reconfiguration are relevant.
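
As a rough sketch of what that could look like, here is a small Python
helper that writes out statements in the same style as the examples
earlier in this message for one run of a reconfigurable system.  It
assumes each variation point can be recorded as the "prov:type" of the
usage of the module chosen for it.  The function name and the ids
"run42", "regridder", "bilinear_v2" and "myoutput42" are all made up:

```python
def prov_for_run(run_id, modules, inputs, output):
    """Build PROV-N style statements for one run.

    modules maps a variation point name to the module chosen at runtime;
    inputs is a list of input entity ids; output is the generated entity.
    """
    lines = [f"activity({run_id})"]
    # One entity and one typed usage per module selected at a variation point.
    for point, module in sorted(modules.items()):
        lines.append(f"entity({module})")
        lines.append(f'used({run_id}, {module}, [prov:type="{point}"])')
    # Plain data inputs are used without a type attribute.
    for item in inputs:
        lines.append(f"entity({item})")
        lines.append(f"used({run_id}, {item})")
    lines.append(f"entity({output})")
    lines.append(f"wasGeneratedBy({output}, {run_id})")
    return lines


for line in prov_for_run(
    "run42", {"regridder": "bilinear_v2"}, ["mydata"], "myoutput42"
):
    print(line)
```

Calling it once per instantiation gives each run its own activity and
output entity, so two runs with different modules plugged in stay
distinguishable afterwards.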

Curt

