[Esip-preserve] Some Suggestions on Provenance Work

alicebarkstrom at frontier.com
Tue Sep 21 15:10:30 EDT 2010


My home router failed not long after
I looked at my e-mail yesterday.  Our
Internet service provider has promised
to send a new one - after three to five
business days.  I'm currently working
at the Weaverville, NC, public library,
which has a timed reservation system for
access to the Internet.

I'll fill in the spreadsheet after we
get the new router installed and our access
updated.

As a brief note, I think it's fairly clear
that there are two production modes: high-throughput
production, which is typical of operational data
sources and many of the large-scale climate data
records (including many of the NASA EOS data collections);
and ad hoc or exploratory production, which is often driven
by the interests of individual users.  The first has
very rigid production graphs; the second has substantial
variety and is probably organized only in the sense
that there are statistical patterns to its graphs.

More than a decade ago, I used a tool similar to today's workflow
engines to do exploratory work for instrument calibration and
validation.  The interface was not as visual as some of the
current tools, but there were five basic functions:
- select a subset of parameters to work with
- select a subset of record values from a time series of
records
- visualize the selected values, either as a single-variable
plot or as a two-dimensional plot
- transform selected values with functions like those available
on a scientific calculator
- perform some simple statistical summaries of selected values,
such as finding means, standard deviations, or doing linear
regressions
These could be done in any order, with no limit on the number
of iterations.  The history of each interaction could also be
stored, edited, and reused.

From a characterization of a session, it is clear that each
process is like a character in a string.  Thus, one could use
a string-matching approach to characterize the statistical
patterns.  I did a bit of this, but didn't publish the work.
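As a rough illustration of the string-matching idea, here is a
minimal Python sketch.  It assumes each session history is stored
as a list of operation names and maps each of the five operations
to a one-letter code; the names, the codes, and the sample
histories are all hypothetical, not taken from the original tool.
It uses simple n-gram counting, which is only one of many possible
string-matching approaches.

```python
from collections import Counter

# Hypothetical one-letter codes for the five session operations
# (parameter selection, record selection, visualization,
# transformation, statistical summary).
OPS = {
    "select_parameters": "P",
    "select_records": "R",
    "visualize": "V",
    "transform": "T",
    "statistics": "S",
}


def encode_session(history):
    """Turn a stored session history (list of operation names) into a string."""
    return "".join(OPS[op] for op in history)


def ngram_counts(sessions, n):
    """Count every length-n substring across all encoded sessions."""
    counts = Counter()
    for s in sessions:
        # Slide a window of width n along the session string.
        for i in range(len(s) - n + 1):
            counts[s[i:i + n]] += 1
    return counts


# Hypothetical example histories, for illustration only.
histories = [
    ["select_parameters", "select_records", "visualize", "transform", "visualize"],
    ["select_parameters", "select_records", "statistics"],
    ["select_parameters", "select_records", "visualize", "statistics"],
]
sessions = [encode_session(h) for h in histories]
print(sessions)                              # ['PRVTV', 'PRS', 'PRVS']
print(ngram_counts(sessions, 2).most_common(2))  # [('PR', 3), ('RV', 2)]
```

Frequently occurring substrings, such as "PR" (select parameters,
then select records) in this toy example, would be candidates for
the statistical patterns in exploratory sessions; a rigid
high-throughput production graph would instead yield nearly the
same string every time.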

I'd suggest putting together a formal paper for submission to
a journal, such as Earth Science Informatics.  That would probably
be more valuable to the ESIP community than a transient collection
of presentation slides.  If the characterization work is far
enough along to produce a summary by the January meeting, that
summary would also be useful for a joint session with the
Preservation Cluster, at least in my opinion.

Bruce B.
----- Original Message -----
From: "Hook Hua (388C)" <hook.hua at jpl.nasa.gov>
To: "Curt Tilmes" <Curt.Tilmes at nasa.gov>, esip-preserve at lists.esipfed.org
Sent: Monday, September 20, 2010 1:48:06 PM
Subject: Re: [Esip-preserve] Some Suggestions on Provenance Work


Regarding the comments on workflow and provenance: the Services Interoperability and Orchestration subgroup in the NASA TechInfusion Working Group has been working on a comparison of popular workflows used in Earth science. We’ve also started adding facets on provenance and are looking for more community input. We want to assess the impact of various workflows, their interoperability, and their support of provenance.

NASA Technology Infusion Working Group (TIWG): Workflow Comparison 2010 
https://spreadsheets.google.com/ccc?key=0AlQ95ca89UmYdEZJcXVYdGIxMkotSGwxcFA3OFJYenc&hl=en#gid=0 

We want to get more input from the community on experiences with workflows and provenance. The spreadsheet is open to anyone to edit and add their input. If we get enough useful input from the community, we will show this as a poster at the ESDSWG meeting.

This is continuation work from the ESIP talk on “Workflow Engines: Why So Many?” 
http://wiki.esipfed.org/index.php/Workflow_Engines:_Why_So_Many%3F 

--Hook 



From: Curt Tilmes < Curt.Tilmes at nasa.gov > 
Date: Mon, 20 Sep 2010 08:28:24 -0700 
To: < esip-preserve at lists.esipfed.org > 
Cc: Rahul Ramachandran < rramachandran at itsc.uah.edu > 
Subject: Re: [Esip-preserve] Some Suggestions on Provenance Work 

On 09/03/10 11:08, alicebarkstrom at verizon.net wrote: 
> I sent the following note to Dr. Ramachandran before the 
> teleconference on provenance tracking (or at least the parts 
> identified in 1 through 3). At that point, I felt that we did not 
> need a new working group and still feel that way. However, here are 
> some work items on provenance that I think need to be picked up: 

> 1. There are a number of workflow and provenance tracking tools, 
> including Earth science workbench (Frew), Sciflow, Kepler, Taverna, 
> and others. It might be useful to prepare an intercomparison of 
> these tools - particularly whether they are intended primarily for 
> ad hoc (or exploratory) data production or whether they might be 
> adapted for use in high-throughput production paradigms, as well 
> as what kind of "database" technology they use (relational, XML, 
> flat file, RDF, triple store, etc.) You could regard this as a 
> preliminary form of marketing analysis, where it would be useful to 
> ask how many different kinds of Earth science data have been run 
> through the tool, how robust it is, and how much it will cost to 
> buy, adapt, and run. I suspect this would be a useful paper if it 
> can be done in a reasonable length of time (say less than six months 
> to submission). 

Hook Hua did a very nice overview of workflow engines in an ESIP 
Webinar.  More information, including his slides, is here: 

     http://wiki.esipfed.org/index.php/Workflow_Engines:_Why_So_Many%3F 

Another pass at this using his work as a base could result in a nice 
paper. 





_______________________________________________
Esip-preserve mailing list
Esip-preserve at lists.esipfed.org
http://www.lists.esipfed.org/mailman/listinfo/esip-preserve

