[Bessig] Fwd: Preserving open government data: Data Rescue event this weekend at University of Colorado

Julia Collins collinsj at nsidc.org
Thu Feb 16 17:53:55 EST 2017


Hi,

On Wed, 15 Feb 2017, Neal McBurnett via Bessig wrote:
> You say not to worry about data already in repositories, and I
> certainly hope that's true.  Can you point to the relevant policies or
> whatever that will help the activists understand your perspective
> (despite their fear of high-level attempts to delete such data)
> and save everybody time?

That is a reasonable request, and an interesting exercise! There is
information on the (as we know, impermanent) web. For example, NSIDC is
one of the EOSDIS DAACs. The DAACs are "custodians of EOS mission data
and ensure that data will be easily accessible to users." See:

   https://earthdata.nasa.gov/about/daacs

The NSF requires sharing of research results:
   https://www.nsf.gov/bfa/dias/policy/dmp.jsp
   https://www.nsf.gov/bfa/dias/policy/dmpfaqs.jsp

As do other agencies:

   https://science.energy.gov/funding-opportunities/digital-data-management/
   https://www2.usgs.gov/datamanagement/plan/dmplans.php

It would be more satisfactory to find executive orders and budget line
items, though, so I'll keep looking for that information. Meanwhile,
organizations like the Sunlight Foundation
(https://sunlightfoundation.com) focus on open (governmental) data
access, and may have some insights regarding the stability of the US
data holdings both historically and now.  In general, I think it will be
more effective to work with existing organizations that have been in the
trenches with these issues for a while, rather than spreading the energy
around too much. I do understand the feeling of wanting to do *something
now*, though.

> Re: the other unmanaged data that you're concerned about, can you
> provide any pointers to help volunteers know what sorts of data you're
> talking about, and might in principle at least expect some help with?

Primarily I'm thinking of the datasets Ruth referred to: information on
old media that hasn't been converted; data "archived" in a spreadsheet
that never makes it off a PI's laptop; data stored in an archive without
adequate metadata to read the file(s) and interpret them; data that
*are* archived but are not machine-readable because they're stored in a
proprietary format. If there are data- and technology-savvy volunteers
interested in helping to curate contributions to data centers (e.g.,
converting the data files to self-describing formats if needed,
quality-checking data and metadata, improving the tools available to
parse (meta)data and store it in databases or other indexed data stores,
etc.), then data centers should get with the program and start
advertising volunteer opportunities. :-)

Regarding the study:
> [...]
> Curation of Scientific Data at Risk of Loss: Data Rescue and Dissemination - Academic Commons
> https://academiccommons.columbia.edu/catalog/ac:206975
> [...] This data was
> held by the US Geological Survey (USGS) National Biological
> Information Infrastructure (NBII), which was terminated by the US
> government in early 2012.

One salient point here is that the NBII was terminated. Another salient
point is that this didn't happen overnight. From Wikipedia (if you trust
it):

    On October 3, 2011, USGS announced on all NBII websites and
    applications that on January 15, 2012, the NBII website and any
    applications residing on the nbii.gov domain would be shut down, and
    it was. Before that shutdown, the Library of Congress, Internet
    Archive and Stanford Libraries all independently harvested the data
    from the NBII Website.[11] Stanford Libraries harvested the site
    twice between January 5 and January 13, 2012 for storage in its
    Fugitive US Agencies collection.

    -- https://en.wikipedia.org/wiki/National_Biological_Information_Infrastructure

These discussions can be helpful in identifying what it is we're really
afraid of. Is it that the NASA missions and NSF funding for basic
research will be reduced (or eliminated) in the next budget, tossing
them into the trash bin with NBII? Is it that new directors of these
organizations will be appointed who will order the destruction of data?
Or do we fear a police state in which the data really *are* destroyed
overnight? I suggest that one way to prevent the first two possibilities
is to ensure that your Congressional representatives know and understand
the importance of these programs and these data. If your organization
has an outreach component, work with them to ensure that lawmakers do
get the message, at both the state and federal levels. (I don't want to
think about the police state scenario just yet.) If it makes you feel
better to create data backups, do it. Just don't do it instead of
communicating with the people who were elected to represent you and who
might actually be able to help us prevent data disasters.

Julia

