[Bessig] Fwd: Preserving open government data: Data Rescue event this weekend at University of Colorado

Grace Peng grace at ucar.edu
Wed Feb 15 19:27:56 EST 2017


Folks,

You brought up great points.

I'm also worried that, if a lot of volunteers try to
simultaneously access and 'save' our data,
it will be indistinguishable from a DoS attack.

rda.ucar.edu is already stretched to serve all our users
at peak times.

Grace


On Wed, Feb 15, 2017 at 1:22 PM, Julia Collins via Bessig <bessig at lists.esipfed.org> wrote:

> My delay in responding has paid off, since Lynn and Cathy have captured the
> points that I would have attempted to make. :-) I will not be able to
> make the February BESSIG happy hour, but will offer a few observations
> that can perhaps serve as additional happy hour food for thought.
>
> As Lynn notes, there is already a community of data professionals who are
> trying to curate and preserve data.  Data that are currently managed in
> repositories are backed up and will not disappear overnight. Data that
> are *not* managed won't be findable in the context of a hackathon --
> some of us already spend a lot of time trying to figure out how to catch
> datasets before they fall through the cracks.  The silver lining of all
> of the recent administrative upheaval is that now a lot of people are
> thinking about data access -- and the value of data -- in ways they
> hadn't before.  This gives us a chance to increase the visibility of
> data management issues that have long existed.  For example, data
> management funding resources have never matched the needs. A better use
> of Hackathon brain power would be to identify strategies for harnessing the
> current fears regarding loss of data access into constructive steps for
> increasing the long-term funding for data management.  Also, as Cathy
> alluded to, we are in real danger of losing some of our observational
> systems as satellites age without plans to replace them. It's easy to
> get swept up in the dramatic headlines, but those are not the
> fundamental problems that we need to address to ensure long-term data
> storage and access.
>
> Julia
> --
> Julia Collins
>
>
> On Wed, 15 Feb 2017, Yarmey, Lynn Rees via Bessig wrote:
>
>> Hi Neal and all,
>>
>> This thread is so timely as the topic (literally) kept me up last
>> night… I am looking forward to debating this with you all tomorrow at
>> the BESSIG Happy Hour in Golden! :)
>>
>> FWIW, while I completely believe in the good intentions and talent of
>> the gathered hackathon folks, the PPEH descriptions call for ‘hackers’
>> to ‘figure out how to capture the uncrawlable data’ (
>> http://www.ppehlab.org/datarefugepaths ), with no further definition.
>> I would note that, in addition to usage logging (for instance, the NASA
>> EarthData login) used to demonstrate data center impact, other reasons
>> for crawling and access restrictions include privacy protections on
>> social science and other individually identifiable data, ownership
>> considerations for indigenous knowledge, and legal protections on
>> endangered species locations, etc. I know some data is crawler-limited
>> because it was deprecated in favor of a new version. I would feel more
>> confident if the website descriptions were more nuanced and clear,
>> though I am very happy that data-aware people will be at the Boulder
>> event for these reasons!
>>
>> More broadly - While I initially was really excited that data rescue
>> was finally (finally!) getting some huge visibility and props, I am
>> now more hesitant. The direction of the Data Refuge hackathons (again,
>> well intentioned and valuable in raising awareness of data and
>> passion!) seems to be glossing over the expertise and insight of the
>> professional data community. The curators of data being ‘rescued’ are
>> by and large not being contacted/credited, and it is not clear that
>> the good data practices that have been refined over years are being
>> acknowledged, much less accurately implemented. For example, the PPEHLab
>> website talks about metadata creation rather than capture, which makes
>> me nervous. Also, IMO, the Internet Archive does not have a
>> strong understanding of the requirements for data handling,
>> preservation, storage, access, or discovery.  If all .gov content
>> somehow disappeared tomorrow (unlikely as noted), would folks be able
>> to discover, access, understand, and use Data Refuge-saved data?
>>
>> Moreover, in times of increased government scrutiny of funding, if a
>> bunch of volunteers get together and ‘save all of NASA’s data’ (for
>> example) in a weekend, does this support the critical funding of
>> long-term data centers and data curation experts?
>>
>> All of that said, I may well just be paranoid! All the better to
>> discuss over a drink tomorrow :)
>> Lynn
>>
>>
>>
>> On 2/15/17, 11:33 AM, "Bessig on behalf of Neal McBurnett via Bessig" <
>> bessig-bounces at lists.esipfed.org on behalf of bessig at lists.esipfed.org>
>> wrote:
>>
>>    Thanks for the great input, folks.  I'm sorry for not being a bit
>>    more clear.
>>
>>    Cathy, I think we just may have different reactions to the word
>>    "hackathon".  The original application of the word "hack" to
>>    technology and computers was at MIT and was all about creativity and
>>    not at all about malicious activity, or evading proper safeguards.
>>    See:
>>
>>    A Short History of “Hack” - The New Yorker
>>     http://www.newyorker.com/tech/elements/a-short-history-of-hack
>>
>>    A "hackathon" is a more recent term for getting folks together to do
>>    good things creatively, e.g. add information to Wikipedia articles, etc.
>>
>>    As always, there are some folks that commandeered the term to apply
>>    it to other activities, and the media played a big role.  But that's
>>    not us.
>>
>>    So don't worry about us on Saturday.  As Daniel notes, we're focused
>>    on data that has been released openly by the government, but might
>>    become harder to find or even be thrown away.  The more insight we can
>>    get from the sorts of experts here on the best way to track and
>>    authenticate well-vetted data so we can all get reproducible results,
>>    support them with data and documentation and procedures, etc., the
>>    better.
>>
>>    Cheers,
>>
>>    Neal McBurnett                 http://neal.mcburnett.org/
>>
>>    On Wed, Feb 15, 2017 at 06:06:03PM +0000, Anne Wilson via Bessig wrote:
>>    > Hi Cathy, I totally take your point about the poor data management.
>>    > And I can see the hackathon heading towards a solution of counting
>>    > the availability of datasets, which I think is good to know. I see
>>    > them as separate issues.
>>    >
>>    > Also, the hackathon did raise awareness of the issue that access to
>>    > datasets can slowly erode over time. I’ve seen that happen.
>>    > I think data about access sounds pretty important.
>>    >
>>    > Anne
>>    >
>>    > From: Bessig on behalf of Anne
>>    > Reply-To: Cathy
>>    > Date: Wednesday, February 15, 2017 at 11:01 AM
>>    > To: Daniel Ziskin
>>    > Cc: Anne
>>    > Subject: Re: [Bessig] Fwd: Preserving open government data: Data
>>    >     Rescue event this weekend at University of Colorado
>>    >
>>    >
>>    > I understand the desire to save data, but why have a hackathon?
>>    > Government orgs like the one I'm in can direct people to how best to
>>    > get the data and save it along with documentation. I have not seen
>>    > any email to our group from these efforts requesting how to get the
>>    > data, how it is documented, or if we have more data we could make
>>    > public that isn't now. Data without any context is much less useful.
>>    >
>>    > Webpages are a different thing, and saving them makes some sense.
>>    > But if no one is releasing new data about animal treatment, how
>>    > will people be able to get that from any hackathon of old data?
>>    >
>>    > Cathy
>>    >
>>    >
>>    > ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
>> ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
>>    > From: "Daniel Ziskin" <ziskin at ucar.edu>
>>    > To: "Cathy" <besthiker at comcast.net>
>>    > Cc: bessig at lists.esipfed.org
>>    > Sent: Wednesday, February 15, 2017 10:54:45 AM
>>    > Subject: Re: [Bessig] Fwd: Preserving open government data: Data
>>    >     Rescue event this weekend at University of Colorado
>>    >
>>    > I think this is more of a political response to the possibility that
>>    > public access to government data will be rescinded because it
>>    > doesn't support a particular ideology. To that end, I endorse the
>>    > effort.
>>    >
>>    > The USDA has already shut down access to its records of animal abuse
>>    > by agribusiness[1], and the Dept of Education has already shut down
>>    > a website that explains protections for disabled students[2].
>>    > What's next?
>>    >
>>    > 1. http://news.nationalgeographic.com/2017/02/wildlife-watch-usda-animal-welfare-trump-records/
>>    > 2. http://www.huffingtonpost.com/entry/devos-disabilities-web-site_us_58a0fd7ae4b094a129ec35b8
>>    >
>>    > On Wed, Feb 15, 2017 at 9:46 AM, Cathy via Bessig <
>>    > bessig at lists.esipfed.org> wrote:
>>    >
>>    >     Not to be too annoying, but why are they having a hackathon to get
>>    >     the data? They can just ask for it. Data that has not been
>>    >     preserved is likely behind firewalls and on local storage devices.
>>    >     Accessing them in a government system is illegal. Also, even if
>>    >     you were to get it, you would probably not be able to make sense
>>    >     of it. If someone asked for help getting data, we would help them
>>    >     in our group. And for free!
>>    >
>>    >     NCAR already archives an enormous amount of data. It is doubtful
>>    >     that is going anywhere. And it is well documented. Other
>>    >     institutions around the world already archive public data from
>>    >     the US.
>>    >
>>    >     The real issues with data include data that has not been fully
>>    >     processed and made available. This includes data that is on
>>    >     tape, paper, CD, etc., or data that was recorded but not processed
>>    >     and documented. Some of that is worth saving and may be at risk
>>    >     with less funding, but a hackathon won't help. And, from a
>>    >     scientist's perspective, data streams that are stopped (observing
>>    >     systems) are much more of an issue than saving already public
>>    >     data. This is a particular issue for climate, where long,
>>    >     consistent records are needed. The various hackathons out there
>>    >     make it sound like data will be fine, but it won't be.
>>    >
>>    >     Cathy
>>    >     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
>> ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
>>    >     From: "Neal McBurnett via Bessig" <bessig at lists.esipfed.org>
>>    >     To: bessig at lists.esipfed.org
>>    >     Sent: Wednesday, February 15, 2017 9:17:08 AM
>>    >     Subject: [Bessig] Preserving open government data: Data Rescue
>>    >         event this weekend at University of Colorado
>>    >
>>    >     As experts with data, we know that having access to data is the
>>    >     first requirement. But we face a risk that critical environmental
>>    >     data may disappear from the public domain, for political reasons.
>>    >
>>    >     The folks at Data Refuge (http://www.ppehlab.org/datarefuge) are
>>    >     organizing a hackathon event this weekend, Feb 18-19, in which
>>    >     volunteers will be trained to search for federal data that hasn't
>>    >     been preserved yet and help do so, partnering with repositories at
>>    >     places like the Internet Archive, datarefuge.org, and a consortium
>>    >     of major research libraries.
>>    >
>>    >     Who knows what important and interesting data you may run across?
>>    >
>>    >     Sign up for the event, to be held at CU Boulder's beautiful Law
>>    >     Library, at
>>    >
>>    >       https://www.eventbrite.com/e/data-rescue-boulder-tickets-31995427184
>>    >
>>    >     Learn more at
>>    >
>>    >      https://www.facebook.com/dataRescueBoulder/
>>    >
>>    >     End of Term Presidential Harvest 2016
>>    >      http://digital2.library.unt.edu/nomination/eth2016/about/
>>    >
>>    >     This is not the first time that there has been an End of Term Web
>>    >     harvest.  See previous ones at:
>>    >
>>    >      http://eotarchive.cdlib.org/
>>    >
>>    >     Cheers,
>>    >
>>    >     Neal McBurnett                 http://neal.mcburnett.org/
>>    >     _______________________________________________
>>    >     Bessig mailing list
>>    >     Bessig at lists.esipfed.org
>>    >     http://lists.deltaforce.net/mailman/listinfo/bessig
>>    >
>>    >
>>    > --
>>    > Dan Ziskin, PhD
>>    > NCAR - Atmospheric Chemistry Observations & Modeling Laboratory
>>    > MOPITT Data Manager
>>    > 303-497-2913
>>    >
>>    >
>>


-- 
Grace Peng, PhD
Atmospheric & Geoscience Research Data Archive
Computational & Information Systems Laboratory
National Center for Atmospheric Research
303-497-1218