[Bessig] Fwd: Preserving open government data: Data Rescue event this weekend at University of Colorado
Grace Peng
grace at ucar.edu
Wed Feb 15 19:27:56 EST 2017
Folks,
You brought up great points.
I'm also worried that, if a lot of volunteers try to
simultaneously access and 'save' our data,
it will be indistinguishable from a DoS attack.
rda.ucar.edu is already stretched to serve all our users
at peak times.
Grace
On Wed, Feb 15, 2017 at 1:22 PM, Julia Collins via Bessig <
bessig at lists.esipfed.org> wrote:
> My delay in responding has paid off, since Lynn and Cathy have captured the
> points that I would have attempted to make. :-) I will not be able to
> make the February BESSIG happy hour, but will offer a few observations
> that can perhaps serve as additional happy hour food for thought.
>
> As Lynn notes, there is already a community of data professionals who are
> trying to curate and preserve data. Data that are currently managed in
> repositories are backed up and will not disappear overnight. Data that
> are *not* managed won't be findable in the context of a hackathon --
> some of us already spend a lot of time trying to figure out how to catch
> datasets before they fall through the cracks. The silver lining of all
> of the recent administrative upheaval is that now a lot of people are
> thinking about data access -- and the value of data -- in ways they
> hadn't before. This gives us a chance to increase the visibility of
> data management issues that have long existed. For example, data
> management funding resources have never matched the needs. A better use
> of Hackathon brain power would be to identify strategies for harnessing the
> current fears regarding loss of data access into constructive steps for
> increasing the long-term funding for data management. Also, as Cathy
> alluded to, we are in real danger of losing some of our observational
> systems as satellites age without plans to replace them. It's easy to
> get swept up in the dramatic headlines, but those are not the
> fundamental problems that we need to address to ensure long-term data
> storage and access.
>
> Julia
> --
> Julia Collins
>
>
> On Wed, 15 Feb 2017, Yarmey, Lynn Rees via Bessig wrote:
>
>> Hi Neal and all,
>>
>> This thread is so timely as the topic (literally) kept me up last
>> night… I am looking forward to debating this with you all tomorrow at
>> the BESSIG Happy Hour in Golden! :)
>>
>> FWIW, while I completely believe in the good intentions and talent of
>> the gathered hackathon folks, the PPEH descriptions call for ‘hackers’
>> to ‘figure out how to capture the uncrawlable data’ (
>> http://www.ppehlab.org/datarefugepaths ), with no further definition.
>> I would note that in addition to usage logging (for instance the NASA
>> EarthData login) to demonstrate data center impact, additional
>> crawling and access restrictions include privacy protections on social
>> science and individually-identifiable related data, ownership
>> considerations in indigenous knowledge, as well as legal protections
>> on endangered species locations, etc. I know some data is
>> crawler-limited because it was deprecated in favor of a new version. I
>> would feel more confident if the website descriptions were more
>> nuanced and clear, though I am very happy that data-aware people will
>> be in the Boulder event for these reasons!
>>
>> More broadly - While I initially was really excited that data rescue
>> was finally (finally!) getting some huge visibility and props, I am
>> now more hesitant. The direction of the Data Refuge hackathons (again,
>> well intentioned and valuable in raising awareness of data and
>> passion!) seems to be glossing over the expertise and insight of the
>> professional data community. The curators of data being ‘rescued’ are
>> by and large not being contacted/credited, and it is not clear that
>> the good data practices that have been refined over years are being
>> acknowledged, much less accurately implemented. For example, the PPEHLab
>> website talks about metadata creation rather than capture, which makes
>> me nervous. Also, and IMO, the Internet Archive does not have a
>> strong understanding of the requirements for data handling,
>> preservation, storage, access, or discovery. If all .gov content
>> somehow disappeared tomorrow (unlikely as noted), would folks be able
>> to discover, access, understand, and use Data Refuge-saved data?
>>
>> Moreover, in times of increased govt scrutiny of funding, if a bunch
>> of volunteers get together and ‘save all of NASA’s data’ (for example)
>> in a weekend does this support the critical funding of long-term data
>> centers and data curation experts?
>>
>> All of that said, I may well just be paranoid! All the better to
>> discuss over a drink tomorrow :)
>> Lynn
>>
>>
>>
>> On 2/15/17, 11:33 AM, "Bessig on behalf of Neal McBurnett via Bessig" <
>> bessig-bounces at lists.esipfed.org on behalf of bessig at lists.esipfed.org>
>> wrote:
>>
>> Thanks for the great input, folks. I'm sorry for not being a bit more
>> clear.
>>
>> Cathy, I think we just may have different reactions to the word
>> "hackathon". The original application of the word "hack" to technology and
>> computers was at MIT and was all about creativity and not at all about
>> malicious activity, or evading proper safeguards. See:
>>
>> A Short History of “Hack” - The New Yorker
>> http://www.newyorker.com/tech/elements/a-short-history-of-hack
>>
>> A "hackathon" is a more recent term for getting folks together to do
>> good things creatively, e.g. add information to wikipedia articles, etc.
>>
>> As always, there are some folks that commandeered the term to apply it
>> to other activities, and the media played a big role. But that's not us.
>>
>> So don't worry about us on Saturday. As Daniel notes, we're focused
>> on data that has been released openly by the government, but might become
>> harder to find or even be thrown away. The more insight we can get from
>> the sorts of experts here on the best way to track and authenticate
>> well-vetted data so we can all get reproducible results, support them with
>> data and documentation and procedures, etc, the better.
>>
>> Cheers,
>>
>> Neal McBurnett http://neal.mcburnett.org/
>>
>> On Wed, Feb 15, 2017 at 06:06:03PM +0000, Anne Wilson via Bessig wrote:
>> > Hi Cathy, I totally take your point about the poor data
>> management. And I can see the hackathon heading towards a solution of
>> > counting the availability of datasets, which I think is good to
>> know. I see them as separate issues.
>> >
>> > Also, the hackathon did raise awareness of the issue that access to
>> datasets can slowly erode over time. I’ve seen that happen.
>> > I think data about access sounds pretty important.
>> >
>> > Anne
>> >
>> > From: Bessig on behalf of Anne
>> > Reply-To: Cathy
>> > Date: Wednesday, February 15, 2017 at 11:01 AM
>> > To: Daniel Ziskin
>> > Cc: Anne
>> > Subject: Re: [Bessig] Fwd: Preserving open government data: Data
>> Rescue event this weekend at University of Colorado
>> >
>> >
>> > I understand the desire to save data but why have a hackathon?
>> Government orgs like the one I'm in can direct people to how best to
>> > get the data and save it along with documentation. I have not seen
>> any email to our group from these efforts requesting how to get
>> > the data, how it is documented or if we have more data we could make
>> public that isn't now. Data without any context is much less
>> > useful.
>> >
>> > Webpages are a different thing and saving them makes some sense.
>> Though if no one is releasing new data about animal treatment, how
>> > will people be able to get that with any hackathon of old data?
>> >
>> > Cathy
>> >
>> >
>> > ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
>> ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
>> > From: "Daniel Ziskin" <ziskin at ucar.edu>
>> > To: "Cathy" <besthiker at comcast.net>
>> > Cc: bessig at lists.esipfed.org
>> > Sent: Wednesday, February 15, 2017 10:54:45 AM
>> > Subject: Re: [Bessig] Fwd: Preserving open government data: Data
>> Rescue event this weekend at University of Colorado
>> >
>> > I think this is more of a political response to the possibility that
>> public access to government data will be rescinded because it
>> > doesn't support a particular ideology. To that end, I endorse the
>> effort.
>> >
>> > The USDA has already shut down access to its records of animal abuse
>> by agribusiness[1] and the Dept of Education has already shut
>> > down a website that explains protections for disabled students[2].
>> What's next?
>> >
>> > 1. http://news.nationalgeographic.com/2017/02/wildlife-watch-usda-animal-welfare-trump-records/
>> > 2. http://www.huffingtonpost.com/entry/devos-disabilities-web-site_us_58a0fd7ae4b094a129ec35b8
>> >
>> > On Wed, Feb 15, 2017 at 9:46 AM, Cathy via Bessig <
>> bessig at lists.esipfed.org> wrote:
>> >
>> > Not to be too annoying, but why are they having a hackathon to
>> get the data? They can just ask for it. Data that has not been
>> > preserved is likely behind firewalls and on local storage
>> devices. Accessing them in a government system is illegal. Also, even
>> > if you were to get it, you would probably not be able to make
>> sense of it. If someone asked for help getting data, we would
>> > help them in our group. And for free!
>> >
>> > NCAR already archives an enormous amount of data. It is doubtful
>> that is going anywhere. And it is well documented. Other
>> > institutions around the world already archive public data from
>> the US.
>> >
>> > The real issues with data include data that has not been fully
>> processed and made available. This includes data that is on
>> > tape, paper, or cd, etc. Or data that was recorded but not
>> processed and documented. Some of that is worth saving and may be at
>> > risk with less funding but a hackathon won't help. And, as a
>> scientist, data streams that are stopped (observing systems) are
>> > much more of an issue than saving already public data. This is a
>> particular issue for climate where long, consistent records
>> > are needed. The various hackathons out there make it sound like
>> data will be fine but it won't be.
>> >
>> > Cathy
>> > ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
>> ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
>> > From: "Neal McBurnett via Bessig" <bessig at lists.esipfed.org>
>> > To: bessig at lists.esipfed.org
>> > Sent: Wednesday, February 15, 2017 9:17:08 AM
>> > Subject: [Bessig] Preserving open government data: Data Rescue
>> event this weekend at University of Colorado
>> >
>> > As experts with data, we know that having access to data is the
>> first requirement. But we face a risk that critical
>> > environmental data may disappear from the public domain, for
>> political reasons.
>> >
>> > The folks at Data Refuge (http://www.ppehlab.org/datarefuge)
>> are organizing a hackathon event this weekend Feb 18-19 in which
>> > volunteers will be trained to search for federal data that
>> hasn't been preserved yet and help do so, partnering with
>> > repositories at places like the Internet Archive, datarefuge.org,
>> and a consortium of major research libraries.
>> >
>> > Who knows what important and interesting data you may run across?
>> >
>> > Sign up for the event, to be held at CU Boulder's beautiful Law
>> Library, at
>> >
>> > https://www.eventbrite.com/e/data-rescue-boulder-tickets-31995427184
>> >
>> > Learn more at
>> >
>> > https://www.facebook.com/dataRescueBoulder/
>> >
>> > End of Term Presidential Harvest 2016
>> > http://digital2.library.unt.edu/nomination/eth2016/about/
>> >
>> > This is not the first time that there has been an End of Term
>> Web harvest. See previous ones at:
>> >
>> > http://eotarchive.cdlib.org/
>> >
>> > Cheers,
>> >
>> > Neal McBurnett http://neal.mcburnett.org/
>> > _______________________________________________
>> > Bessig mailing list
>> > Bessig at lists.esipfed.org
>> > http://lists.deltaforce.net/mailman/listinfo/bessig
>> >
>> >
>> > --
>> > Dan Ziskin, PhD
>> > NCAR - Atmospheric Chemistry Observations & Modeling Laboratory
>> > MOPITT Data Manager
>> > 303-497-2913
>> >
>> >
>>
--
Grace Peng, PhD
Atmospheric & Geoscience Research Data Archive
Computational & Information Systems Laboratory
National Center for Atmospheric Research
303-497-1218