[Bessig] Fwd: Preserving open government data: Data Rescue event this weekend at University of Colorado
Yarmey, Lynn Rees
yarmel at rpi.edu
Wed Feb 15 14:39:49 EST 2017
Hi Neal and all,
This thread is so timely, as the topic (literally) kept me up last night… I am looking forward to debating this with you all tomorrow at the BESSIG Happy Hour in Golden! :)
FWIW, while I completely believe in the good intentions and talent of the gathered hackathon folks, the PPEH descriptions call for ‘hackers’ to ‘figure out how to capture the uncrawlable data’ (http://www.ppehlab.org/datarefugepaths), with no further definition. I would note that, in addition to usage logging (for instance the NASA EarthData login) to demonstrate data center impact, other reasons for crawling and access restrictions include privacy protections on social science data and other individually identifiable data, ownership considerations in indigenous knowledge, and legal protections on endangered species locations, etc. I know some data is crawler-limited because it was deprecated in favor of a new version. I would feel more confident if the website descriptions were more nuanced and clear, though for these reasons I am very happy that data-aware people will be at the Boulder event!
More broadly, while I was initially really excited that data rescue was finally (finally!) getting some huge visibility and props, I am now more hesitant. The direction of the Data Refuge hackathons (again, well intentioned and valuable in raising awareness of data and passion!) seems to be glossing over the expertise and insight of the professional data community. The curators of the data being ‘rescued’ are by and large not being contacted or credited, and it is not clear that the good data practices refined over years are being acknowledged, much less accurately implemented. For example, the PPEHLab website talks about metadata creation rather than capture, which makes me nervous. Also, IMO, the Internet Archive does not have a strong understanding of the requirements for data handling, preservation, storage, access, or discovery. If all .gov content somehow disappeared tomorrow (unlikely, as noted), would folks be able to discover, access, understand, and use the data saved by Data Refuge?
Moreover, in times of increased govt scrutiny of funding, if a bunch of volunteers get together and ‘save all of NASA’s data’ (for example) in a weekend, does this support the critical funding of long-term data centers and data curation experts?
All of that said, I may well just be paranoid! All the better to discuss over a drink tomorrow :)
Lynn
On 2/15/17, 11:33 AM, "Bessig on behalf of Neal McBurnett via Bessig" <bessig-bounces at lists.esipfed.org on behalf of bessig at lists.esipfed.org> wrote:
Thanks for the great input, folks. I'm sorry for not being a bit more clear.
Cathy, I think we just may have different reactions to the word "hackathon". The original application of the word "hack" to technology and computers was at MIT and was all about creativity and not at all about malicious activity, or evading proper safeguards. See:
A Short History of “Hack” - The New Yorker
http://www.newyorker.com/tech/elements/a-short-history-of-hack
A "hackathon" is a more recent term for getting folks together to do good things creatively, e.g. adding information to Wikipedia articles.
As always, some folks have commandeered the term to apply it to other activities, and the media played a big role. But that's not us.
So don't worry about us on Saturday. As Daniel notes, we're focused on data that has been released openly by the government but might become harder to find or even be thrown away. The more insight we can get from the sorts of experts here on the best way to track and authenticate well-vetted data, so we can all get reproducible results and support them with data, documentation, procedures, etc., the better.
Cheers,
Neal McBurnett http://neal.mcburnett.org/
On Wed, Feb 15, 2017 at 06:06:03PM +0000, Anne Wilson via Bessig wrote:
> Hi Cathy, I totally take your point about the poor data management. And I can see the hackathon heading toward a solution of
> counting the availability of datasets, which I think is good to know. I see them as separate issues.
>
> Also, the hackathon did raise awareness of the issue that access to datasets can slowly erode over time. I’ve seen that happen.
> I think data about access sounds pretty important.
>
> Anne
>
> From: Bessig on behalf of Anne
> Reply-To: Cathy
> Date: Wednesday, February 15, 2017 at 11:01 AM
> To: Daniel Ziskin
> Cc: Anne
> Subject: Re: [Bessig] Fwd: Preserving open government data: Data Rescue event this weekend at University of Colorado
>
>
> I understand the desire to save data, but why have a hackathon? Government orgs like the one I'm in can direct people to how best to
> get the data and save it along with documentation. I have not seen any email to our group from these efforts asking how to get
> the data, how it is documented, or whether we have more data we could make public that isn't now. Data without any context is much less
> useful.
>
> Webpages are a different thing, and saving them makes some sense. But if no one is releasing new data about animal treatment, how
> will any hackathon of old data help people get it?
>
> Cathy
>
>
> ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
> From: "Daniel Ziskin" <ziskin at ucar.edu>
> To: "Cathy" <besthiker at comcast.net>
> Cc: bessig at lists.esipfed.org
> Sent: Wednesday, February 15, 2017 10:54:45 AM
> Subject: Re: [Bessig] Fwd: Preserving open government data: Data Rescue event this weekend at University of Colorado
>
> I think this is more of a political response to the possibility that public access to government data will be rescinded because it
> doesn't support a particular ideology. To that end, I endorse the effort.
>
> The USDA has already shut down access to its records of animal abuse by agribusiness[1] and the Dept of Education has already shut
> down a website that explains protections for disabled students[2]. What's next?
>
> 1. http://news.nationalgeographic.com/2017/02/wildlife-watch-usda-animal-welfare-trump-records/
> 2. http://www.huffingtonpost.com/entry/devos-disabilities-web-site_us_58a0fd7ae4b094a129ec35b8
>
> On Wed, Feb 15, 2017 at 9:46 AM, Cathy via Bessig <bessig at lists.esipfed.org> wrote:
>
> Not to be too annoying, but why are they having a hackathon to get the data? They can just ask for it. Data that has not been
> preserved is likely behind firewalls and on local storage devices. Accessing them in a government system is illegal. Also, even
> if you were to get it, you would probably not be able to make sense of it. If someone asked for help getting data, we would
> help them in our group. And for free!
>
> NCAR already archives an enormous amount of data. It is doubtful that is going anywhere. And it is well documented. Other
> institutions around the world already archive public data from the US.
>
> The real issues with data include data that has not been fully processed and made available. This includes data that is on
> tape, paper, CD, etc., or data that was recorded but not processed and documented. Some of that is worth saving and may be at
> risk with less funding, but a hackathon won't help. And, as a scientist, I find stopped data streams (from observing systems)
> much more of an issue than saving already-public data. This is a particular issue for climate, where long, consistent records
> are needed. The various hackathons out there make it sound like data will be fine, but it won't be.
>
> Cathy
> ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
> From: "Neal McBurnett via Bessig" <bessig at lists.esipfed.org>
> To: bessig at lists.esipfed.org
> Sent: Wednesday, February 15, 2017 9:17:08 AM
> Subject: [Bessig] Preserving open government data: Data Rescue event this weekend at University of Colorado
>
> As experts with data, we know that having access to data is the first requirement. But we face a risk that critical
> environmental data may disappear from the public domain, for political reasons.
>
> The folks at Data Refuge (http://www.ppehlab.org/datarefuge) are organizing a hackathon event this weekend Feb 18-19 in which
> volunteers will be trained to search for federal data that hasn't been preserved yet and help do so, partnering with
> repositories at places like the Internet Archive, datarefuge.org, and a consortium of major research libraries.
>
> Who knows what important and interesting data you may run across?
>
> Sign up for the event, to be held at CU Boulder's beautiful Law Library, at
>
> https://www.eventbrite.com/e/data-rescue-boulder-tickets-31995427184
>
> Learn more at
>
> https://www.facebook.com/dataRescueBoulder/
>
> End of Term Presidential Harvest 2016
> http://digital2.library.unt.edu/nomination/eth2016/about/
>
> This is not the first time that there has been an End of Term Web harvest. See previous ones at:
>
> http://eotarchive.cdlib.org/
>
> Cheers,
>
> Neal McBurnett http://neal.mcburnett.org/
> _______________________________________________
> Bessig mailing list
> Bessig at lists.esipfed.org
> http://lists.deltaforce.net/mailman/listinfo/bessig
>
>
> --
> Dan Ziskin, PhD
> NCAR - Atmospheric Chemistry Observations & Modeling Laboratory
> MOPITT Data Manager
> 303-497-2913