[Bessig] Preserving open government data: Data Rescue event this weekend at University of Colorado
Ruth Duerr
ruth.duerr3 at gmail.com
Thu Feb 16 17:28:19 EST 2017
OK, folks - It looks like Joan is seriously going to try and make it. Soo… if you were on the fence about attending or think you have anything to say to the DataRescue folks maybe you’ll decide that you should be there!
Ruth
> On Feb 16, 2017, at 1:20 PM, Neal McBurnett <neal at bcn.boulder.co.us> wrote:
>
> Fabulous - thank you Ruth for finding Joan and offering such direct helpful expertise!
> I'm sorry I won't make it this afternoon, but I hope it is productive.
>
> Neal McBurnett http://neal.mcburnett.org/
>
> On Thu, Feb 16, 2017 at 01:07:37PM -0700, Ruth Duerr via Bessig wrote:
>> Hi Folks,
>>
>> I am inviting Joan Saez to the BESSIG get together this evening. She is coordinating the Data Rescue event this weekend. We haven’t had a chance to talk much yet; but I think this would be a good opportunity to get the official data manager voice into the process. I do note that she was concerned that she hadn’t been able to get folks from NCAR and NOAA to pay much attention to the event; but maybe she just doesn’t know who to talk to. Unfortunately I don’t know yet if she will be able to make it.
>>
>> So, I do know that there is data that needs saving! But quite possibly not quite for the reasons that Joan is thinking of. For example, I still have that list of 100 data sets from NSSDC that need rescuing. I think my students succeeded in rescuing like 3 of them so far. And let’s face it, NSIDC has even won 2 data rescue awards in the last few years for rescuing old NASA data. Yes, in both cases the bits were still available, but either on old media (no one has the hardware any more) or old unreadable formats. And Lynn and Julia have very good points about there being lots of data that dies because the person dies or leaves and the data was in their desk drawer and data that doesn’t yet have enough metadata to even know what it is, and on and on and on…. Yes, like it or not we were already in the digital dark ages.
>>
>> However, it is also true that some data sets have gone missing from some government websites, most notably the information about where animal research, etc. is occurring; and while it sounds like the Trump administration is chilling out a bit about dismantling the EPA website at the moment, there is still a bill going through Congress to shut it down completely in 2018, so who knows… Better to be prepared than not, I think! But then I keep thinking of all that International Polar Year 2 data from the 1930s that went missing due to WWII. I’ve always thought that backups on each continent were a very good idea!
>>
>> So from that standpoint, I do think that it is great to have a group that monitors government sites (well any site with data) and notices changes (some of you may recall that this sort of monitoring - monitoring links for web services, data descriptions, and stuff of that nature - was absolutely needed to move my “Google for data” stuff forward in any case). I also think that this would be an appropriate time to insert the data management community perspective into these events…
>>
>> So I guess we will see what we will see!
>>
>> Ruth
>>
>>
>>> On Feb 15, 2017, at 7:09 PM, Neal McBurnett via Bessig <bessig at lists.esipfed.org> wrote:
>>>
>>> Thanks, Lynn. You make lots of good points there!
>>>
>>> I haven't looked at this a whole lot, I must confess. I was hoping, based on the institutions named, that there are some experts in scientific data curation working with the Data Rescue folks. But given the scarcity of that resource (scientific data curation experts) I probably should take a more cautious stance. I'm glad to be able to discuss this with one of the best collections of such experts that I'm aware of, and hope this conversation is useful.
>>>
>>> I know you all are passionate about good data, good metadata, and good curation, and I see that now a lot of people are getting more passionate about that, but with far less expertise in those specialized fields.
>>>
>>> Since we certainly don't want passionate activists to cause problems for the scientists, I guess I'd hope that you all can point to some good online resources to expand on the problems you cite. Ideally you could try to tailor them for this new audience, so we maximize the good work that can be done, while minimizing any disruptions.
>>>
>>> Of course I know that's asking a lot....
>>>
>>> And it's got me thinking in new ways about provenance, digital signatures, etc.
>>>
>>> Julia, thank you also for your comments. I do hope that we get attention to the need to address the generic, sometimes "boring", problems of long term support/funding/etc for both curation and for the science that creates it.
>>>
>>> Evidently, there is a problem with web sites, like some at the White House, losing lots of info and context. The usual techies do know how to deal with that stuff. Distinguishing the real problems from the imagined ones is of course very helpful.
>>>
>>> You say not to worry about data already in repositories, and I certainly hope that's true. Can you point to the relevant policies or whatever that will help the activists understand your perspective (despite their fear of high-level attempts to delete such data) and save everybody time?
>>>
>>> Re: the other unmanaged data that you're concerned about, can you provide any pointers to help volunteers know what sorts of data you're talking about, and might in principle at least expect some help with?
>>>
>>> For more context:
>>>
>>> I heard about a recent event, organized a week ago by the Environmental Data and Governance Initiative (EDGI) in New York, that filled up (150 people signed up) in under 24 hours.
>>>
>>> Scientists are scrambling to save gov research in 'Data Rescue' events - Business Insider
>>> http://www.businessinsider.com/data-rescue-government-data-preservation-efforts-2017-2
>>>
>>> See also the organization behind that one:
>>> ENVIRONMENTAL DATA & GOVERNANCE INITIATIVE
>>> https://envirodatagov.org/
>>>
>>> and this story:
>>> Rogue Scientists Race to Save Climate Data from Trump | WIRED
>>> https://www.wired.com/2017/01/rogue-scientists-race-save-climate-data-trump/
>>>
>>> Trump officials suspend plan to delete EPA climate web pages | Science | AAAS
>>> http://www.sciencemag.org/news/2017/01/trump-officials-suspend-plan-delete-epa-climate-web-page
>>>
>>> And on the other hand, I ran across this intriguing abstract about how the curators are working:
>>>
>>> Curation of Scientific Data at Risk of Loss: Data Rescue and Dissemination - Academic Commons
>>> https://academiccommons.columbia.edu/catalog/ac:206975
>>>
>>> a recent effort by the NASA Socioeconomic Data and Applications Center (SEDAC) to rescue the Millennium Ecosystem Assessment (MA) collection of scientific data as a case study on the issues raised by a data rescue effort from an existing archive that had not fully curated the original data. The MA was an international survey of the world's ecosystems conducted by the scientific community in 2001–2005 involving more than 1,300 experts from around the world. As part of the MA, a diverse set of environmental and socioeconomic data was assembled and integrated in order to enable scientific analysis and assessment in support of policy and decision making. This data was held by the US Geological Survey (USGS) National Biological Information Infrastructure (NBII), which was terminated by the US government in early 2012.
>>>
>>> Many thanks to all of you!
>>>
>>> Neal McBurnett http://neal.mcburnett.org/
>>>
>>> On Wed, Feb 15, 2017 at 01:22:15PM -0700, Julia Collins via Bessig wrote:
>>>> My delay in responding has paid off, since Lynn and Cathy have captured the
>>>> points that I would have attempted to make. :-) I will not be able to
>>>> make the February BESSIG happy hour, but will offer a few observations
>>>> that can perhaps serve as additional happy hour food for thought.
>>>>
>>>> As Lynn notes, there is already a community of data professionals who are
>>>> trying to curate and preserve data. Data that are currently managed in
>>>> repositories are backed up and will not disappear overnight. Data that
>>>> are *not* managed won't be findable in the context of a hackathon --
>>>> some of us already spend a lot of time trying to figure out how to catch
>>>> datasets before they fall through the cracks. The silver lining of all
>>>> of the recent administrative upheaval is that now a lot of people are
>>>> thinking about data access -- and the value of data -- in ways they
>>>> hadn't before. This gives us a chance to increase the visibility of
>>>> data management issues that have long existed. For example, data
>>>> management funding resources have never matched the needs. A better use
>>>> of hackathon brain power would be to identify strategies for harnessing the
>>>> current fears regarding loss of data access into constructive steps for
>>>> increasing the long-term funding for data management. Also, as Cathy
>>>> alluded to, we are in real danger of losing some of our observational
>>>> systems as satellites age without plans to replace them. It's easy to
>>>> get swept up in the dramatic headlines, but those are not the
>>>> fundamental problems that we need to address to ensure long-term data
>>>> storage and access.
>>>
>>>
>>> On Wed, Feb 15, 2017 at 07:39:49PM +0000, Yarmey, Lynn Rees via Bessig wrote:
>>>> Hi Neal and all,
>>>>
>>>> This thread is so timely as the topic (literally) kept me up last night… I am looking forward to debating this with you all tomorrow at the BESSIG Happy Hour in Golden! :)
>>>>
>>>> FWIW, while I completely believe in the good intentions and talent of the gathered hackathon folks, the PPEH descriptions call for ‘hackers’ to ‘figure out how to capture the uncrawlable data’ ( http://www.ppehlab.org/datarefugepaths ), with no further definition. I would note that in addition to usage logging (for instance the NASA EarthData login) to demonstrate data center impact, additional crawling and access restrictions include privacy protections on social science and individually-identifiable related data, ownership considerations in indigenous knowledge, as well as legal protections on endangered species locations, etc. I know some data is crawler-limited because it was deprecated in favor of a new version. I would feel more confident if the website descriptions were more nuanced and clear, though I am very happy that data-aware people will be in the Boulder event for these reasons!
>>>>
>>>> More broadly - While I initially was really excited that data rescue was finally (finally!) getting some huge visibility and props, I am now more hesitant. The direction of the Data Refuge hackathons (again, well intentioned and valuable in raising awareness of data and passion!) seems to be glossing over the expertise and insight of the professional data community. The curators of data being ‘rescued’ are by and large not being contacted/credited, and it is not clear that the good data practices that have been refined over years are being acknowledged much less accurately implemented. For example, PPEHLab website talks about metadata creation rather than capture which makes me nervous. Also, and IMO, the Internet Archive does not have a strong understanding of the requirements for data handling, preservation, storage, access, or discovery. If all .gov content somehow disappeared tomorrow (unlikely as noted), would folks be able to discover, access, understand, and use Data Refuge-saved data?
>>>>
>>>> Moreover, in times of increased govt scrutiny of funding, if a bunch of volunteers get together and ‘save all of NASA’s data’ (for example) in a weekend does this support the critical funding of long-term data centers and data curation experts?
>>>>
>>>> All of that said, I may well just be paranoid! All the better to discuss over a drink tomorrow :)
>>>> Lynn
>>>>
>>>>
>>>>
>>>> On 2/15/17, 11:33 AM, "Bessig on behalf of Neal McBurnett via Bessig" <bessig-bounces at lists.esipfed.org on behalf of bessig at lists.esipfed.org> wrote:
>>>>
>>>> Thanks for the great input, folks. I'm sorry for not being a bit more clear.
>>>>
>>>> Cathy, I think we just may have different reactions to the word "hackathon". The original application of the word "hack" to technology and computers was at MIT and was all about creativity and not at all about malicious activity, or evading proper safeguards. See:
>>>>
>>>> A Short History of “Hack” - The New Yorker
>>>> http://www.newyorker.com/tech/elements/a-short-history-of-hack
>>>>
>>>> A "hackathon" is a more recent term for getting folks together to do good things creatively, e.g. add information to wikipedia articles, etc.
>>>>
>>>> As always, there are some folks who commandeered the term to apply it to other activities, and the media played a big role. But that's not us.
>>>>
>>>> So don't worry about us on Saturday. As Daniel notes, we're focused on data that has been released openly by the government, but might become harder to find or even be thrown away. The more insight we can get from the sorts of experts here on the best way to track and authenticate well-vetted data so we can all get reproducible results, support them with data and documentation and procedures, etc., the better.
>>>>
>>>> Cheers,
>>>>
>>>> Neal McBurnett http://neal.mcburnett.org/
>>>>
>>>> On Wed, Feb 15, 2017 at 06:06:03PM +0000, Anne Wilson via Bessig wrote:
>>>>> Hi Cathy, I totally take your point about the poor data management. And, I can see the hackathon heading towards a solution of
>>>>> counting the availability of datasets, which I think is good to know. I see them as separate issues.
>>>>>
>>>>> Also, the hackathon did raise awareness of the issue that access to datasets can slowly erode over time. I’ve seen that happen.
>>>>> I think data about access sounds pretty important.
>>>>>
>>>>> Anne
>>>>>
>>>>> From: Bessig on behalf of Anne
>>>>> Reply-To: Cathy
>>>>> Date: Wednesday, February 15, 2017 at 11:01 AM
>>>>> To: Daniel Ziskin
>>>>> Cc: Anne
>>>>> Subject: Re: [Bessig] Fwd: Preserving open government data: Data Rescue event this weekend at University of Colorado
>>>>>
>>>>>
>>>>> I understand the desire to save data but why have a hackathon? Government orgs like the one I'm in can direct people to how best to
>>>>> get the data and save it along with documentation. I have not seen any email to our group from these efforts requesting how to get
>>>>> the data, how it is documented or if we have more data we could make public that isn't now. Data without any context is much less
>>>>> useful.
>>>>>
>>>>> Webpages are a different thing and saving them makes some sense. Though if no one is releasing new data about animal treatment, how
>>>>> will people be able to get that with any hackathon of old data?
>>>>>
>>>>> Cathy
>>>>>
>>>>>
>>>>> ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
>>>>> From: "Daniel Ziskin" <ziskin at ucar.edu>
>>>>> To: "Cathy" <besthiker at comcast.net>
>>>>> Cc: bessig at lists.esipfed.org
>>>>> Sent: Wednesday, February 15, 2017 10:54:45 AM
>>>>> Subject: Re: [Bessig] Fwd: Preserving open government data: Data Rescue event this weekend at University of Colorado
>>>>>
>>>>> I think this is more of a political response to the possibility that public access to government data will be rescinded because it
>>>>> doesn't support a particular ideology. To that end, I endorse the effort.
>>>>>
>>>>> The USDA has already shut down access to its records of animal abuse by agribusiness[1] and the Dept of Education has already shut
>>>>> down a website that explains protections for disabled students[2]. What's next?
>>>>>
>>>>> 1. http://news.nationalgeographic.com/2017/02/wildlife-watch-usda-animal-welfare-trump-records/
>>>>> 2. http://www.huffingtonpost.com/entry/devos-disabilities-web-site_us_58a0fd7ae4b094a129ec35b8
>>>>>
>>>>> On Wed, Feb 15, 2017 at 9:46 AM, Cathy via Bessig <bessig at lists.esipfed.org> wrote:
>>>>>
>>>>> Not to be too annoying, but why are they having a hackathon to get the data? They can just ask for it. Data that has not been
>>>>> preserved is likely behind firewalls and on local storage devices. Accessing them in a government system is illegal. Also, even
>>>>> if you were to get it, you would probably not be able to make sense of it. If someone asked for help getting data, we would
>>>>> help them in our group. And for free!
>>>>>
>>>>> NCAR already archives an enormous amount of data. It is doubtful that is going anywhere. And it is well documented. Other
>>>>> institutions around the world already archive public data from the US.
>>>>>
>>>>> The real issues with data include data that has not been fully processed and made available. This includes data that is on
>>>>> tape, paper, or cd, etc. Or data that was recorded but not processed and documented. Some of that is worth saving and may be at
>>>>> risk with less funding, but a hackathon won't help. And, as a scientist, I see data streams that are stopped (observing systems) as
>>>>> much more of an issue than saving already public data. This is a particular issue for climate, where long, consistent records
>>>>> are needed. The various hackathons out there make it sound like data will be fine but it won't be.
>>>>>
>>>>> Cathy
>>>>> ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
>>>>> From: "Neal McBurnett via Bessig" <bessig at lists.esipfed.org>
>>>>> To: bessig at lists.esipfed.org
>>>>> Sent: Wednesday, February 15, 2017 9:17:08 AM
>>>>> Subject: [Bessig] Preserving open government data: Data Rescue event this weekend at University of Colorado
>>>>>
>>>>> As experts with data, we know that having access to data is the first requirement. But we face a risk that critical
>>>>> environmental data may disappear from the public domain, for political reasons.
>>>>>
>>>>> The folks at Data Refuge (http://www.ppehlab.org/datarefuge) are organizing a hackathon event this weekend Feb 18-19 in which
>>>>> volunteers will be trained to search for federal data that hasn't been preserved yet and help do so, partnering with
>>>>> repositories at places like the Internet Archive, datarefuge.org, and a consortium of major research libraries.
>>>>>
>>>>> Who knows what important and interesting data you may run across?
>>>>>
>>>>> Sign up for the event, to be held at CU Boulder's beautiful Law Library, at
>>>>>
>>>>> https://www.eventbrite.com/e/data-rescue-boulder-tickets-31995427184
>>>>>
>>>>> Learn more at
>>>>>
>>>>> https://www.facebook.com/dataRescueBoulder/
>>>>>
>>>>> End of Term Presidential Harvest 2016
>>>>> http://digital2.library.unt.edu/nomination/eth2016/about/
>>>>>
>>>>> This is not the first time that there has been an End of Term Web harvest. See previous ones at:
>>>>>
>>>>> http://eotarchive.cdlib.org/
>>>>>
>>>>> Cheers,
>>>>>
>>>>> Neal McBurnett http://neal.mcburnett.org/
>>>>> _______________________________________________
>>>>> Bessig mailing list
>>>>> Bessig at lists.esipfed.org
>>>>> http://lists.deltaforce.net/mailman/listinfo/bessig
>>>>>
>>>>>
>>>>> --
>>>>> Dan Ziskin, PhD
>>>>> NCAR - Atmospheric Chemistry Observations & Modeling Laboratory
>>>>> MOPITT Data Manager
>>>>> 303-497-2913
>>>>>
>>>>>
>>>>