[esip-semantictech] NASA GCMD Keywords Version 9.0 Released (2019-11-12)

Mark Leggott mark.leggott at rdc-drc.ca
Mon Jan 13 17:05:06 EST 2020


I understand that the UDFR is no longer maintained, and the project page itself recommends PRONOM.

There is a nascent effort to create a newer system with a more distributed architecture, one that could potentially accommodate the tens of thousands of unique research data file formats:

http://parcore.org/
https://github.com/JiscSD/rdss-par

I had hoped this would evolve into a Digital Objects Action Registry (DOAR) that provides information beyond file types and preservation actions, but I’m not sure where the project is heading at this time.

Another interesting variation on this idea is DataSeer, which adds an AI element:

https://github.com/kermitt2/dataseer-ml

Even with these efforts, I think the opportunity to respond to Matt’s suggestion of a global service like this for research data outputs is still waiting for a more cohesive and comprehensive approach. A DOAR service could ultimately provide information about preservation, analysis, transformation, and just about any other kind of service that acts on a known file type. Until we build something like that, projects like DataONE will continue to reinvent the digital wheel.

Cheers,

Mark

Mark Leggott, Executive Director / Directeur exécutif

Research Data Canada / Données de recherche Canada

45 O’Connor Street, Suite 500  Ottawa, ON K1P 1A4  Canada

w – rdc-drc.ca  t – 613.220.7236  f – 613.943.5443  e – mark.leggott at rdc-drc.ca 

Skype – markleggott  Zoom – 391-053-1054/mark.leggott at rdc-drc.ca

Twitter – @mleggott/#rdcdrc  LinkedIn – ca.linkedin.com/in/markleggott

Book a Meeting with Me - https://rdc-drc.doodle.com/markleggott

From: esip-semanticweb <esip-semanticweb-bounces at lists.esipfed.org> on behalf of Matt Jones via esip-semanticweb <esip-semanticweb at lists.esipfed.org>
Reply-To: Matt Jones <jones at nceas.ucsb.edu>
Date: Monday, January 13, 2020 at 14:40
To: John Scialdone <jscialdo at ciesin.columbia.edu>
Cc: ESIP Semantic Web Cluster <esip-semanticweb at lists.esipfed.org>, Shannon Leslie <shannon.leslie at nsidc.org>
Subject: Re: [esip-semantictech] NASA GCMD Keywords Version 9.0 Released (2019-11-12)

SiriJodha et al.,

Members of the DataONE network use a shared format list to tag granules with their serialization format. It might be useful to you in this effort.

Each format entry includes a format identifier, a human-readable name, an indication of whether it is primarily used to serialize metadata or data granules, and its associated MIME media type and file extensions. When it makes sense, we use the MIME type as the formatId, but in many cases one MIME type corresponds to many formats or to different versions of a format, so in those cases we use some other reasonable identifier. Here is HDF5 as an example:


    <objectFormat>
        <formatId>application/x-hdf5</formatId>
        <formatName>Hierarchical Data Format version 5 (HDF5)</formatName>
        <formatType>DATA</formatType>
        <mediaType name="application/x-hdf5"/>
        <extension>h5</extension>
    </objectFormat>

 

We extend this list as needed when new formats are encountered, and the current list can be retrieved from the DataONE formats service, which is at https://cn.dataone.org/cn/v2/formats . I've also attached a copy of the current list in case it is useful.
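As a minimal sketch, entries shaped like the HDF5 example could be read into Python with the standard library's `xml.etree.ElementTree`. This assumes a list document containing `objectFormat` elements with the same children as the example; the `ObjectFormat` class and the `parse_formats` helper are hypothetical names, not part of any DataONE client library. Tags are matched by local name because the wrapper element and namespace of the live service response are assumptions here, and the sketch parses a string rather than calling the service.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ObjectFormat:
    """One format-list entry, mirroring the fields in the HDF5 example."""
    format_id: str
    format_name: str
    format_type: str                      # e.g. "DATA" in the example
    media_type: Optional[str] = None      # taken from <mediaType name="..."/>
    extensions: List[str] = field(default_factory=list)


def _local(tag: str) -> str:
    # Strip any "{namespace}" prefix so namespaced and plain tags compare equal.
    return tag.rsplit("}", 1)[-1]


def parse_formats(xml_text: str) -> List[ObjectFormat]:
    """Collect every <objectFormat> element found anywhere in the document."""
    root = ET.fromstring(xml_text)
    formats = []
    for elem in root.iter():
        if _local(elem.tag) != "objectFormat":
            continue
        values = {}
        media_type = None
        extensions = []
        for child in elem:
            name = _local(child.tag)
            if name == "mediaType":
                media_type = child.get("name")
            elif name == "extension":
                extensions.append((child.text or "").strip())
            else:
                values[name] = (child.text or "").strip()
        formats.append(ObjectFormat(
            format_id=values.get("formatId", ""),
            format_name=values.get("formatName", ""),
            format_type=values.get("formatType", ""),
            media_type=media_type,
            extensions=extensions,
        ))
    return formats


# Hypothetical wrapper element around the HDF5 entry quoted above.
SAMPLE = """<objectFormatList>
    <objectFormat>
        <formatId>application/x-hdf5</formatId>
        <formatName>Hierarchical Data Format version 5 (HDF5)</formatName>
        <formatType>DATA</formatType>
        <mediaType name="application/x-hdf5"/>
        <extension>h5</extension>
    </objectFormat>
</objectFormatList>"""

if __name__ == "__main__":
    for fmt in parse_formats(SAMPLE):
        print(fmt.format_id, fmt.media_type, fmt.extensions)
```

To try it against the live list, the body returned by the formats service URL above (fetched, for example, with `urllib.request.urlopen`) could be decoded and passed to `parse_formats` in place of `SAMPLE`.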

 

There have been other format list standardization efforts, most recently the Unified Digital Format Registry (UDFR), run by the California Digital Library. UDFR is an ontological format registry that harmonizes prior vocabularies, including the earlier PRONOM and GDFR (Global Digital Format Registry) efforts. Although we haven't adopted it yet, we do try to stay consistent with UDFR at DataONE. We'd love to see a global service like this adopted, rather than each agency building its own.

Matt

Matthew B. Jones

ORCID: 0000-0003-0077-4738

Director of Informatics R&D, National Center for Ecological Analysis and Synthesis

PI, NSF Arctic Data Center

Director, DataONE program 

University of California Santa Barbara

On Mon, Jan 13, 2020 at 2:07 PM John Scialdone via esip-semanticweb <esip-semanticweb at lists.esipfed.org> wrote:

SiriJodha,

Yes, we can contribute here; we'd like edit permission.

Thanx..
John

On 1/13/2020 8:15 AM, Siri Jodha Khalsa wrote:

Hi John,

I agree, it would be good to have the DAACs compile their lists of formats. Ideally, there would be a shared spreadsheet where each DAAC would check against what was already there, to avoid having the same format called different things.

I've taken the GCMD list and assigned encodings to all the formats I could identify. The categories are ASCII, Binary, Image, Library (i.e., associated with a software library such as HDF), and Proprietary (which is something of a mixed bag: it could be ASCII or binary, open or closed).

The spreadsheet is here: https://docs.google.com/spreadsheets/d/1Lt7hl-_NKbp37FZkQ870b9c9LhlS39ZMq1N-UZdPUp8/edit?usp=sharing

Feedback, corrections, and additions are welcome. I'll give edit permission as requested.

Cheers,

SiriJodha

On 1/11/20 12:04 AM, John Scialdone wrote:

Siri Jodha,

We've been kicking around the Data Format Controlled Vocabulary list as well. We recently had a call with Valerie, Tyler, and Scott about inconsistencies in this list. One of the goals of this list (from the ARC team's review of our metadata) was to help users understand the software needed to read and use the data. We suggested adding a field to this structure so that values such as "ESRI", "Microsoft", "QGIS", "Adobe", "Google", etc. could be associated with a format. I think it would be a good exercise for all the DAACs to generate a list of the formats they use and the associated software, then bring these lists together through telecons and face-to-face meetings, tracking the effort on the Earthdata wiki, to eventually produce a more well-thought-out list.

Thanx..
John

On 1/10/2020 4:08 PM, Siri Jodha Khalsa via esip-semanticweb wrote:

I'm curious whether the ESIP semantic community has any opinions on these two controlled vocabularies. 

One question I have is why a separate measurement keyword list was necessary when GCMD already has the science keywords (a source for the original SWEET). That is, why not integrate measurements, which are already represented as variables in the science keywords, into the science keywords?

The data format list is even more perplexing to me. "incidence angle file" is a format? Georeferenced TIFF in addition to GeoTIFF? DV (digital value?) is a format? KML as well as OGC KML? ASCII and text (what about Unicode?)? And DEM: if this refers to USGS digital elevation models with a .DEM extension, those are ASCII files.

Many formats are subtypes of other formats listed. Wouldn't a better approach be to list the encodings (a much smaller list, which would tell users how to read the data with software) and then add the conventions that have been applied (e.g., CF for netCDF or GRIB for binary)? For the encodings list, why not start with MIME types?
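The encoding-plus-conventions idea can be sketched as a small data model. This is a minimal, hypothetical illustration: the `FormatDescriptor` class, the label-to-descriptor table, and the MIME types chosen (for instance `image/tiff` for GeoTIFF and `application/x-netcdf` for netCDF) are assumptions made for the example, not an agreed GCMD or DAAC vocabulary. It shows how near-duplicate keywords such as "GeoTIFF" and "Georeferenced TIFF", or "KML" and "OGC KML", collapse onto a single entry once the encoding and the conventions are recorded separately.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List


@dataclass(frozen=True)
class FormatDescriptor:
    """A format expressed as a base encoding plus any conventions applied to it."""
    encoding: str                               # a MIME type, as suggested above
    conventions: FrozenSet[str] = frozenset()   # e.g. {"CF"}; empty if none


# Hypothetical keyword labels (echoing examples from the list) mapped to
# descriptors. The MIME types are illustrative choices, not an agreed list.
LABELS: Dict[str, FormatDescriptor] = {
    "GeoTIFF": FormatDescriptor("image/tiff", frozenset({"GeoTIFF"})),
    "Georeferenced TIFF": FormatDescriptor("image/tiff", frozenset({"GeoTIFF"})),
    "KML": FormatDescriptor("application/vnd.google-earth.kml+xml"),
    "OGC KML": FormatDescriptor("application/vnd.google-earth.kml+xml"),
    "netCDF (CF)": FormatDescriptor("application/x-netcdf", frozenset({"CF"})),
}


def group_synonyms(labels: Dict[str, FormatDescriptor]) -> Dict[FormatDescriptor, List[str]]:
    """Group labels that resolve to the same encoding and conventions."""
    groups: Dict[FormatDescriptor, List[str]] = {}
    for label, desc in labels.items():
        groups.setdefault(desc, []).append(label)
    return groups


def distinct_encodings(labels: Dict[str, FormatDescriptor]) -> List[str]:
    """The shorter list of encodings that reading software actually has to handle."""
    return sorted({desc.encoding for desc in labels.values()})


if __name__ == "__main__":
    for desc, names in group_synonyms(LABELS).items():
        conv = ", ".join(sorted(desc.conventions)) or "none"
        print(f"{desc.encoding} [{conv}]: {names}")
    print(distinct_encodings(LABELS))
```

In this toy table, five labels reduce to three descriptors and three encodings. A software field like the one John suggested could be added as one more attribute on the descriptor.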

 

sjs

On 11/12/19 5:00 PM, Stevens, Tyler B. (GSFC-423.0)[Stinger Ghaffarian Technologies] via esip-semanticweb wrote:

The NASA Global Change Master Directory (GCMD) staff is pleased to announce the release of the GCMD keywords version 9.0. Version 9.0 introduces two new keyword schemes: (1) Measurement Name and (2) Granule Data Format.

The Measurement Name list is a preliminary set of approximately 100 keywords, each representing an observable property, usually geophysical, geo-biophysical, physical, or chemical. The Granule Data Format list of keywords represents the format of the data distributed by the data center.

The keywords help facilitate the classification and discovery of Earth Science data by providing a rich vocabulary for characterizing the data. The GCMD keywords are used by hundreds of data providers worldwide for categorizing the ~33,000 records stored in the Common Metadata Repository.

For more information about the keywords and how to access them, please visit the Keyword Landing Page. Questions about the keywords can be submitted to support at earthdata.nasa.gov or directed to Valerie Dixon at valerie.dixon at nasa.gov. 

_______________________________________________
esip-semanticweb mailing list
esip-semanticweb at lists.esipfed.org
https://lists.esipfed.org/mailman/listinfo/esip-semanticweb
-- 
Siri-Jodha Singh KHALSA, Ph.D., SMIEEE
National Snow and Ice Data Center
University of Colorado
Boulder, CO 80309-0449 Phone: 1-303-492-1445 GV: 1-303-736-9976
http://cires.colorado.edu/~khalsa
http://orcid.org/0000-0001-9217-5550



-- 
John Scialdone
Manager, Data Center Services: NASA Socioeconomic Data and Applications Center (SEDAC)
Project Lead: Jamaica Bay Research and Management Information Network (JBRMIN)
Project Lead: Jamaica Bay & Sandy Hook BioBlitz Events
--------------------------------------------------------------------------------------
Center for International Earth Science Information Network (CIESIN)
Earth Institute @ Columbia University
Lamont-Doherty Earth Observatory (LDEO)
61 Route 9W, PO Box 1000, Palisades, New York 10964 USA
Phone: (845) 365-8978; FAX: (845) 365-8922
Email: jscialdo at ciesin.columbia.edu; jns74 at columbia.edu
CIESIN web site: www.ciesin.columbia.edu
SEDAC web site: sedac.ciesin.columbia.edu
JBRMIN web site: www.ciesin.columbia.edu/jamaicabay
Sandy Hook web site: bioblitz17.ciesin.columbia.edu

Follow us on:
Twitter | Facebook | Youtube 
