[Esip-preserve] Article in New CACM that raises the issue of whether metadata needs to adapt to particular computing communities

Bruce Barkstrom brbarkstrom at gmail.com
Tue Oct 24 14:54:03 EDT 2017


The September issue of Comm. ACM has an interesting article suggesting
that the High Performance Computing (HPC) environment is so different
from the one we use in dealing with Web sites that the two don't really
belong to the same category:

Peisert, S., 2017: Security in High-Performance Computing Environments,
CACM, Vol. 60, No. 9, September, 2017, pp. 72-80.

The subtitle says: "Exploring the many distinctive elements that make
securing HPC systems much different than securing traditional systems".
The article has four well-discussed themes.  "The first theme is that
HPC systems are optimized for high performance by definition.
Furthermore, they tend to be used for very distinctive purposes, notably
mathematical computations."  That would seem to apply to many Earth
science data producers, including some of us who do numerical work on
our desktop stations.

The author notes that "... many, but by no means all HPC systems are
often extremely open systems from a security standpoint, and may be used
by scientists worldwide whose identities have never been validated.
Increasingly, we are also starting to see HPC systems in which
computation and visualization are more tightly coupled and, a human
manipulates the inputs to the computation itself in near-real time."  He
notes later [p. 77] that "HPC systems tend to be used for very
distinctive purposes, notably mathematical computations. ...  The
specific application HPC systems varies by the organization that uses
them ..., but each individual system typically has a very specific use."

He also notes privacy issues, such as [p. 79] "... there may be cases
where the owners of the data want to keep the raw data for themselves
for an extended period of time, such as a scientific embargo.  [This can
happen in the government, notably the embargos that the executive branch
has imposed on announcements of federal budget information in the last
month or so before the budget is formally announced.  Not exactly
scientific data, but NIH has had some interesting excursions into this
area as well.]  Or there may be cases where the owners of the data are
unable to share the raw data due to privacy regulations, such as on
medical data ..."

The issue this raises from my perspective is whether the distinctive
properties noted with respect to HPC environments require special
metadata for this community, both in structure and content.  For
example, the rules for dealing with numerical data are different from
those for textual corpora.  As a specific exercise, try developing a
robust algorithm for locating all the occurrences of a value
approximating Pi in a file of numerical data, and make sure it works
whether the data are in a fixed character-based format or one that uses
double precision floats.
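
To make the Pi exercise concrete, here is a minimal Python sketch, not a
definitive implementation.  Everything in it is my own assumption rather
than something from the article: the function names, the absolute
tolerance of 1e-6, the requirement that numeric fields in the
character-based file are separated by whitespace, and the choice of
little-endian IEEE 754 doubles with no header for the binary case.

```python
import math
import re
import struct

# Assumed tolerance: how close a value must be to pi to count as a match.
TOL = 1e-6

# Matches decimal numbers with an optional sign and exponent, e.g. 3.1415927 or 3.14e0.
NUMBER = re.compile(r"[-+]?(?:\d+\.\d*|\.\d+|\d+)(?:[eE][-+]?\d+)?")


def is_near_pi(value, tol=TOL):
    """True if value lies within an absolute tolerance of pi."""
    return math.isclose(value, math.pi, rel_tol=0.0, abs_tol=tol)


def find_pi_in_text(text, tol=TOL):
    """Return (character_offset, value) for each numeric token near pi.

    Assumes fields are separated by whitespace; fixed-width fields that
    run together would need slicing by column instead.
    """
    hits = []
    for match in NUMBER.finditer(text):
        value = float(match.group())
        if is_near_pi(value, tol):
            hits.append((match.start(), value))
    return hits


def find_pi_in_doubles(buf, tol=TOL, byteorder="<"):
    """Return (index, value) for each 8-byte double near pi.

    Assumes buf is a headerless run of IEEE 754 doubles; trailing bytes
    that do not fill a whole double are ignored.
    """
    count = len(buf) // 8
    values = struct.unpack(f"{byteorder}{count}d", buf[: count * 8])
    return [(i, v) for i, v in enumerate(values) if is_near_pi(v, tol)]


if __name__ == "__main__":
    text = "  1.0000000  3.1415927  2.7182818\n  3.1400000  3.1415930\n"
    print(find_pi_in_text(text))
    binary = struct.pack("<4d", 1.0, math.pi, 2.5, 3.1415927)
    print(find_pi_in_doubles(binary))
```

Even this toy version needs to know things that the data file itself
does not say: the tolerance, the field layout, the byte order, and the
floating-point width.  Those are exactly the sort of items a
numerically oriented metadata specification would have to record and a
text-oriented one would not.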

To generalize this issue a bit, do we need to consider creating
different metadata specifications for different producer and user
communities?  I'll grant this makes for some interesting governance
issues if done in distributed archival systems where data might be
archived in multiple places to avoid single-point failures like the one
that may happen when some critical organization falls apart.

From another standpoint, the stove-piped nature of our funding sources
tends to create disciplinary communities with different procedures and
dialects.  This is natural enough when we recognize that different
funding agencies have different priorities.  Weather data has different
producer communities than ecological data does.  If funding priorities
funnel money to one community in one agency, other agencies will have
difficulty crossing the boundaries between them.

These considerations suggest to me that it will be very difficult to
serve all communities with a single, homogeneous, unified metadata
description.  I suspect that developing a common semantically
homogeneous collection is too long a stretch - and will likely take an
extremely long time to negotiate (like treaties between many nations).
It might be more sensible to recognize that ahead of time and not work
so hard for a perfect standard.

Bruce B.