[Esip-preserve] A Summary of Yesterday's discussion on Data Sets
Mark A. Parsons
parsonsm at nsidc.org
Fri Feb 17 17:06:35 EST 2012
From the OED:
data set, n.
Pronunciation: Brit. /ˈdeɪtə sɛt/ , /ˈdɑːtə sɛt/ , U.S. /ˈdædə ˌsɛt/ , /ˈdeɪdə ˌsɛt/
Etymology: < the plural of datum n. + set n.2
1. A collection of related data, esp. when handled as a unit; (Computing) a named data file.
1958 Western Polit. Q. 11 499 Since three data-sets were utilized, guide-lines‥were defined in terms of the proportion of three cases in which a certain correspondence was observed.
1961 H. D. Leeds & G. M. Weinberg Computer Programming Fund. iii. 75 We thus have a control count of data sets.
1982 Computerworld 20 Dec. 29/2 C-Star is said to list tape volume of contents including serial number, owner, data set names and file qualities.
1988 Computer Graphics World 11/1 It is quite easy to compress, exactly, any reasonable data set by 2:1.
1992 Pixel Mar.–Apr. 22/1 Volume rendering‥involved assigning a color and an opacity to each point in a dataset.
2003 M. Kraak & F. Ormeling Cartogr. (ed. 2) v. 94/1 If one realizes that almost all countries have their own national grids and adhere to different hierarchies in their administrative units it is not that straightforward to match them into a single data set.
datum, n.
Pronunciation: /ˈdeɪtəm/
Forms: Pl. data /ˈdeɪtə/ .
Etymology: < Latin datum given, that which is given, neuter past participle of dare to give.
1.
a. A thing given or granted; something known or assumed as fact, and made the basis of reasoning or calculation; an assumption or premiss from which inferences are drawn.
1646—1888
b. Philos. datum of consciousness, etc. (see quots.). Esp. datum of sense (cf. sense-datum n.).
1846—1902
c. pl. The quantities, characters, or symbols on which operations are performed by computers and other automatic equipment, and which may be stored or transmitted in the form of electrical signals, records on magnetic tape or punched cards, etc.
1946—1970
2. In pl. Facts, esp. numerical facts, collected together for reference or information.
1899—1971
3. Used in pl. form with sing. construction.
1807—1971
On 17 Feb 2012, at 2:45 PM, Bruce Barkstrom wrote:
> I think yesterday's discussion was useful. Here's an attempt to
> capture some of it in the form of a dictionary where each term
> has several definitions. I do not think we need to try to develop
> a single "consensus" definition for these terms. Rather, this
> approach seeks to provide a reflection of the very different
> mental models present in the group, as well as in the data
> producer and user communities.
>
> Bruce B.
>
> A Dictionary for Terms Related to Data Sets
>
> Introduction
>
> Although the term `data set' is widely used in writings describing
> collections of Earth science data, it is difficult to find a clear
> definition. For example, the Open Archival Information System (OAIS)
> Reference Model (RM) does not include the term `data set' in its list
> of defined terms, although it uses the term in that document's
> Appendix A. The ISO 19115 standard notes that ``the definition of what
> constitutes a `dataset' is more problematic and reflects the
> institutional and software environments of the originating
> organization.'' [Appendix G, p. 119]
>
> The ESIP Federation Cluster on Data Preservation and Stewardship
> discussed the meaning of this term fairly extensively, and it became
> clear that the term has a variety of different meanings. In an attempt
> to clarify the possible uses of the term and show its ambiguity, the
> group considered using the standard dictionary approach: rather than
> presenting a single definition, a dictionary presents a numbered list
> of alternative meanings. It also seemed useful to give examples of
> each alternative's use, but I didn't catch enough of those to be
> useful, so I've included only the definitions.
>
> Data Set Definitions
>
> Data Set:
> 1. A logical collection of data
> 2. A granule
> 3. A collection of granules
> 4. A relational database
> 5. A collection of data values
> 6. A file containing data
> 7. A collection of files containing data
>
> Ancillary Definitions and Notes
>
> A. The term `Data' is not defined above. It may be useful to be more precise.
>
> Data:
> 1. A collection of datum values (noting that one unabridged dictionary
> says data is the plural of datum).
> 2. A datum is a numerical value for a measurand or a character string
> identifying a biological or geological specimen.
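Data definition 2 describes a datum as one of two quite different kinds of value. A minimal Python sketch of that distinction follows; the class names, field names, and the helper `numeric_values` are hypothetical, chosen only for illustration, and are not drawn from any archive's data model.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass(frozen=True)
class MeasuredValue:
    """A numerical value for a measurand (first half of Data definition 2)."""
    measurand: str  # hypothetical name of the measured quantity
    value: float
    unit: str


@dataclass(frozen=True)
class SpecimenLabel:
    """A character string identifying a biological or geological specimen."""
    label: str


# A datum is one kind or the other; data (definition 1) is a collection of them.
Datum = Union[MeasuredValue, SpecimenLabel]
Data = List[Datum]


def numeric_values(data: Data) -> List[float]:
    """Return only the numerical values, skipping specimen labels."""
    return [d.value for d in data if isinstance(d, MeasuredValue)]
```

Any code that processes "data" in this sense has to check which kind of datum it holds before doing arithmetic on it.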
>
> B. The term `Granule' is not defined above. This term has a fairly long and
> perhaps obscure history.
>
> Granule:
> 1. A term used to identify an inventory entry for a file in a data
> archive's catalog (based on an informal recollection of the use of this
> term in the early phases of NASA's EOSDIS design, when the system's
> designers wanted the inventory item to remain defined even if tape
> storage devices fragmented the file by placing it on different tapes).
> 1a. A somewhat broader definition would allow the inventory entry to
> include metadata and documentation.
> 2. A collection of data, metadata, and documentation roughly equivalent
> to the OAIS RM's notion of a Dissemination Information Package.
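Granule definition 1 separates the catalog entry from the physical placement of the file. The sketch below illustrates that idea under an assumed storage layout: each piece of a file is recorded as a (tape ID, byte offset, length) segment. The `TapeSegment` and `Granule` classes and the tape IDs are hypothetical; this is not the EOSDIS inventory schema.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class TapeSegment:
    """One contiguous piece of a file stored on a single tape (assumed layout)."""
    tape_id: str
    offset: int  # byte offset of the piece on that tape
    length: int  # number of bytes in the piece


@dataclass
class Granule:
    """Inventory entry for one archived file (Granule definition 1).

    The granule ID is what the catalog exposes; the segment list is a
    storage detail that can change without changing the inventory entry.
    """
    granule_id: str
    file_name: str
    segments: List[TapeSegment] = field(default_factory=list)

    def size(self) -> int:
        # The file size is the sum of its pieces, however many tapes it spans.
        return sum(s.length for s in self.segments)

    def tapes(self) -> List[str]:
        # Tapes needed to reassemble the file, in storage order, no duplicates.
        seen: List[str] = []
        for s in self.segments:
            if s.tape_id not in seen:
                seen.append(s.tape_id)
        return seen


g = Granule("G-0001", "example_file.dat",
            [TapeSegment("TAPE-A", 9_000_000, 1_000_000),
             TapeSegment("TAPE-B", 0, 500_000)])
print(g.size())   # 1500000
print(g.tapes())  # ['TAPE-A', 'TAPE-B']
```

The catalog keeps pointing at "G-0001" whether the file lives on one tape or three.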
>
> C. The term `Metadata' is also overloaded.
>
> Metadata:
> 1. Data about data.
> 2. A collection of records organized in a fashion appropriate for storage in
> a relational database. Each metadata record contains fields. In the OAIS RM,
> metadata is typically classified as Representation, Provenance, Context, or
> Fixity.
> 3. A collection of records (as in definition 2) together with digital
> or written documents intended to communicate humanly understandable
> information, particularly about the provenance and context of a data
> collection.
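Metadata definition 2 treats metadata as field-bearing records suited to a relational database. A minimal sketch using Python's built-in `sqlite3` module is below. The four categories are the ones listed in definition 2; the table name, column names, and helper functions are assumptions made for illustration. The human-readable documents added by definition 3 are not modeled.

```python
import hashlib
import sqlite3
from typing import List, Tuple

# Categories as listed in Metadata definition 2; the schema below is an
# illustrative assumption, not an OAIS or ESIP schema.
CATEGORIES = ("Representation", "Provenance", "Context", "Fixity")


def make_store() -> sqlite3.Connection:
    """Create an in-memory table holding one row per metadata field."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE metadata (
            granule_id  TEXT NOT NULL,
            category    TEXT NOT NULL CHECK (category IN
                        ('Representation', 'Provenance', 'Context', 'Fixity')),
            field_name  TEXT NOT NULL,
            field_value TEXT NOT NULL
        )
        """
    )
    return conn


def add_record(conn: sqlite3.Connection, granule_id: str, category: str,
               field_name: str, field_value: str) -> None:
    """Insert one metadata field; unknown categories are rejected."""
    if category not in CATEGORIES:
        raise ValueError(f"unknown metadata category: {category}")
    conn.execute("INSERT INTO metadata VALUES (?, ?, ?, ?)",
                 (granule_id, category, field_name, field_value))


def records_for(conn: sqlite3.Connection, granule_id: str,
                category: str) -> List[Tuple[str, str]]:
    """Return (field_name, field_value) pairs for one granule and category."""
    cur = conn.execute(
        "SELECT field_name, field_value FROM metadata "
        "WHERE granule_id = ? AND category = ? ORDER BY field_name",
        (granule_id, category),
    )
    return cur.fetchall()


store = make_store()
payload = b"example granule contents"  # stand-in for a real file's bytes
add_record(store, "G-0001", "Fixity", "sha256",
           hashlib.sha256(payload).hexdigest())
add_record(store, "G-0001", "Provenance", "processing_level", "L2")
print(records_for(store, "G-0001", "Provenance"))  # [('processing_level', 'L2')]
```

The Python check and the SQL `CHECK` constraint enforce the same category list. The second acts as a backstop for rows inserted without going through `add_record`.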
>
> D. The term `File' is also not easily definable. Wikipedia's articles
> on the terms `Computer Files' and `File Names' are helpful. Knuth's
> circular definition indicates some of the ambiguity: ``The collection
> of all records is called a `table' or `file', where the word `table' is
> used to indicate a small file and `file' to indicate a large table. A
> large file or a group of files is frequently called a `database.' ''
> [Knuth, D. E., 1998: The Art of Computer Programming: Volume 3,
> Sorting and Searching, Second Edition, Addison-Wesley, Boston, MA]
>
> File:
> 1. A computer file is a block of arbitrary information, or resource for
> storing information, which is available to a computer program and is
> usually based on some kind of durable storage. [Wikipedia, article on
> Computer Files]
> 2. At the lowest level (corresponding to the Bit-Stream Level in Annex E
> of the OAIS RM), many modern operating systems consider files simply
> as a one-dimensional array of bits. [Wikipedia, article on Computer
> Files, modified to make their term `sequence of bytes' read `array of
> bits' and to conflate meanings with the OAIS RM.]
> 3. At a higher level, where the content of the file is being
> considered, these binary digits may represent integer values, text
> characters, image pixels, audio, or anything else. It is up to the
> program using the file to understand the meaning and internal layout
> of information in the file and present it to a user as more meaningful
> information (like text, images, sounds, or executable application
> programs). [Wikipedia, article on Computer Files] Note that thinking of
> the file as an array of `higher level' data elements corresponds with
> the Aggregation Layer in Annex E of the OAIS RM, but makes reading and
> writing more complex because the data elements no longer all have the
> same size.
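File definitions 2 and 3 describe two views of the same bytes. The Python sketch below contrasts them. The record layout is invented purely for illustration: a 4-byte big-endian count, then for each record a 4-byte signed integer, a 2-byte text length, and that many UTF-8 bytes. `as_bits`, `encode_records`, and `decode_records` are hypothetical helpers. The variable-length text field is what produces the complication noted at the end of definition 3.

```python
import struct
from typing import List, Tuple

RECORD_HEADER = ">iH"  # signed 32-bit integer + unsigned 16-bit text length


def as_bits(data: bytes) -> List[int]:
    """Bit-stream view (definition 2): the file as a 1-D array of bits,
    most significant bit of each byte first."""
    return [(byte >> (7 - i)) & 1 for byte in data for i in range(8)]


def encode_records(records: List[Tuple[int, str]]) -> bytes:
    """Write records in the hypothetical layout described above."""
    out = bytearray(struct.pack(">I", len(records)))
    for number, text in records:
        encoded = text.encode("utf-8")
        out += struct.pack(RECORD_HEADER, number, len(encoded))
        out += encoded
    return bytes(out)


def decode_records(data: bytes) -> List[Tuple[int, str]]:
    """Aggregation view (definition 3): interpret the same bytes as records.

    Because text fields vary in length, the reader must walk the file
    sequentially; it cannot compute the offset of record k as k * size.
    """
    (count,) = struct.unpack_from(">I", data, 0)
    pos = 4
    records = []
    for _ in range(count):
        number, length = struct.unpack_from(RECORD_HEADER, data, pos)
        pos += struct.calcsize(RECORD_HEADER)
        records.append((number, data[pos:pos + length].decode("utf-8")))
        pos += length
    return records


raw = encode_records([(42, "specimen A-17"), (-3, "ice")])
print(len(raw))             # 32 = 4 + (6 + 13) + (6 + 3)
print(as_bits(b"\xa5"))     # [1, 0, 1, 0, 0, 1, 0, 1]
print(decode_records(raw))  # [(42, 'specimen A-17'), (-3, 'ice')]
```

At the bit-stream level every element (a bit) has the same size. At the aggregation level the second record's position depends on the length of the first record's text.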
> _______________________________________________
> Esip-preserve mailing list
> Esip-preserve at lists.esipfed.org
> http://www.lists.esipfed.org/mailman/listinfo/esip-preserve