[Esip-preserve] Identifiers

Sat Feb 18 03:10:43 EST 2012

Mark A. Parsons wrote:
> I don't think there is a falsifiable definition of data set. Or rather all definitions are false. It's very situational.

Agreed.  To put it another way, I think this attempt to define "dataset" is doomed because a dataset is a cognitive construct, and cognitive constructs do not have exact definitions and hard boundaries, but look more like overlapping categories that are characterized by exemplars and degrees of membership.

Is there a *functional* reason why we need to define terms like "dataset" and "granule"?  I guess a necessary (but not sufficient) condition for me to be convinced by any definitions of "dataset" and "granule" is that there be some kind of functional difference between them, i.e., some difference in their functional affordances.

From the old Alexandria days I recall a passionate debate over what constituted a "title".  (That may sound quaint now, but I assure you, a librarian armed with an AACR2 reference is a formidable adversary.)  What cut through that particular Gordian knot was looking at the question purely functionally: we only care about titles to the extent that we do something with them.  And the answer at that time was, all we do with titles is display them in search result lists.  Ergo, a "title" is that which you want to see displayed as a search result, no more, no less.  Corollary: a title should be about one line wide when displayed in a typical font size.

Regarding data and citation, from a functional perspective I would say that if a particular entity has an identifier, can be independently referenced (or is independently actionable), and its provider is committed to maintaining that entity, its identifier, and its independent referenceability, then the entity is "citable".  Notice that this definition is independent of both the size of the entity and the terminology the provider uses to refer to it.

-Greg