[Esip-preserve] Folksonomies

Tom Moritz tom.moritz at gmail.com
Tue May 24 15:34:40 EDT 2011


Hi Bruce --

Yes, the paper I sent along was really just a "note" about some
testing/measurement that we were doing in assessing the problem -- within
biology -- of pushing the limits of the UMLS/biomedical structure toward a
more inclusive "all-biology" approach...

I reviewed the Shirky blog when you sent it out -- thanks for that -- and
the Stanford paper is very helpful...
I believe that they are right in proposing to examine the values of
"consistency, quality, and completeness" as standards for indexing but, in
real-time practice, the problem is really the efficiency and practicability
of attaining these values...?  (I have evolved the slogan -- *cataloging/
indexing is a process, not an event* -- with due respect to your previous
post on definitions of "event"!)

Of course, there is a fundamental problem with cataloging/indexing/tagging
-- I recall a study reported many years ago that said that the same indexer
-- indexing the same material and using controlled vocabularies -- would
vary about 33% of the time...  This would seem to argue for
use of more indexers/ taggers as compensation?  Seems to me that in a social
tagging context -- we rely on an expanded "crowd"
of taggers for "consistency" and "completeness" and we expect "quality" to
emerge from this crowd's
common wisdom...? (The Santa Fe Institute has looked at this question of
"average wisdom"???)

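One way to picture relying on an expanded crowd of taggers is a simple
agreement threshold: keep a tag only when enough independent taggers applied
it. What follows is a minimal sketch under stated assumptions, not something
taken from the Stanford paper or from any particular system; the function
name consensus_tags, the min_share cutoff, and the sample tags are all
hypothetical.

```python
from collections import Counter


def consensus_tags(taggings, min_share=0.5):
    """Return tags applied by at least `min_share` of the taggers.

    `taggings` is a list of tag collections, one per tagger (a hypothetical
    input shape). Tags are compared case-insensitively.
    """
    if not taggings:
        return set()
    counts = Counter()
    for tags in taggings:
        # Count each tag once per tagger, ignoring case and stray spaces.
        counts.update({t.strip().lower() for t in tags})
    needed = min_share * len(taggings)
    return {tag for tag, n in counts.items() if n >= needed}


# Illustrative data only: three taggers describing the same object.
taggers = [
    {"Biodiversity", "birds", "AMNH"},
    {"biodiversity", "ornithology"},
    {"birds", "biodiversity", "specimens"},
]
print(sorted(consensus_tags(taggers)))  # ['biodiversity', 'birds']
```

Raising min_share trades completeness for consistency: at 1.0 only tags that
every tagger agreed on survive.
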
My own experience/ thinking on this problem seems to coincide well
with the Stanford experience/ analysis?  I began talking about 5-6 years ago
about
"qualified social tagging"  -- intending that "qualified" would have a
double meaning of
1) application of experts to tagging ("expert" being determined by some
more-or-less
rigorous criteria) and 2) "qualified" in the sense that *both* free text and
controlled vocabularies might
be available for application...

I have generally expected that systems might evolve that would tolerate
*both* "expert" and "social" (open) taggers using both free text and
controlled vocabularies... permitting partitioned or combined use of such
tags (i.e., I could as a searcher express a preference for "expert" tagging).
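
The partitioned-or-combined idea above can be sketched as tags that carry two
labels, who applied them and which kind of vocabulary they came from, so that
a searcher can filter on either dimension or on neither. This is only an
illustration under stated assumptions: the Tag record, the select_tags
function, and the category names are hypothetical, not a description of an
existing system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tag:
    label: str
    source: str       # "expert" or "social" (hypothetical categories)
    vocabulary: str   # "controlled" or "free"


def select_tags(tags, source=None, vocabulary=None):
    """Filter tags by who applied them and by vocabulary type.

    Passing None for a criterion leaves that dimension combined.
    """
    return [
        t for t in tags
        if (source is None or t.source == source)
        and (vocabulary is None or t.vocabulary == vocabulary)
    ]


# Illustrative tags on a single resource.
tags = [
    Tag("Aves", "expert", "controlled"),
    Tag("birds", "social", "free"),
    Tag("field notes", "expert", "free"),
]
expert_only = select_tags(tags, source="expert")
print([t.label for t in expert_only])  # ['Aves', 'field notes']
```

Calling select_tags(tags) with no criteria returns the combined set, which
corresponds to a searcher expressing no preference.
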

The "library approach" to tagging knowledge resources on the Web has seemed
untenable to me for many years --
this occurred to me ca. 1999-2000 when -- working at AMNH under a very large
Mellon grant --
I began to ponder the gap between the numbers of potential digital objects
on the Web
and the per-unit costs of doing conventional library cataloging... (A single
piece of
"original MARC cataloging" at AMNH Library had an estimated cost of $15/
record -- the estimate was
given to me by our senior cataloger... I am also aware of an instance of a
single EAD [Encoded Archival Description
SEE: http://www.loc.gov/ead/]  record for an archival collection
that ran to 600 pages and required two years to produce...!  As a scholarly
research effort this is
actually defensible -- but again -- this does not scale well to the need for
immediacy of processing & access...!)

In this context, I just picked up from the Chronicle of Higher Education:
"Gaming the Archives," May 23, 2011, 5:13 pm, by Jennifer Howard
<http://chronicle.com/blogs/wiredcampus/author/jhoward/>
http://chronicle.com/blogs/wiredcampus/gaming-the-archives/31435?sid=wc&utm_source=wc&utm_medium=en

One "repair" to the library approach that could be more efficient is more
effective use of inference and recursion
in generating tags minimally necessary for discovery... With subsequent
reliance on "in stream" tagging as use occurs...

Much to say about all of this -- and structuring the discussion in some
systematic ways seems
necessary?

SEE Also: Heymann, Paul and Garcia-Molina, Hector (2009) *Contrasting
Controlled Vocabulary and Tagging: Do Experts Choose the Right Names to
Label the Wrong Things?* In: Second ACM International Conference on Web
Search and Data Mining (WSDM 2009), Late Breaking Results Session, February
9-13, 2009, Barcelona, Spain. http://ilpubs.stanford.edu:8090/955/

Tom

*Tom Moritz
1968 1/2 South Shenandoah Street,
Los Angeles, California 90034-1208  USA
+1 310 963 0199 (cell) [GMT -8]
tommoritz (Skype)
http://www.linkedin.com/in/tmoritz*

“Πάντα ῥεῖ καὶ οὐδὲν μένει” (Everything flows, nothing stands still.) --*
Heraclitus *
"It is . . . easy to be certain. One has only to be sufficiently vague." --
C.S. Peirce
*"Il faut imaginer Sisyphe heureux."  ("One must imagine Sisyphus happy.")
-- Camus
*

     Please consider the environment before printing this e-mail




On Tue, May 24, 2011 at 8:43 AM, Bruce Barkstrom <brbarkstrom at gmail.com> wrote:

> I had a chance to sort of skim the paper you sent.  I think my skepticism
> about
> ontology exercises goes much deeper.  Clay Shirky covers the argument
> fairly
> well in the attached web page; the other paper does a very nice job of
> providing
> a concise description of the library sciences approach - as well as some
> empirical
> evidence regarding data user tagging efforts.  There is also the work of
> Furnas
> and colleagues at Xerox PARC which has always struck me as saying the
> professional catalogers don't capture the mental models of data users.
>
> Certainly medical terminologies are highly developed and used by experts in
> the disciplines.  Indeed, they are one of the key tools used in diagnosing
> diseases.  The question is whether that community's experience applies to
> Earth science, where much of the scientific work does not involve nearly as
> much classification effort - except, perhaps, in areas of biodiversity.
>
> I should also note that I've got samples of vocabularies used by NASA's
> Global Change Master Directory and the Climate and Forecasting Conventions
> developed by UCAR, as well as the WMO's nomenclature.  Even after removing
> case sensitivity, the number of exact matches on two sets of about 1000
> terms each is about five.  There are two sets of Essential Climate
> Variables
> (about 100 terms each) that show a similar degree of similarity.  Putting
> these
> lists together was not an encouraging exercise - at least personally.
>
> Bruce B.
>
> On Mon, May 23, 2011 at 3:53 PM, Tom Moritz <tom.moritz at gmail.com> wrote:
>
>> Hi Bruce --
>> Been considering your recent messages and thought I'd send along a draft
>>  article (we never submitted it for publ...) from a few years ago when
>> I was still at AMNH in NY...
>>
>> We were grappling with the problem of how to render specialist
>> vocabularies
>> interoperable...  So (attached) for what it's worth... Perhaps, food for
>> thought at least...?
>>
>> UMLS (NLM) is still to my understanding one of the most highly developed
>> and refined ontology projects...
>>
>> (In the mid 90's, I compiled the IUCN "Conservation Thesaurus" and sought
>> to integrate it -- at a high level -- with a series of other thesauruses
>> -- including
>> UNBIS (the United Nations), the OECD Macrothesaurus and INFOTERRA
>> (UNEP)
>> -- this was just an early experimental effort...)
>>
>> Tom
>>
>


>>   On Wed, May 18, 2011 at 10:41 AM, Bruce Barkstrom <
>> brbarkstrom at gmail.com> wrote:
>>
>>>  The new issue of Computer has an article by Sen, S. and Riedl, J.
>>> on Folksonomy Formation that is rather interesting.  They mention
>>> a web blog http://www.shirky.com/writings/ontology_overrated.html
>>> that's an opinion piece on ontologies versus user-based search
>>> mechanisms.  Has anybody else picked up on this thread of
>>> conversation?
>>>
>>> Bruce B.
>>>
>>
>>
>