[Esip-preserve] Collection Structure Telecon on Monday August 19 at 2 pm EDT

Bruce Barkstrom brbarkstrom at gmail.com
Mon Aug 12 11:43:43 EDT 2013


For next Monday's telecon (Aug. 19, 2 pm EDT), I'd like to
continue the procedure we've had by starting with reports
on activities by each participant.  Participants should limit
their reports to five minutes.  If we are deluged by many new
participants, we may need to shorten the individual reports or
develop some new procedures.  We'll get out the specific
information on connecting in the next day or two.

In my report, I'll summarize some work I did about 15 years
ago.  This work used a number of data sources to develop
quantitative estimates of the behavior of users working with
scientific data.  I'll share some insights from that work,
although there is too much material for me to cover in a
five-minute report.

That work suggests that there are two broad user communities
for Earth science data:
   - school children and the public
   - professionally educated scientists, engineers, and
      managers, as well as science-prone individuals.
Quantitatively, the first category probably has about 95% to 99%
of the potential users who would access sites with Earth science
data.  The second category is likely to be much more demanding.
They are likely to request most of the distributed data volume.
These two categories of users are unlikely to have the same
vocabularies.  They also are unlikely to have the same
mathematical and scientific competencies.  As a result, there is
likely to be substantial semantic heterogeneity between the two
communities.

A working hypothesis is that metadata designed for the first
community is unlikely to serve the needs of the second well.
Professionals seem likely to search for information based
on their professional scientific vocabulary.  Data producers
who make the original measurements in Earth sciences
have to be scientific professionals.   The architecture
of their collections is likely to influence the expectations
and search strategies of other members of their community.
Based on our previous work on collections, professional users
appear likely to want information about collections.  Having
only data from individual items may not meet their needs.

One expectation from this hypothesis concerns the ranking systems
for ordering query result sets.  Ranking systems usually use
statistics developed from a large population of users.  Most of
these statistics will come from the lay user community.  However,
those users seem unlikely to reflect the rankings appropriate to
professional users.  This bias seems likely to reduce the
probability of having useful search items for professionals
appear near the top of the list.  This hypothesis suggests that
professional users will have to examine longer lists of items
than lay users.  The lists will have more professionally
irrelevant items than the lists for lay data users.  This
hypothesis suggests that members of the professional community
would lose efficiency in their research.  They would have to do
more work to evaluate more items in the result sets.  They are
also less likely to obtain useful data.
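
To make the ranking argument concrete, here is a minimal sketch in
Python.  It is only an illustration under stated assumptions, not a
description of any real search system.  The item names, the click
counts, the 97-to-3 split between lay and professional users, and the
helper functions rank_by_clicks and first_relevant_position are all
invented for the example.  The sketch ranks items purely by pooled
click counts and reports where the first professionally relevant item
lands in the list.

```python
from collections import Counter


def rank_by_clicks(clicks):
    """Order items by total click count, most-clicked first.

    Ties are broken alphabetically so the ordering is deterministic.
    """
    counts = Counter(item for _group, item in clicks)
    return sorted(counts, key=lambda item: (-counts[item], item))


def first_relevant_position(ranking, relevant):
    """Return the 1-based position of the first relevant item, or None."""
    for position, item in enumerate(ranking, start=1):
        if item in relevant:
            return position
    return None


# Hypothetical click log: 97 clicks from lay users, 3 from professionals.
# Every name and number here is an assumption made for illustration.
lay_clicks = (
    [("lay", "weather_picture")] * 50
    + [("lay", "hurricane_animation")] * 30
    + [("lay", "kids_climate_quiz")] * 17
)
pro_clicks = (
    [("pro", "radiance_level1_collection")] * 2
    + [("pro", "calibration_coefficients")] * 1
)
pro_relevant = {"radiance_level1_collection", "calibration_coefficients"}

# Ranking built from everyone's clicks versus professionals' clicks only.
pooled = rank_by_clicks(lay_clicks + pro_clicks)
pro_only = rank_by_clicks(pro_clicks)

print("pooled ranking:", pooled)
print("first professional item (pooled):",
      first_relevant_position(pooled, pro_relevant))
print("first professional item (professionals only):",
      first_relevant_position(pro_only, pro_relevant))
```

With these invented counts, the first professionally relevant item
sits in position 4 of the pooled ranking and in position 1 when only
the professionals' clicks are used.  A real test of the hypothesis
would need actual usage logs rather than made-up numbers.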

It would be useful to focus our discussions on possible ways of
verifying whether this hypothesis is reasonable.  If it isn't,
then replacing it with a better one would be a positive step.
If it is, then we should discuss how to deal with its implications.
Our bimonthly telecons provide very limited time
for discussion.  Thus, we may want some sessions at the winter
ESIP meeting to deal with this topic and related ones.

Bruce B.