[Esip-preserve] A Note on an UML Model for Use Cases - 1

Sun Apr 10 16:24:00 EDT 2011

[I sent out this material Friday, then sent out a new version to a different
e-mail list today.  Curt's GSFC security said my zipped files were too big.  I'll try
Dropbox (which Curt suggested) if this doesn't work.  For now, I'll send this
material out in separate e-mail messages that I hope have some chance of
getting through.  I probably used an old and obsolete e-mail address.  If
you get two (or more) copies, I apologize.]

Following the GEODATA 2011 meeting in Broomfield, I started working on use
cases that are related to the NSIDC Glacier Photo Collection.  It was my sense
that Unified Modeling Language (UML) was a reasonable approach to documenting
the model - and I had a pretty good, free tool for creating it that I'd worked
with before.  [If you're interested, you can get it off the web by putting the
term 'BOUML' in Google.  Professional UML tools, like Rational Rose, can be
pretty expensive.  I don't think it's worth getting into a "tool war" on this,
BTW.]

The tool creates a Web site for the model, which is contained in the
`Glacier_Photo_Collection.zip' file.  The attached `Readme.txt' contains
some instructions on how to unzip the file and some suggestions on how
to navigate through the Web site.  Among the various languages BOUML
can produce is XMI.  I've included a copy of that in the file
`Glacier_Photo_Collection_XMI.zip', so those of you who want an
XML version of the items in the UML can get a head start.  I'll note
that the names I'm using for objects in the UML model use an underscore,
not CamelCase.  A bit over a year ago, Ruth had provided me with a spreadsheet
containing metadata records for about 10,000 of the photographs (which is
about 5% of the collection of 200,000 digital photographs).  I asked Ruth
about distributing this and she had no problem with doing so.  There are
two versions of the spreadsheet in the file `Spreadsheet.zip'.  The first has
column headings and is, I believe, the original.  The second has the columns
in a permuted order and doesn't have the labels in the first row of the
spreadsheet.  I'd used that one for some of my work that explored the structure
of the image collection.

There are a number of reasons for thinking this collection of use case scenarios
would fit the WG's needs fairly well:
1.  The glacier digital photo collection forms the basis for the metadata
testbed that we've been working on.
2.  The collection is big enough to avoid the mental model that goes with the
single investigator, 'small collection' notions, yet it's small enough
to avoid the complexities involved in dealing with the really large
scale data production approaches associated with NASA's EOS or NOAA's
operational or archival systems.
3.  It's intended to provide concrete objects that are fairly unambiguous so
we can explore the semantic heterogeneity of the community by asking for
opinions on how to categorize the objects, rather than assuming that the
objects conform to some preexisting cataloging or definitional scheme.

I sent out an e-mail like this to Ruth, to Ray Hook, and to Ruth's archivist.
Ray thought it could provide a good basis for use cases and sounded optimistic
that it might be extensible in some interesting ways.  He encouraged me
to provide the material to the cluster.  I've not heard from Ruth or her
archivist, so I don't know her opinion.

I think the UML model has enough specificity that it can be used to raise
some interesting questions, such as "Given the social network that's
approximated by the UML actors, which selection of individuals and
organizations should receive citation credit for a data collection?",
"Who's the `publisher' of the collection?", "Is it more useful to give a
citation date based on the observation date and time or on the `publication'
date?", and so on.

Perhaps we can discuss what to do with this material at the next telecon.

Bruce B.
-------------- next part --------------
There are five files in this e-mailing.  Together, they should contain a fairly detailed start on
use cases for a Digital Glacier Photo Collection modelled on the one at NSIDC.  I had to break the
files for the UML model into pieces in order to deal with some of the servers that may receive it.

GPC_V1.zip - the first part of a collection of files that contains a Web site with the current state
   of a UML model describing four use case scenarios for such a site

GPC_V2.zip - the second part of the UML model for the site.

Glacier_Photo_Collection_XMI.zip - a zipped file (from 7-Zip) that contains an XMI document that translates 
   the Web site UML into XMI

Spreadsheet.zip - a zipped file (from 7-Zip) that contains two slightly different versions of spreadsheets
   that contain metadata records for about 10,000 glacier photos.  Both spreadsheets were
   created from Open Office and, hopefully, saved in Excel format.

Readme_1.txt - this file.

To unpack this material, you'll need about 6 MB of spare disk space.  I'd recommend unzipping the
GPC_V1.zip and GPC_V2.zip into one directory and putting the XMI, spreadsheets, and this file in an adjacent one.
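
For anyone who would rather script the unpacking than do it by hand, here is a minimal sketch of the layout recommended above.  The archive names are the ones listed in this file; the directory names "model" and "extras" and the function names are hypothetical, picked only for illustration.  The sketch also assumes the 7-Zip archives are ordinary zip files that Python's standard zipfile module can read.

```python
import zipfile
from pathlib import Path

# Archive names come from the file list above; the directory names
# "model" and "extras" are hypothetical, chosen only for illustration.
MODEL_ARCHIVES = ["GPC_V1.zip", "GPC_V2.zip"]
EXTRA_ARCHIVES = ["Glacier_Photo_Collection_XMI.zip", "Spreadsheet.zip"]


def unpack(archives, dest):
    """Extract every archive in `archives` into `dest`; return member names."""
    dest = Path(dest)
    dest.mkdir(parents=True, exist_ok=True)
    members = []
    for archive in archives:
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(dest)
            members.extend(zf.namelist())
    return members


def unpack_mailing(source_dir, target_dir):
    """Unpack the two model halves together and the rest next to them."""
    source, target = Path(source_dir), Path(target_dir)
    model = unpack([source / n for n in MODEL_ARCHIVES], target / "model")
    extras = unpack([source / n for n in EXTRA_ARCHIVES], target / "extras")
    return model, extras
```

Calling unpack_mailing(".", "glacier") from the directory that holds the four zip files would leave the Web site under glacier/model and the XMI and spreadsheets under glacier/extras.  This file (Readme_1.txt) is not an archive, so it would still need to be copied by hand.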

------- Accessing the UML model ---------------------------------------------------------------------

-- Top Page

To access the Glacier_Photo_Collection, you can point your browser at either index.html or index-withframe.html.
Personally, I prefer the latter, since the frame approach does help with the search.  When you open the frame
index, the very top line of the upper right window contains the top-level menu, with the items

-Top--Classes--Packages--Use Cases--Class Diagrams--Use Case Diagrams

These are clickable links to various entry points in the web site.  The line below this reads

1 2 3 4 A B C D E F G H L O P S T V

and provides another set of entry points - this one alphabetic and mainly various use cases or class definitions.

The left-hand panel of this top page is an index to the various classes.  Again, these items are clickable links.

-- A Note on Class and Diagram Naming Conventions ---------------------------------------------------

As I was developing this model, it was clear that when the BOUML tool created the web site, it alphabetized
the entries in the html.  To avoid losing structure, I put the use case (or scenario) number first, followed
by a secondary letter indicating which of the individual use cases were in the first container.  This means 
that if you look at the Class labeled 1A_Barometer, you'll find that in use case 1A,
which also appears in a use case diagram labelled 1A_Create_Latent_Image_Use_Case_Diagram.  If you click on the
class link 1A_Barometer, you'll see a lower right frame with a description of the physical object "Barometer"
and its three attributes.  Of course, since this is a web site, you can use the direct web navigation by
clicking.
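
To make the naming convention concrete, here is a small sketch that splits a name such as 1A_Barometer into its scenario number, use case letter, and label.  The pattern, the ModelName type, and the function name are assumptions made for illustration; they are not something BOUML produces, and names in the model that do not carry the prefix simply return None.

```python
import re
from typing import NamedTuple, Optional


class ModelName(NamedTuple):
    scenario: int   # leading use case (scenario) number, e.g. 1
    use_case: str   # secondary letter within the scenario, e.g. "A"
    label: str      # the rest of the name, e.g. "Barometer"


# Assumed shape of a prefixed name: digits, one capital letter, underscore, label.
_PREFIXED = re.compile(r"^(\d+)([A-Z])_(.+)$")


def parse_model_name(name: str) -> Optional[ModelName]:
    """Split a prefixed model name; return None if it has no prefix."""
    match = _PREFIXED.match(name)
    if match is None:
        return None
    return ModelName(int(match.group(1)), match.group(2), match.group(3))
```

With this, parse_model_name("1A_Barometer") gives scenario 1, use case "A", and label "Barometer".  Because the prefix comes first, a plain alphabetical sort of the names also keeps the items of one use case next to each other, which is the point of the convention.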

-- A Note on Visual Conventions in the Diagrams -----------------------------------------------------

The use case diagrams are relatively straightforward.  The navigation is pretty straightforward as well.
From the index-withframe.html page, if you click on the Use Case Diagrams link, you'll get text documentation
on the use cases in the scenario.  You'll see that they are laid out from the first use case (which involves
creating the original photographs), through the second (which is the creation of the digital collection from
the negatives or prints of the original photos), the third (which is a scenario in which a teacher assigns
a class report to a high school student), and the fourth (which is a scenario in which a scientific research
team attempts to take about 5% of the digital photos and repurpose them to try to obtain a quantitative
estimate of the change in glacier area or ice volume).

When you get to the specific use case diagrams, such as 1A_Create_Latent_Image_Use_Case_Diagram, you'll find 
I've color-coded the UML classes (or objects) with the following colors:

- Light Blue - a physical object class (like a Camera)
- Light Yellow - an abstract object class (like Camera Settings or Camera_Position).  These kinds of objects
     are probably not physical or digital in nature, and so are a bit more difficult to characterize or
     make particular
- Gray - a particularly abstract object (like a Glacier_Scene).  I may have only one instance of this
     type of object - although I haven't had a chance to go back and edit the whole site for consistency.
- Pink - applies only to a single use case, indicating that it is not made up of finer detail, but might
     be regarded as a kind of "activity"
- Green - a digital object, usually either a file or a digital document.  You can find examples of this
     kind of object in 1D_Create_Digital_Photo_Use_Case_Diagram.

This color coding should be helpful in rapidly recognizing what kind of object I was thinking about.
It may be that later editing would suggest distinguishing between data files, databases (relational or
otherwise), and digital documents.
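
For reference, the color key above can be written down as a simple lookup table.  This is only a sketch: the dictionary names and the kind_of helper are hypothetical, and the example classes are limited to those named in the list above.

```python
# Color key for the use case diagrams, taken from the list above.
COLOR_KEY = {
    "Light Blue": "physical object class",
    "Light Yellow": "abstract object class",
    "Gray": "particularly abstract object",
    "Pink": "single use case, regarded as a kind of activity",
    "Green": "digital object (file or digital document)",
}

# Example classes mentioned in the list, with the color each one carries.
EXAMPLE_CLASSES = {
    "Camera": "Light Blue",
    "Camera_Position": "Light Yellow",
    "Glacier_Scene": "Gray",
}


def kind_of(class_name):
    """Return the kind of object implied by an example class's color."""
    return COLOR_KEY[EXAMPLE_CLASSES[class_name]]
```

If later editing does separate data files, databases, and digital documents, the single "Green" entry is the one that would need to be split.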

-- A Navigational Suggestion ---------------------------------------------------------------------------

I think the Web site will be easier to understand by navigating through the use case diagrams from top
to bottom.  If you start on the top page, the lower right panel will give a text narrative for each
of the use case diagrams, going from scenario 1 to 2 to 3 and to 4.  To some extent the scenarios are
independent - and actually probe different kinds of complications.  So they'll probably make
sense if you read them as a narrative, a story in four chapters, if you will.

If you want to visualize the elements that the use case diagrams link together, you can descend to the
diagrams themselves.  In the frames Web setup, there are actually three separate elements that appear
in order when you click on the diagram from the top frame:

- the UML diagram of the use case
- a description of the icon conventions in the diagram
- a procedural description of the steps in each use case.

Thus, if you get to Use Case 1A_Create_Latent_Image_Process, you'll find a numbered procedure that enumerates
the steps involved in carrying out this use case.  In later revisions, these steps can form the basis for a
UML sequence diagram that can identify the details of how various objects and actors interact, exchanging
data and other information, or initiating or responding to each other.  [I haven't had time to do that yet,
but it doesn't take too long to do the diagram using a UML tool.  Documenting the diagram is tedious and
more time consuming, but not too hard.  The only thing now is that there are a fair number of sequence
diagrams that should be added.]

You could even think of this approach as having an expandable outline of the story in each scenario.  You can
read the summary narrative in connection with the use case diagrams.  Where you're interested, you can dip
down to the procedural level to get some of the details and object interactions.

--

An alternative navigation is to go through the use cases in the same order - but now you just get a frame with
the procedural summary for each use case.  If you click on the link, you'll go to the same linking together of
the three items described a couple of paragraphs up.

--

After you've gotten used to these components, you can take a look at the individual classes and their descriptive
material.  Some have attributes; most do not.  Again, that's work to be done.

--

Finally, you might take a look at the alphabetical list of items.  Navigating through these provides a slightly
different view or perspective on how things work.  In a sense, the alphabetical view is rather like a classic
data dictionary.

-- What Remains Undone ------------------------------------------------------------------------------------

There are some important things that are not included.

Probably the most important is dealing with the structure of the data collections.  By this I mean organizing
just the collection of physical objects or digital files into a semblance of meaning.  I'm strongly inclined
to think of this structure as hierarchical, particularly because of a sense that a hierarchical structure makes
it easier to deal with versions.  More on this later.  I need to clear some cobwebs out of my mind on this
because it's pretty abstract and subtle.

The second most important unfinished business is to provide rough data structures for the data product collections
identified in scenario 4.  As these get filled in, I'll also fill in a much more specialized and mathematical 
dialect on the part of the data producers.

Incidentally, there's an incipient collection of concepts embedded in an Open Office spreadsheet, where I've
started to collect material that can provide mental models (concept inventories, vocabulary inventories,
skill inventories, and tool inventories) for various categories of UML actors.

That's probably enough description for now.  The other thing I'm working on is a textual and mathematical
description (rather like an ATBD), particularly for scenario 4, where descriptions of the geometries of imaging,
spatial sampling, radiative transfer, and related details are probably necessary in order to understand the
design of this scenario's production paradigm.

I should add an actual procedure for production in these cases, perhaps showing how project planning
proceeds with Work Breakdown Structures, and so on.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Spreadsheet.zip
Type: application/zip
Size: 368452 bytes
Desc: not available
URL: <http://www.lists.esipfed.org/pipermail/esip-preserve/attachments/20110410/950aef46/attachment-0001.zip>