[esip-semantictech] [AGENDA] ESIP SemTech Telecon - 2019-05-28
Cox, Simon (L&W, Clayton)
Simon.Cox at csiro.au
Thu May 30 20:55:50 EDT 2019
Thanks for continuing the conversation, Blake.
We are clearly in agreement on the main issues.
Responding to a couple of points:
> There are clear and pressing issues with storing large, complex geometry data in Literals
I was not assuming that Literals would be used for storing geometry, only that they are appropriate for data exchange in an RDF context. I don’t think storage is the intention of GeoSPARQL either. There is sometimes an obsession with scrutinizing the serialized form (I’ve seen it a lot in the GI space with XML) as a proxy for thinking about the underlying model or architecture. For example, all that ISO metadata and GML was conceived as a serialization of data actually modelled in UML, but when the XML gurus got their teeth into it the interest in the model was pushed into the background. We shouldn’t make the same mistake in the RDF space. Geometry is geometry and needs to be handled with the right technology, which isn’t text … except perhaps as a way to put it on the wire.
> When dealing with geographic information, topology cannot be computed from geometry alone
Indeed. As I implied with my comment about cadastre, lots of topology is defined by ‘fiat’ and in most cases this should trump geometry computations.
Simon
From: Blake Regalia [mailto:blake.regalia at gmail.com]
Sent: Friday, 31 May, 2019 00:37
To: Cox, Simon (L&W, Clayton) <Simon.Cox at csiro.au>
Cc: Mcgibbney, Lewis J (398M) <lewis.j.mcgibbney at jpl.nasa.gov>; Mike Daniels <daniels at ucar.edu>; esip-semanticweb at lists.esipfed.org; Krzysztof Janowicz <janowicz at ucsb.edu>
Subject: Re: [esip-semantictech] [AGENDA] ESIP SemTech Telecon - 2019-05-28
Please forgive the delayed response -- I am still traveling :) Also, thank you all for engaging in this discussion. I am adding Krzysztof Janowicz in cc.
> From time to time there are proposals to encode geometry in RDF, seemingly with the notion that the RDF stack provides the tools necessary to process more or less any data. I’m not so sure.
I think there might be a misunderstanding, and please let me know if I am missing something -- but the idea is to *not* encode geometry in RDF, neither as Triples nor in Literals. NeoGeo proposed the former, GeoSPARQL advises the latter. There are clear and pressing issues with storing large, complex geometry data in Literals with no apparent benefit. While the idea may seem okay on paper, it actually fails in practice. We demonstrated a few of those issues in our workshop paper [1] and have since gathered plenty more feedback from the community about other latent issues, such as (just to name a few) the underlying RDBMS not allowing text literals beyond a few megabytes, or creating named graphs with different levels of geometry simplification in order to make spatial operations in SPARQL queries feasible. Whether or not one views the use of literals as appropriate from a modeling perspective, there are very real technical limitations that prompted us to rethink the need for geometry in RDF, especially since it seems these problems might only become apparent (and quite serious) at scale.
> If geometry is available then the topology can be pre-computed for all geometries and then stored. But it should be noted that essentially this is just caching, or an optimization strategy
Thank you for bringing this up because I think it is important to emphasize that pre-computing topology is not simply about caching. When dealing with geographic information, topology cannot be computed from geometry alone. From our latest paper [2]:
[...] we believe that knowledge graphs and Linked Data more concretely will benefit further from topological relations. One could now argue that such topological relations can be computed using geometries but not the other way around. While this is true in an abstract mathematical sense, it does not hold for actual data. In fact, topological relations between places cannot be easily computed based on geometry alone. While there are many reasons for this (Franklin, 1984; Ubeda and Egenhofer, 1997), our argument will focus on the role of domain knowledge, vagueness, and uncertainty (Bennett, 2001) and not on computational issues.
- Blake
[1] https://blake-regalia.net/resource/2017-LDOW_Geometries.pdf
[2] https://blake-regalia.net/resource/2019-TGIS_Topology.pdf
On Tue, May 28, 2019 at 10:20 PM Cox, Simon (L&W, Clayton) <Simon.Cox at csiro.au> wrote:
Hi Blake –
> The main idea is that RDF Literals are not suitable for complex geometry data.
Let’s wind this back a bit.
From time to time there are proposals to encode geometry in RDF, seemingly with the notion that the RDF stack provides the tools necessary to process more or less any data. I’m not so sure. RDF is about relationships and logic, and not numerical computation. OTOH, processing geometry is very much about numbers, in particular multi-component quantities (vectors). RDF is weak on the latter. There is a strong case for recognising the boundary between logic and geometry, and applying the appropriate meta-model on each side of the boundary – RDF for logic, and something else for geometry. I’m fine with literals on the geometry side of the boundary.
A part of your argument that I do agree with is that computing topological relationships on-the-fly is a mug’s game, for all the reasons that you showed in your presentation. But again, the GIS world has already been here – I think Arc/Info was topological, and only got dumbed down when shapefiles appeared and then didn’t recover topology when ArcGIS came along.
The necessary topological relationships are well known (three flavours are implemented in GeoSPARQL). If geometry is available (I don’t care if it’s in literals or RDF) then the topology can be pre-computed for all geometries and then stored. But it should be noted that essentially this is just caching, or an optimization strategy (though there may also be some cases, e.g. cadastre in many jurisdictions which rely on ‘metes and bounds’, i.e. where the topological relationships come first and geometry must be computed from them).
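As a rough illustration of the pre-compute-and-store idea, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the ex: identifiers, the axis-aligned bounding boxes standing in for real geometries, the relation names, and the ASSERTED table representing relations fixed by fiat. Relations are computed once for every pair and stored, and any asserted relation takes precedence over the computed one.

```python
from itertools import combinations

# Hypothetical features with bounding boxes (minx, miny, maxx, maxy).
# Boxes stand in for real geometries to keep the sketch short.
BOXES = {
    "ex:parcelA": (0.0, 0.0, 10.0, 10.0),
    "ex:parcelB": (10.0, 0.0, 20.0, 10.0),
    "ex:parcelC": (20.2, 0.0, 30.0, 10.0),  # small survey gap next to parcelB
    "ex:lake": (2.0, 2.0, 4.0, 4.0),
}

# Hypothetical relation asserted by fiat, e.g. taken from a cadastral register.
ASSERTED = {("ex:parcelB", "ex:parcelC"): "touches"}


def box_relation(a, b):
    """Classify box a relative to box b (a simplified stand-in for real topology)."""
    ax0, ay0, ax1, ay1 = a
    bx0, by0, bx1, by1 = b
    if ax1 < bx0 or bx1 < ax0 or ay1 < by0 or by1 < ay0:
        return "disjoint"
    # Boundaries meet along an edge or corner, interiors do not intersect.
    if ax1 == bx0 or bx1 == ax0 or ay1 == by0 or by1 == ay0:
        return "touches"
    if a == b:
        return "equals"
    if ax0 <= bx0 and ay0 <= by0 and ax1 >= bx1 and ay1 >= by1:
        return "contains"
    if bx0 <= ax0 and by0 <= ay0 and bx1 >= ax1 and by1 >= ay1:
        return "within"
    return "overlaps"


def precompute(boxes, asserted):
    """Compute a relation for every unordered pair, then let asserted relations win."""
    relations = {}
    for s, o in combinations(sorted(boxes), 2):
        relations[(s, o)] = box_relation(boxes[s], boxes[o])
    relations.update(asserted)  # fiat overrides geometry
    return relations


RELATIONS = precompute(BOXES, ASSERTED)
```

Here the geometry alone would report parcelB and parcelC as disjoint because of the gap, while the stored result keeps the asserted relation. Pair keys follow sorted identifier order, so the asserted keys must use the same order.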
I’m not at all convinced that focussing on the serialization is the actual issue here.
Simon
From: Blake Regalia [mailto:blake.regalia at gmail.com]
Sent: Wednesday, 29 May, 2019 11:35
To: Cox, Simon (L&W, Clayton) <Simon.Cox at csiro.au>
Cc: Mcgibbney, Lewis J (398M) <lewis.j.mcgibbney at jpl.nasa.gov>; Mike Daniels <daniels at ucar.edu>; esip-semanticweb at lists.esipfed.org
Subject: Re: [esip-semantictech] [AGENDA] ESIP SemTech Telecon - 2019-05-28
Simon,
> Your general approach (which I think is to persist geometry representations outside the context of the feature, and link to them through URIs) makes sense, and I think essentially matches practice in GIS systems for decades now (where geometry was in a separate table).
The main idea is that RDF Literals are not suitable for complex geometry data. Other than that, the data model is nearly the same as GeoSPARQL with some extensions to provide metadata (e.g., attributes such as vertex count, centroid, area, etc.) about the geometries themselves. Finally, we advise that on-demand topology is too expensive on high-resolution geodata (such spatial queries are not feasible at scale) and that for topology to be practical, it needs additional context beyond merely the geometries alone (even assuming they are cleaned) due to vagueness and uncertainty principles; whereas precomputing metrically-refined topology with context (e.g., what threshold to use for an approximate topological relation between a forest and a lake) is not only feasible but also capable of producing more meaningful relations than strict topological relations (e.g., DE-9IM between spatial regions).
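To make the geometry metadata idea concrete, here is a minimal Python sketch that derives a vertex count, area, and centroid from a single polygon ring. It assumes planar coordinates and a simple (non-self-intersecting) ring without holes; the function name and the dictionary keys are hypothetical and are not taken from any published vocabulary.

```python
def ring_metadata(ring):
    """Return vertex count, area and centroid for a simple planar polygon ring.

    `ring` is a sequence of (x, y) pairs; it is closed automatically if needed.
    """
    ring = list(ring)
    if ring[0] != ring[-1]:
        ring.append(ring[0])  # close the ring
    twice_area = 0.0
    cx = cy = 0.0
    # Shoelace formula: accumulate signed cross products edge by edge.
    for (x0, y0), (x1, y1) in zip(ring, ring[1:]):
        cross = x0 * y1 - x1 * y0
        twice_area += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    if twice_area == 0:
        raise ValueError("degenerate ring with zero area")
    return {
        "vertexCount": len(ring) - 1,  # distinct vertices, closing point excluded
        "area": abs(twice_area) / 2.0,
        "centroid": (cx / (3.0 * twice_area), cy / (3.0 * twice_area)),
    }


# Hypothetical 4 x 3 rectangle.
SQUARE = [(0.0, 0.0), (4.0, 0.0), (4.0, 3.0), (0.0, 3.0)]
META = ring_metadata(SQUARE)
```

Values like these could be published alongside the geometry so that a client can decide whether to fetch the full geometry at all. Geographic (lon/lat) coordinates would need a geodesic calculation instead.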
> However, I’m not sure that you need a new vocabulary or namespace. The definition of :hasGeometry in the GeoSPARQL standard[1] (clause 8.3.1.1) is
> It is an owl:ObjectProperty, but there is no requirement for the object to be a local blank-node. It can be a URI, as in your slide 18.
Thank you for being so observant! You are correct, it is already possible to use URIs (i.e., instead of blank nodes) with GeoSPARQL as described in the presentation; as you pointed out, this does not require use of a new predicate. In fact, the approach here is fully compatible with GeoSPARQL in theory, both in terms of the vocabulary in the data model and the extensible value testing functions in SPARQL (e.g., geof:intersection, geof:convexHull, etc.).
The reason we show a new predicate for hasGeometry is mostly to highlight to the viewer that we are proposing something new here, not that we intend to replace the GeoSPARQL ontology. In other words, these custom predicates and classes have only been used for demonstration purposes and proofs of concept so far. However, we could assume that such a predicate is an rdfs:subPropertyOf geosparql:hasGeometry, or that the ago:Geometry class is an rdfs:subClassOf geosparql:Geometry if it became necessary to add, e.g., property restrictions.
> A set of properties are provided, one of which is geo:hasSerialization. WKT is only provided as an example, and is not mandatory - GML is included in GeoSPARQL as another option, but other representations are not prohibited thanks to the RDF open-world-assumption.
Yes, I also spoke about GML during the presentation; and again I am not saying that anything about GeoSPARQL needs to be changed. I merely hope that the community sees the benefits in using dereferenceable IRIs instead of blank nodes for geometries (which we recommend for *all* non-point features), and that complex geometries pose many challenges when encoded as human-readable formats in RDF Literals.
> You will likely also need to negotiate over the schematic form of the representation (e.g. neogeo vs geosparql).
Very interesting. I had not yet seen the draft for content negotiation by profile. The negotiation concept for geometries certainly requires more development -- there was also a comment on the call about negotiation and available representations of geometries. On a related note, I think there are several transactional aspects to consider when geometries are taken out of RDF Literals including how to deal with versioning, Linked Data Platform, Web Feature Service, and Web Processing Service.
- Blake
On Tue, May 28, 2019 at 4:09 PM Cox, Simon (L&W, Clayton) <Simon.Cox at csiro.au> wrote:
Thanks Blake –
Sorry I missed your presentation. Somehow it had dropped out of my calendar.
I’ve looked through your slides. Your general approach (which I think is to persist geometry representations outside the context of the feature, and link to them through URIs) makes sense, and I think essentially matches practice in GIS systems for decades now (where geometry was in a separate table).
However, I’m not sure that you need a new vocabulary or namespace. The definition of :hasGeometry in the GeoSPARQL standard[1] (clause 8.3.1.1) is
geo:hasGeometry a rdf:Property, owl:ObjectProperty;
    rdfs:isDefinedBy <http://www.opengis.net/spec/geosparql/1.0>;
    rdfs:label "has Geometry"@en;
    rdfs:comment "A spatial representation for a given feature."@en;
    rdfs:domain geo:Feature;
    rdfs:range geo:Geometry .
It is an owl:ObjectProperty, but there is no requirement for the object to be a local blank-node. It can be a URI, as in your slide 18.
You may worry about the rdfs:range, which is given as geo:Geometry, which is defined in clause 8.4. A set of properties are provided, one of which is geo:hasSerialization. WKT is only provided as an example, and is not mandatory - GML is included in GeoSPARQL as another option, but other representations are not prohibited thanks to the RDF open-world-assumption. There is merely the entailment that the object of a geo:hasGeometry property is a member of the class geo:Geometry.
As you note, content negotiation for representations of a geometry will be helpful.
However, format (serialization) negotiation using HTTP Accept: is only part of the story.
You will likely also need to negotiate over the schematic form of the representation (e.g. neogeo vs geosparql).
This is the topic of an upcoming W3C note from the Data Exchange Working Group ‘Content Negotiation by Profile’[2].
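The two negotiation axes can be sketched as request headers. This is a minimal Python illustration under stated assumptions: Accept is the standard HTTP header for the serialization format, while the Accept-Profile header name and its angle-bracket syntax are assumed from the draft cited above and may change. The helper name, and the use of the GeoSPARQL specification URI as a profile identifier, are hypothetical.

```python
def negotiation_headers(media_type, profile_uri=None):
    """Build request headers asking for a format and, optionally, a profile."""
    headers = {"Accept": media_type}  # format (serialization) negotiation
    if profile_uri is not None:
        # Profile negotiation: header name and syntax assumed from the draft.
        headers["Accept-Profile"] = f"<{profile_uri}>"
    return headers


# Ask for Turtle only.
FORMAT_ONLY = negotiation_headers("text/turtle")

# Ask for Turtle structured according to a (hypothetical) GeoSPARQL profile URI.
FORMAT_AND_PROFILE = negotiation_headers(
    "text/turtle", "http://www.opengis.net/spec/geosparql/1.0"
)
```

The resulting dictionaries could be passed to any HTTP client when dereferencing a geometry URI; a server that does not understand the profile header would simply ignore it.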
Simon
[1] https://portal.opengeospatial.org/files/?artifact_id=47664
[2] https://w3c.github.io/dxwg/conneg-by-ap/
From: esip-semanticweb [mailto:esip-semanticweb-bounces at lists.esipfed.org] On Behalf Of Blake Regalia via esip-semanticweb
Sent: Wednesday, 29 May, 2019 07:13
To: Mcgibbney, Lewis J (398M) <lewis.j.mcgibbney at jpl.nasa.gov>
Cc: esip-semanticweb at lists.esipfed.org; Mike Daniels <daniels at ucar.edu>
Subject: Re: [esip-semantictech] [AGENDA] ESIP SemTech Telecon - 2019-05-28
Slides from today's presentation:
https://www.slideshare.net/BlakeRegalia/towards-a-more-efficient-paradigm-of-storing-and-querying-spatial-data-on-the-semantic-web
- Blake Regalia
On Thu, May 23, 2019 at 4:12 PM Mcgibbney, Lewis J (398M) <lewis.j.mcgibbney at jpl.nasa.gov> wrote:
Hi esip-semanticweb,
This is a courtesy email regarding preparation for our next telecon.
We will be hosting Blake Regalia (UCSB), who will be “…Revisiting the Representation of and Need for Raw Geometries on the Linked Data Web”.
After Blake’s presentation, we will use the remainder of our time to discuss the science-on-schema.org proposal, which can be found at https://docs.google.com/document/d/1O539ROr9W7FUEDzR2ni2H2Doxx_2zK8AF-sma8pBDe0/edit?usp=sharing
Mike Daniels will be joining us for that.
Our meeting minutes can be found at https://docs.google.com/document/d/19agZraGms4vsv7S2SP0SpPWuTtfIQOet4NkyvMUEClQ/edit#heading=h.yn5iw79j9hmd
SemTech Monthly Telecon
· 4th Tuesday of each month at 4pm Eastern
· GoToMeeting: https://www.gotomeeting.com/join/976796333
· Phone Access: United States: +1 (872) 240-3212
· Access Code: 976-796-333
Lewis
Dr. Lewis John McGibbney Ph.D., B.Sc.(Hons)
Data Scientist III
Computer Science for Data Intensive Applications Group (398M)
Instrument Software and Science Data Systems Section (398)
Jet Propulsion Laboratory
California Institute of Technology
4800 Oak Grove Drive
Pasadena, California 91109-8099
Mail Stop : 158-256C
Tel: (+1) (818)-393-7402
Cell: (+1) (626)-487-3476
Fax: (+1) (818)-393-1190
Email: lewis.j.mcgibbney at jpl.nasa.gov
ORCID: orcid.org/0000-0003-2185-928X
Dare Mighty Things