Definition of dcat:spatialResolutionInMeters incompatible/problematic with JSON-LD #1536
That is not accurate. In JSON-LD, … But it is true that I believe xsd:decimal is superior to xsd:double in many respects (see this paper (pdf) for a detailed analysis), so I am in favour of option 2.
Another option would be the following:
(Really, I see no reason why the E notations were not included in the lexical space of xsd:decimal!) Then change the range of all concerned DCAT properties from xsd:decimal to dcat:decimal. As stated above, this is not a breaking change (xsd:decimal is semantically a subtype of dcat:decimal; even syntactically, any valid xsd:decimal value can be "cast" to a dcat:decimal without changing its meaning). This way, the literals produced by a JSON-LD processor would be valid whenever the original value was any JSON number. @iherman, any opinion on this?
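The option above could be sketched in Turtle roughly as follows. Note that dcat:decimal is hypothetical — it exists only in this proposal — and a real definition would need to spell out the extended lexical space properly:

```turtle
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

# Hypothetical datatype: same value space as xsd:decimal, but with E notation
# added to the lexical space, so "1.2E1"^^dcat:decimal would be a valid literal.
dcat:decimal a rdfs:Datatype ;
    rdfs:comment "xsd:decimal extended with exponent (E) notation (sketch)." .

# The affected properties would then be re-ranged, e.g.:
dcat:spatialResolutionInMeters rdfs:range dcat:decimal .
```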
@pchampin that seems to work (playground). But I am not sure what exactly the requirements are for @jakubklimek. Is the 'E' notation all right, provided the datatype is also correct? The problem, of course, is whether there is any reasoner that is properly prepared for such datatype reasoning, i.e., for datatypes that go beyond XML Schema. I have zero experience with that. The pro aspect is that there is nothing to do for the JSON-LD spec. Although… the playground does the right thing, but does the JSON-LD spec say that the conversion of JSON numbers should happen the way it happens, or is this only an implementation side effect? @gkellogg, this question is really for you.
Well, my requirement is for JSON-LD distributions of DCAT-AP to pass DCAT-AP SHACL validation, which validates the datatype of dcat:spatialResolutionInMeters.
JSON numbers are problematic. JSON-LD will treat them as integers or doubles based entirely on the presence or absence of a fractional part. To maintain fidelity with either XSD type, you should stick with string values. A future version may change the interpretation based on JCS, which is used for JSON literals, but it remains fundamentally problematic. The problem with JSON numbers goes back to the over-simplified view of numbers in JavaScript. If datatype fidelity is important, stick with value objects having a string value.
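To make the double conversion concrete, here is a minimal Python sketch — not a real JSON-LD processor; the spec's To-RDF algorithm emits the canonical xsd:double lexical form, which this function approximates — showing why a JSON number like 12.0 surfaces as 1.2E1:

```python
def to_xsd_double_lexical(value: float) -> str:
    """Approximate the canonical xsd:double lexical form:
    one leading digit, '.', fraction, 'E', exponent (no leading zeros)."""
    # Python's scientific notation, e.g. 12.0 -> "1.200000E+01"
    mantissa, _, exp = f"{value:E}".partition("E")
    # Canonical form trims trailing zeros but keeps at least one fraction digit
    mantissa = mantissa.rstrip("0").rstrip(".")
    if "." not in mantissa:
        mantissa += ".0"
    return f"{mantissa}E{int(exp)}"
```

So a JSON `12.0` round-trips to the literal `"1.2E1"^^xsd:double` — which is then an invalid lexical form if the context forces xsd:decimal on it.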
@iherman in my experience, many RDF implementations (libraries, triple stores, inference engines) support xsd:decimal. So yes, replacing it with dcat:decimal might disrupt some systems… More problematic: normatively, OWL 2 only supports a predefined set of datatypes (including xsd:decimal). Any other datatype is, by default, rejected by reasoners (although HermiT has an option to simply ignore them). @gkellogg I agree that sticking to strings (with value objects or type coercion) is the easy way out in JSON-LD. But can we really convince people to give up on numbers in JSON? This kind of contradicts our narrative that "JSON-LD can still be used as day-to-day JSON" if we add "… except for this or that kind of JSON".
@pchampin said:
Note that the use of decimal values, as opposed to integer or double, is really an RDF issue, which probably doesn't impact people wanting to make their existing JSON just work. We've long warned about the potential for native numbers to be misinterpreted, and to avoid built-in behavior for XSD datatypes. Generally, you can interpret arbitrary JSON as JSON-LD, but absent other data typing information, there is no other good way to handle native numbers. If we're talking about a hypothetical normative change, then sure, I think we can do better. However, the behavior in 1.0 and 1.1 is to not manipulate native values, except as defined in the To- and From-RDF algorithms. Specifically, we were wary about adding any data-type specific behavior. In a (hypothetical) future update, which may need to be a 1.2 release due to the impact, I could see making the following changes:
However, as these are prospective normative changes that would affect the expected behavior of already compliant processors, there are some process steps we'd need to go through to send JSON-LD API back through CR, so I don't see how it can really affect this issue.
The more I think about this issue, the more I believe that the problem is the definition of xsd:decimal itself. I have created a repo describing the issue, and the current state of implementations: https://github.com/pchampin/xsd_decimal/ I thought I would share this with the Semantic Web mailing list to get a sense of the community's opinion about this. @iherman, @gkellogg, what do you think? To get back to your issue, @jakubklimek, my conclusion is that you should be using strings to be on the safe side, but in many cases you may still use numbers and not run into any problem, because many implementations will recognize these non-standard xsd:decimals produced by JSON-LD…
Great discussion folks, thanks @pchampin for the research and evidence. +1 for @jakubklimek's intent (use case) here: "semantic uplift" of typical JSON serialisations seems to be the way in which any component in a system can augment a JSON payload to provide information about its meaning. There is no reason an intermediary or client can't be aware of this context and augment the information it gets from a service; in fact that's the way the whole WoT is predicated to deal with low-power networking protocols from sensors. This is exactly what we wish to do in the OGC to formalise GeoDCAT as a profile of DCAT, with a normative context to make it JSON-LD compatible and allow GeoSPARQL to provide richer spatial semantics. (@jakubklimek can look into publication and reuse of a common DCAT context to be referenced by any DCAT profiles?)

If the problem is fundamental to the xsd:decimal lexical rules (not semantics), then layering in workarounds, such as requiring "non-natural" string-based serialisations, will require special code support on every server and client component. IMHO it would be better to aim for the simplest and most interoperable option for the greatest number of users over the longer term, which would appear to be either updating serialisers to not use E notation, or updating client libraries to support it.

Perhaps the lowest total effort for the best outcome is to make parsers "future compatible" with a proposed update to xsd:decimal, and declare this as a formal profile for now, so components have a mechanism to at least be transparent at run-time. This will not break any existing systems, but would allow new systems to be built that do not impose unreasonable burdens on clients in future.
Note that Apache Jena supports GeoSPARQL (https://jena.apache.org/documentation/geosparql/index.html), which is just being updated to a 1.1 release; perhaps it would not be too big a reach to ensure this update is also factored in. @nicholascar might have some further insight into this.
Well, I do not have any contact with the main developers of the XSD schemas any more, so we can only guess why they created this datatype in the first place. I would think that they thought authors should use … One could also say that RDF may have been lazy by simply adopting XSD as the basis for datatypes instead of adopting something possibly simpler (how many people are there who have read the XSD specification with all its intricacies and details?). I suspect that may also be water under the bridge…
Which reinforces what I said: even developers did not read (or possibly did, but wilfully ignored) the XSD spec… 😀
I am not sure it is worth it, if the question is only how JSON-LD has to map numbers to RDF. There may be a (much) longer discussion on the whole area of datatypes for RDF, and whether, after 20 years, the choice of XSD was indeed a judicious one and whether a major simplification in that area would be worthwhile. But that discussion only makes sense if it leads to some consistent datatype specification that future RDF data can universally use; otherwise it will be a purely academic discussion… (A good thing is that any change does not necessarily require a change in the RDF standard itself, although the RDF spec lists XSD explicitly. Nothing forbids proposing, and widely adopting, an alternative datatype system.)
https://www.w3.org/TR/xsd-precisionDecimal/ (via https://www.w3.org/TR/xmlschema11-2/#primitive-vs-derived) It is not a derived type of xsd:decimal. Systems may be implementing decimals as doubles. Changing numbers would also mean defining arithmetic.
One could also say that it refrained from reinventing the wheel :-) @afs, wow, I didn't know about that one. Thanks for the reference. However, I am not sure this really solves the issue here: …
Sorry it does not help. But why change …?
@afs clearly, it is possible to convert between … We could fix this by either changing JSON-LD, or changing …
But let's re-focus the discussion on the issue raised by @jakubklimek, and on the way we can address it here, that is, in the context of DCAT 3. (I suggest the more general debate on … continue elsewhere.) @riccardoAlbertoni @davebrowning @dr-shorthair Unless I missed something, the DCAT spec does not at all talk about how it can be serialized in JSON-LD, right? So I am not entirely sure where the warning text should go…
SPARQL has nothing to do with this; your survey needs to test parsers and the handling of such data end-to-end. Equality of constants is only part of the picture. Try … Your survey needs to cover RDF/XML. A system that uses an existing validating XML parser needs changing as well. SPARQL was not my point. It is a legal cast in F&O, a non-RDF standard with significant adoption.
It's a SHACL change.
@afs thanks again for these very relevant remarks. Would you mind raising them as issues on https://github.com/pchampin/xsd_decimal, where we could discuss them at length? As I wrote above, I think we should focus this thread on how to address this in the short term, for DCAT (or its profiles).
The documentation of such a context would be, IMO, the right place to explain this caveat and the possible workarounds.
Using strings for conveying decimal values is not as unreasonable as it sounds, because JSON numbers can be lossy, as raised by @gkellogg above.
Maybe the most pragmatic way forward would be to extend the range of the property to the union of xsd:decimal and xsd:double. A drawback is that such a range cannot be expressed in RDFS (but it can be expressed in OWL). I don't think this is too much of a problem. Note that JSON-LD users would still have the possibility to use either type (being more explicit for …).
Even aside from any disruption a re-definition of xsd:decimal may cause, allowing xsd:decimal to contain an exponent may create a different problem. I think it is important to be able to syntactically distinguish an xsd:double from an xsd:decimal. Currently, the presence of an exponent is what signals an xsd:double: 1.23E1 must be an xsd:double. If an exponent were allowed in an xsd:decimal, then it isn't clear how an xsd:double could be syntactically distinguished: 1.23E1 would (presumably) conform to both datatypes. Possibly one could use the number of digits in the mantissa to distinguish them, because xsd:decimal is required to support 18 digits, whereas xsd:double supports a mantissa up to 2^53 = 9,007,199,254,740,992, which is 16 digits. But if E notation is added to xsd:decimal then the digits in the exponent would probably come out of that same 18-digit budget, so you'd again be faced with not being able to syntactically distinguish them. Furthermore, it would probably be confusing to have subtly different limits on the number of digits permitted in the mantissa and/or exponent between xsd:decimal and xsd:double.
P.S. My guess is that this ability to syntactically distinguish an xsd:double from an xsd:decimal is the reason why an exponent is not allowed in an xsd:decimal in the existing standard. |
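The syntactic disambiguation described above can be illustrated with simplified regular expressions over the two lexical spaces (a sketch; the real XSD grammars also allow INF, -INF and NaN for xsd:double):

```python
import re

# Simplified lexical spaces from XML Schema 1.1 Part 2:
# xsd:decimal: optional sign, digits, optional fractional part -- no exponent.
DECIMAL = re.compile(r'^[+-]?(\d+(\.\d*)?|\.\d+)$')
# xsd:double: the same, plus an optional exponent part.
DOUBLE = re.compile(r'^[+-]?(\d+(\.\d*)?|\.\d+)([eE][+-]?\d+)?$')

def exponent_disambiguates(lexical: str) -> bool:
    """True when the form is valid xsd:double but NOT valid xsd:decimal,
    i.e. the exponent is exactly what tells the two datatypes apart."""
    return bool(DOUBLE.match(lexical)) and not DECIMAL.match(lexical)
```

As soon as E notation is admitted into xsd:decimal, the two patterns collapse into one and this check stops working.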
If we take @pchampin's idea of defining the range to be xsd:double or xsd:decimal, then implementations can choose which they need, and a note saying E notation should not be used in xsd:decimal to be strictly compliant makes sense. Either that or just make it xsd:double; there is not enough information supported by other DCAT property semantics for the precision to matter much, I suspect: spatial resolution is a complex thing depending on all sorts of map projection issues, and the subtle differences in number precision would be lost in, for example, the issues of plate tectonic shift (some jurisdictions use dynamic datums with temporal epochs). "Effective spatial resolution" would need to be calculated anyway depending on the map projection of the data, view angles of sensors and all sorts of things, and maybe an approximation to start with.

So the real solution for precision would be to fix this in GeoDCAT, allowing core DCAT to provide "general statements" and GeoDCAT to be rich enough to carry all the other information you would need to make the fine distinction. Syntactic interoperability matters most for core DCAT, as the semantics probably don't allow more precise information regardless of syntactic precision preservation, so xsd:double might be easiest?

Also, SHACL constraints are probably going to be of more practical use than OWL or RDFS. AFAICT very little OWL reasoning is done dynamically in the sort of environments concerned with data cataloguing; DCAT doesn't really support rich enough description for much meaningful reasoning, and you would need to attach other data with relevant information models for most interesting cases anyway.
Note that the union definition of xsd types is already present in DCAT: all temporal data properties allow any 'temporal' xsd type. |
@jakubklimek thanks for highlighting this issue. I also think that we should consider what the objective is here: namely DCAT. The question we have to raise is what one wants to do with this property. I might be mistaken, but so far I have not yet seen "calculations" happening with the DCAT descriptions. If the use of this property is to print the value on a webpage, then xsd:decimal is really fine. But then unfortunately JSON(-LD) users cannot use native JSON numeric types. Since this issue is at the core of bridging JSON to RDF, the issue is not limited to DCAT but concerns the use of numeric values in all W3C vocabularies. I would like to see a solution that is workable for all rather than a specific approach in each vocabulary. Observe that in some cases the representation "3.45" is the expected representation (e.g. monetary values), while in others (e.g. the quantity of CO2 in the air) the double representation is the expected one. For that reason I believe that the topic is beyond DCAT as such.
@rob-metalinkage I see your point about precision and … Re. OWL vs. SHACL: SHACL has no problem expressing that a property accepts multiple datatypes.
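For illustration, a SHACL property shape accepting both datatypes could look like this (the shape name and the ex: prefix are illustrative, not from the DCAT spec):

```turtle
@prefix sh:   <http://www.w3.org/ns/shacl#> .
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix xsd:  <http://www.w3.org/2001/XMLSchema#> .
@prefix ex:   <http://example.org/shapes#> .

ex:SpatialResolutionShape a sh:NodeShape ;
    sh:targetSubjectsOf dcat:spatialResolutionInMeters ;
    sh:property [
        sh:path dcat:spatialResolutionInMeters ;
        # Accept either datatype -- the union that RDFS cannot express:
        sh:or ( [ sh:datatype xsd:decimal ]
                [ sh:datatype xsd:double ] ) ;
    ] .
```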
@dbooth-boston I think your argument about distinguishing between double and decimal, as a motivation for forbidding the E notation in xsd:decimal, is very plausible. And it makes me realize that the problem raised by @jakubklimek in JSON-LD may also occur in Turtle! More precisely, with the current …, the following snippet is compliant: … I think this is one more argument for allowing both datatypes for this property.
@pchampin indeed, [] dcat:spatialResolutionInMeters 1e2 . is currently invalid, as it indicates the xsd:double datatype, whereas [] dcat:spatialResolutionInMeters 100.0 . is valid. The actual problem in JSON-LD is that it is currently not solvable with JSON decimal numbers, as those always get transformed to xsd:double.
One could argue that in JSON-LD, this is also solvable by simply using the correct datatype, i.e. a JSON string instead of a number (and leaving it to the context to coerce that string into xsd:decimal). I agree that using a string to express a numeric value is counterintuitive, but IMO so is the subtle distinction that Turtle makes between … I also agree that, in an ideal world, JSON-LD would have better support for xsd:decimal.
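A sketch of the string-plus-coercion approach (the term name in the context is illustrative): the context coerces the string value, so a JSON-LD processor emits "12.5"^^xsd:decimal rather than a double:

```json
{
  "@context": {
    "dcat": "http://www.w3.org/ns/dcat#",
    "xsd": "http://www.w3.org/2001/XMLSchema#",
    "spatialResolutionInMeters": {
      "@id": "dcat:spatialResolutionInMeters",
      "@type": "xsd:decimal"
    }
  },
  "spatialResolutionInMeters": "12.5"
}
```

The coercion only kicks in for string values; had the value been the JSON number 12.5, the processor would still type it as xsd:double.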
PR #1543 has mediated the different views and provides a non-normative indication about the issue and possible solutions.
There is a technical issue with the definition of dcat:spatialResolutionInMeters when used with JSON-LD. Specifically, the issue is the range being xsd:decimal.

In JSON-LD, xsd:decimal is not supported for numbers; see the note in the specification. Therefore, when in JSON this number is actually a JSON number, not a JSON string, even if the JSON-LD context explicitly specifies the datatype to be xsd:decimal, the number is treated as xsd:double. This leads to conversions like 1.2E1 instead of 12 when loading JSON-LD as RDF, in turn leading to the invalid xsd:decimal RDF literal "1.2E1"^^xsd:decimal. This can be seen e.g. in the JSON-LD playground.

Possible solutions:
- …
- Change the range to xsd:integer or xsd:double, which is a breaking change