Add property to link from Distribution -> Dataset (inverse of dcat:distribution) #166

Closed
dr-shorthair opened this issue Mar 15, 2018 · 14 comments

@dr-shorthair
Contributor

dr-shorthair commented Mar 15, 2018

Separating out this question (which is distinct from the axiomatization of dcat:distribution, #120).
Discussion from #120 pasted below:

@dr-shorthair commented 22 days ago
Do we need an inverse property here - to allow a distro to point to the dataset that it purports to instantiate?

@dr-shorthair commented 23 hours ago
In this case I think an inverse property makes sense.
A Distro will usually know which Dataset it is related to.

@akuckartz commented 23 hours ago
No inverse properties please. They are not necessary. But that probably should better be discussed in a separate issue.

@dr-shorthair commented 22 hours ago
@akuckartz : so if I have a description of a distribution (dcat:Distribution) in front of me, how do I know which dataset(s) it is a distribution of? Do I have to make a separate query to the catalogue?

@akuckartz commented 15 hours ago
Some information about inverse properties:
https://www.w3.org/wiki/WebSchemas/InverseProperties

@dr-shorthair commented 14 hours ago
@akuckartz - is this reference meant to answer my question above (#120 (comment))? The page appears to be largely about syntax on different platforms.

My point is that
(i) there is a reasonable requirement to be able to traverse from a distro to its dataset, and
(ii) the normal solution for common requirements is a specific property or property path.

Yes, the information might be obtained using additional service calls/queries, but this may not be the optimal solution.
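
A minimal sketch of the "additional query" route, assuming the catalogue graph is available as plain (subject, predicate, object) tuples. All `ex:` identifiers are made up for illustration. Without an inverse property, getting from a distribution to its dataset means scanning for triples in which the distribution is the object of dcat:distribution:

```python
# Triples as (subject, predicate, object) tuples; every ex: IRI is a made-up example.
DCAT_DISTRIBUTION = "dcat:distribution"

triples = {
    ("ex:dataset-1", "rdf:type", "dcat:Dataset"),
    ("ex:dataset-1", DCAT_DISTRIBUTION, "ex:dist-csv"),
    ("ex:dataset-1", DCAT_DISTRIBUTION, "ex:dist-json"),
    ("ex:dataset-2", DCAT_DISTRIBUTION, "ex:dist-wms"),
}


def datasets_of(distribution, graph):
    """Follow dcat:distribution backwards from a distribution to its dataset(s)."""
    return sorted(
        s for (s, p, o) in graph if p == DCAT_DISTRIBUTION and o == distribution
    )


print(datasets_of("ex:dist-csv", triples))
```

The lookup only works when the graph that holds the dataset description is at hand. A distribution description encountered on its own carries no such triple.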

@akuckartz commented 13 hours ago
@dr-shorthair That page contains the result of older discussions about the need for inverse properties. It states at the beginning:

that this should be addressed at the level of syntaxes for Web data, and not by defining inverse properties in the vocabularies

@dr-shorthair commented 13 hours ago
Thanks for clarifying.

I certainly understand that not all inverses are needed, and that they should only be introduced and named where there is a requirement.

However, that is really just a special case of noting that not all potential relationships should be blessed by being given their own name in a vocabulary: whether you call it ontology engineering, information modeling or data modeling, the art is in selecting which edges from the very large set of candidates in a graph we think are important. We do this for efficiency, not because there is no other path to travel that joins the same two resources. Sometimes that will include inverses. But I do not accept a blanket rule to prohibit them - people will encounter different artefacts from different access routes, and it is best if they can see all the relationships that matter (!) without running an additional query.

In this case, there is a whole lot of descriptive information that is more appropriately attached to the Dataset description than the Distribution. I'd prefer to just navigate over to it than have to reason my way over.

@nicholascar commented 13 hours ago
PROV defines some inverse properties (generated/wasGeneratedBy) but not all (used, no wasUsedBy) due to anticipated use. We should do the same.

Not enough people/systems are capable of reasoning or distributed graph assembly to just always declare only the semantic minimum.
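
As a rough sketch of what declaring an inverse would buy consumers that cannot reason: the publisher can materialise the inverse triples up front. The property name `ex:isDistributionOf` is a placeholder invented for this sketch, and the graph is again modelled as plain tuples.

```python
# Hypothetical inverse property name, used only for this illustration.
IS_DISTRIBUTION_OF = "ex:isDistributionOf"


def materialise_inverse(graph, prop, inverse_prop):
    """Return the graph plus one inverse triple for every triple that uses prop."""
    inverse = {(o, inverse_prop, s) for (s, p, o) in graph if p == prop}
    return set(graph) | inverse


graph = {
    ("ex:dataset-1", "dcat:distribution", "ex:dist-csv"),
    ("ex:dataset-1", "dct:title", "Example dataset"),
}

expanded = materialise_inverse(graph, "dcat:distribution", IS_DISTRIBUTION_OF)
for triple in sorted(expanded):
    print(triple)
```

After this step, a client holding only the description of `ex:dist-csv` can read off its dataset directly, at the cost of one extra triple per distribution.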

@akuckartz commented 3 hours ago
Again: I think that a separate issue should be opened to propose an inverse property.

@fellahst

I don't think we need to define an inverse property for dcat:distribution. When using a registry/catalog of datasets, the primary use case is to search for datasets of interest. Once the dataset of interest is found, a user wants to know what distributions are available for that dataset. I don't see a real-world use case where a user would start by searching for a Distribution and then figure out what dataset it belongs to.

@andrea-perego
Contributor

I also don't see a use case requiring the inverse of dcat:distribution.

Based on the DCAT implementations I am aware of, my personal understanding is that dcat:Distribution is not considered an independent entity, and in many cases it is denoted just as a blank node. This implicitly means that two dcat:Distribution's, although intensionally identical (i.e., because they have the same title, access URL, etc.), are nonetheless considered two different distributions.

This seems possible, looking at the actual semantics of dcat:Distribution. E.g., you may have two different dcat:Dataset's having dcat:Distribution's with the same access URL - e.g., because they are pointing to the same download page, or to a service API (SPARQL endpoint, WMS, WFS, etc.). They are not the same distribution. The fact that the access URL is identical does not make them the same distribution.

I'm probably going too far, but this use of dcat:Distribution drove me to consider whether a dcat:Distribution could actually be interpreted as a reification of the relationship between a dcat:Dataset and the actual data (and not a description of the data themselves).

@dr-shorthair
Contributor Author

dr-shorthair commented Mar 15, 2018

I was thinking in particular of https://www.w3.org/TR/dcat-ucr/#ID6 - the use of dcat:Distribution to describe data services. In this case the service is an independent thing with (usually) a need to link back to the descriptions of the datasets that it presents. Also Modeling service-based data access https://www.w3.org/TR/dcat-ucr/#ID18

Though perhaps we should first decide if services are distributions. Or if distributions are ever first-class resources - see #52 and also #56

@andrea-perego
Contributor

I was thinking in particular of https://www.w3.org/TR/dcat-ucr/#ID6 - the use of dcat:Distribution to describe data services. In this case the service is an independent thing with (usually) a need to link back to the descriptions of the datasets that it presents. Also Modeling service-based data access https://www.w3.org/TR/dcat-ucr/#ID18

Thanks for the clarification, @dr-shorthair . However, I'm not sure that the service is the distribution. Rather, the service is linked from the distribution by using dcat:accessURL - which could be expressed as "the distribution of this dataset is available from [a service by using] this URL". If this is the case, dcat:dataset (or its inverse) is not modelling this relationship.

Actually, the relationship between a service and the data accessible from it looks similar to the role of the ISO 19139 element srv:operatesOn (and to the INSPIRE notion of "coupled resource"). Notably, in ISO 19139 and INSPIRE, the reference points from the service (metadata) to the dataset and not to its distribution.

How to model this relationship was one of the issues discussed in the development of GeoDCAT-AP. The adopted solution followed from the decision to consider dcat:Catalog as a type of service (corresponding to a geospatial catalogue service, such as a CSW). dcat:dataset was then the proposed property for linking any type of data service with a dataset. However, as the domain of dcat:dataset is restricted to dcat:Catalog, the decision was to use dct:hasPart instead (as dcat:dataset is a sub-property of it).

This could be actually a use case in favour of relaxing the domain restriction of dcat:dataset (I'll post a comment in the relevant issue - #117).

dr-shorthair added this to the Data services milestone on Mar 16, 2018
@dr-shorthair
Contributor Author

Thanks @andrea-perego . Can you confirm if WMS and WFS end-points were also described? Which DCAT class was used?

@dr-shorthair
Contributor Author

dr-shorthair commented Mar 16, 2018

Proposal for a DataService class to help resolve this issue, also #56
https://github.com/w3c/dxwg/wiki/Cataloguing-data-services

@larsgsvensson
Contributor

Based on the DCAT implementations I am aware of, my personal understanding is that dcat:Distribution is not considered an independent entity, and in many cases it is denoted just as a blank node. This implicitly means that two dcat:Distribution's, although intensionally identical (i.e., because they have the same title, access URL, etc.), are nonetheless considered two different distributions.

To me this sounds like an argument to make dcat:accessURL (or at least dcat:downloadURL) an owl:InverseFunctionalProperty, since that would allow us to conclude that if two distributions have the same accessURL, they are the same distribution.

@andrea-perego
Contributor

@larsgsvensson said:

To me this sounds like an argument to make dcat:accessURL (or at least dcat:downloadURL) an owl:InverseFunctionalProperty, since that would allow us to conclude that if two distributions have the same accessURL, they are the same distribution.

Oops, I was actually trying to make an argument against that.

dcat:accessURL is pointing to a "place" from which you can download the data, but from the same place you can download other data. The "place" is the same, not the distribution.

It is true that the situation may be different for dcat:downloadURL, as this property is supposed to provide a URL which gives you direct access to the data. However, this is not necessarily the case - the dcat:downloadURL can point to a compressed archive containing more than one dataset. Again, it is the "place" dcat:downloadURL is pointing to that is the same, not the distribution.
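
To make the objection concrete, here is a small sketch of what an inverse-functional reading of dcat:accessURL would conclude. It assumes a graph of plain tuples. The `ex:` names are invented, and the grouping function is only a stand-in for what a reasoner would infer:

```python
# Two distributions of different datasets that share one service endpoint.
graph = [
    ("ex:roads", "dcat:distribution", "ex:dist-roads"),
    ("ex:rivers", "dcat:distribution", "ex:dist-rivers"),
    ("ex:dist-roads", "dcat:accessURL", "http://some.site/service/wms"),
    ("ex:dist-rivers", "dcat:accessURL", "http://some.site/service/wms"),
    ("ex:dist-other", "dcat:accessURL", "http://some.site/download/other"),
]


def merged_by_url(graph, prop):
    """Group subjects by the value of prop; groups of two or more are the
    resources an inverse-functional reading of prop would declare identical."""
    by_url = {}
    for s, p, o in graph:
        if p == prop:
            by_url.setdefault(o, set()).add(s)
    return {url: sorted(subjects) for url, subjects in by_url.items() if len(subjects) > 1}


print(merged_by_url(graph, "dcat:accessURL"))
```

Under that reading, the roads and rivers distributions would be treated as one resource, even though they belong to different datasets and only share the "place" they are served from.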

@andrea-perego
Contributor

@dr-shorthair said:

Thanks @andrea-perego . Can you confirm if WMS and WFS end-points were also described? Which DCAT class was used?

The short answer is yes, and the class is dctype:Service, plus a couple of properties to specify the service type (dct:type) and protocol (dct:conformsTo).

The long answer:

GeoDCAT-AP basically follows the ISO 19115/19119 approach, i.e., a service is a first-class resource documented by a metadata record registered in a catalogue. As such it stands at the same level as dcat:Dataset in DCAT, as I explained in #64 (comment). Hence my proposal in #116 to have a more general property than dcat:dataset to link a dcat:Catalog to the described resources (in this case, a service).

As such, a service is not described as a dcat:Distribution: rather, the dcat:Distribution links to the service via dcat:accessURL. The reason is the one I explained earlier: the distribution is about the data that can be accessed via the service, not the service itself.

For details on how a service and the distribution pointing to it are described in GeoDCAT-AP, I think we can refer to what is said in UC20 and UC18, respectively. However, I include below a revised and extended version of the example in UC18, to better illustrate the catalogue-service-distribution relationship:

# Prefix declarations; the a: namespace is a placeholder assumed for this example
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dct: <http://purl.org/dc/terms/> .
@prefix dctype: <http://purl.org/dc/dcmitype/> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix a: <http://example.org/> .

a:Catalog a dcat:Catalog ;
  dcat:dataset a:Dataset ;
  # The relationship between the catalogue and the documented service
  # is specified with dct:hasPart as super-property of dcat:dataset
  dct:hasPart a:Service .

a:Dataset a dcat:Dataset ;
  dcat:distribution [ a dcat:Distribution ;
    dct:title "Data from a WMS"@en ;
    dct:description "This data is available from a Web Map Service (WMS)"@en ;
    dct:license <https://creativecommons.org/licenses/by/4.0/> ;
    # The URL of the service API endpoint
    dcat:accessURL <http://some.site/service/wms> ;
    # The distribution points to a service
    dct:type <http://publications.europa.eu/resource/authority/distribution-type/WEB_SERVICE> ;
    # The service conforms to the WMS specification
    dct:conformsTo <http://www.opengis.net/def/serviceType/ogc/wms> ] .

a:Service a dctype:Service ;
  dct:title "WMS"@en ;
  dct:description "Web Map Service (WMS)"@en ;
  dct:license <https://creativecommons.org/publicdomain/zero/1.0/> ;
  # The URL of the service API endpoint
  foaf:homepage <http://some.site/service/wms> ;
  # The service is a view service
  dct:type <http://inspire.ec.europa.eu/metadata-codelist/SpatialDataServiceType/view> ;
  # The service conforms to the WMS specification
  dct:conformsTo <http://www.opengis.net/def/serviceType/ogc/wms> .
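
In this example the only thing connecting the dataset to the service is the shared endpoint URL. A minimal sketch of how a client might exploit that, assuming the graph has been loaded as plain tuples and that matching dcat:accessURL against foaf:homepage is an acceptable heuristic (the blank-node label `_:dist1` is invented here):

```python
# The relevant triples from the example, as (subject, predicate, object) tuples.
graph = [
    ("a:Catalog", "dcat:dataset", "a:Dataset"),
    ("a:Catalog", "dct:hasPart", "a:Service"),
    ("a:Dataset", "dcat:distribution", "_:dist1"),
    ("_:dist1", "dcat:accessURL", "http://some.site/service/wms"),
    ("a:Service", "rdf:type", "dctype:Service"),
    ("a:Service", "foaf:homepage", "http://some.site/service/wms"),
]


def services_for(dataset, graph):
    """Find services whose homepage equals the access URL of one of the
    dataset's distributions."""
    dists = {o for s, p, o in graph if s == dataset and p == "dcat:distribution"}
    urls = {o for s, p, o in graph if s in dists and p == "dcat:accessURL"}
    return sorted(s for s, p, o in graph if p == "foaf:homepage" and o in urls)


print(services_for("a:Dataset", graph))
```

The join is indirect and depends on both descriptions using exactly the same URL. An explicit service-to-dataset property would remove that dependency.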

@dr-shorthair
Contributor Author

OK - so is dctype:Service enough? Or do we need a model in DCAT? If you want to make dct:type and dct:conformsTo core, do we want to make a new class, or just go with SHACL after the fact?

@larsgsvensson
Contributor

@andrea-perego wrote:

It is true that the situation may be different for dcat:downloadURL, as this property is supposed to provide a URL which gives you direct access to the data. However, this is not necessarily the case - the dcat:downloadURL can point to a compressed archive containing more than one dataset. Again, it is the "place" dcat:downloadURL is pointing to that is the same, not the distribution.

Yes, I was referring mainly to dcat:downloadURL and thought that there would be a 1:1 correspondence between distribution and download. Thanks for the clarification!

@andrea-perego
Contributor

@dr-shorthair said:

OK - so is dctype:Service enough? Or do we need a model in DCAT? If you want to make dct:type and dct:conformsTo core, do we want to make a new class, or just go with SHACL after the fact?

IMO, dctype:Service + dct:type + dct:conformsTo is the most satisfactory solution currently implemented, as it requires no changes in DCAT. But I think it is also worth considering the possibility of having more specific classes than dctype:Service and dcat:Distribution. In that case, it may be worth providing guidance on how to ensure backward compatibility.

@dr-shorthair
Contributor Author

Agreed. See #56 (comment), #172 and https://github.com/w3c/dxwg/wiki/Cataloguing-data-services for a proposal to add a new class to DCAT for data services.

@dr-shorthair
Contributor Author

Clearly there is some work to be done to understand which links are needed between Datasets, Distributions and Services, but I'm happy to close this issue now and continue the discussion in #180, #181, #182 and in particular #56.
