Distribution service [RDISV] #56

jpullmann · 2018-01-18T21:11:48Z

Distribution service [RDISV]

Ability to 1) describe the type of distribution and 2) provide information about the type of service

Such a description may be provided through a suitable profile identifier that defines a profile of the relevant service type.

Related requirements: Profiles listing [RPFL] Distribution schema [RDIS]
Related use cases: DCAT Distribution to describe web services [ID6] Modeling service-based data access [ID18] Machine actionable link for a mapping client [ID21]

makxdekkers · 2018-01-19T19:27:14Z

This requirement is very similar to #55. Should they be combined?

dr-shorthair · 2018-02-22T02:20:27Z

Also depends on #52

dr-shorthair · 2018-03-15T23:44:58Z

@makxdekkers I don't think so. This issue is at a coarser level than #55. As @andrea-perego notes in https://www.w3.org/TR/dcat-ucr/#ID18, a set of subclasses of dcat:Distribution was originally proposed for DCAT but did not make it into the Recommendation. This issue invites us to revisit that.

dr-shorthair · 2018-03-16T07:08:59Z

Proposal for a DataService class to help resolve this issue, and also #166
https://github.com/w3c/dxwg/wiki/Cataloguing-data-services

andrea-perego · 2018-03-17T00:32:27Z

I think we need to take into account backward compatibility.

An alternative option is the one used in Geo/DCAT-AP, where the fact that the distribution points to a service is specified by soft typing the distribution (dct:type) - see UC18.

dr-shorthair · 2018-03-18T09:43:03Z

@andrea-perego In your response to #166 you explain that you used dctype:Service in the past.

Are these service descriptions also included in a DCAT Catalog?
If so, then does this prove the point, that a class for Service is required as part of the core model?
Finally, should we just go with dctype:Service, or add a class to DCAT explicitly?

andrea-perego · 2018-03-24T00:57:28Z

@dr-shorthair , please see my replies inline:

Are these service descriptions also included in a DCAT Catalog?

Yes (at least, in GeoDCAT-AP).

If so, then does this prove the point, that a class for Service is required as part of the core model?

I would be in favour of that, and, more in general, of not limiting the content of a dcat:Catalog only to dcat:Dataset's - as explained in #166 (comment)

Finally, should we just go with dctype:Service, or add a class to DCAT explicitly?

I'm open to either option - as explained in #166 (comment)

fellahst · 2018-03-24T14:26:47Z

In the context of Geoplatform.gov, we have to deal with the modeling of services, whose health status is checked on a regular basis (https://statuschecker.fgdc.gov/).
For this purpose, we introduce a first-class business object, Service. A Service conforms to a standard (using dct:conformsTo) such as WMS, WMTS, or WFS. A service can operate on (srim:operatesOn) different assets (dcat:Dataset, Layer, Map, ...), and assets are serviced by (srim:servicedBy) one or more services. A service may support multiple versions of protocols (srim:supportVersion).

Now, here is the caveat. A service can provide multiple assets (for example, multiple feature collections in a WFS or layers in a WMS). When you model the distribution of an asset that is served by such a service, you cannot just refer to the Service (using srim:servicedBy); you need more information about "how" to access the service. You need the set of valid parameters for calling an operation of the service. For example, a Layer can be distributed from a WMS, and its parameters need the layer id. Likewise for a WFS, you need to give at least the featureType identifier to refer to the collection. My initial thought was to introduce an additional class called DataSource, which captures the standard, the parameter descriptions, and the bindings of parameter names to values. However, we found that extending dcat:Distribution with parameters and bindings simplifies the model, because it avoids introducing a new first-class object.

We define the parameter concept in a different namespace because we believe it can be reused in many different contexts outside DCAT:

Here is a summary of its properties:

Here is an example in JSON-LD of a layer distributed from a WMTS. Note that we use a URL template when there are parameters:

{
    "_created": "2018-01-24T16:10:01.321+0000",
    "_modified": "2018-01-24T16:10:01.321+0000",
    "id": "c25e93d08c41187d557855fd8a49d472",
    "type": "Layer",
    "uri": "http://www.geoplatform.gov/id/layer/c25e93d08c41187d557855fd8a49d472",
    "label": "Layer",
    "title": "Layer",
    "distribution": [
        {
            "type": "dcat:Distribution",
            "accessURL": "http://.../tiles/0/{Style}/{TileMatrixSet}/{TileMatrix}/{TileRow}/{TileCol}.png",
            "parameters": [
                {
                    "name": "Style",
                    "type": "enum",
                    "values": [
                        "default"
                    ]
                },
                {
                    "name": "TileMatrixSet",
                    "type": "enum",
                    "values": [
                        "default",
                        "EPSG:3857",
                        "EPSG:4326"
                    ]
                },
                {
                    "name": "TileMatrix",
                    "type": "integer",
                    "min": 0
                },
                {
                    "name": "TileRow",
                    "type": "integer",
                    "min": 0
                },
                {
                    "name": "TileCol",
                    "type": "integer",
                    "min": 0
                }
            ]
        }
    ]
}
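
A minimal Python sketch of how a client might consume such a distribution, assuming the parameter descriptions shown above. The host in `ACCESS_URL` is a hypothetical stand-in for the elided part of the accessURL, and the helper names `check_value` and `build_url` are illustrative only; they are not part of the Geoplatform model.

```python
import re
from urllib.parse import quote

# Hypothetical stand-in: the host part of accessURL is elided in the example above.
ACCESS_URL = ("http://example.org/tiles/0/"
              "{Style}/{TileMatrixSet}/{TileMatrix}/{TileRow}/{TileCol}.png")

# Parameter descriptions copied from the "parameters" array above.
PARAMETERS = [
    {"name": "Style", "type": "enum", "values": ["default"]},
    {"name": "TileMatrixSet", "type": "enum",
     "values": ["default", "EPSG:3857", "EPSG:4326"]},
    {"name": "TileMatrix", "type": "integer", "min": 0},
    {"name": "TileRow", "type": "integer", "min": 0},
    {"name": "TileCol", "type": "integer", "min": 0},
]


def check_value(param, value):
    """Validate one bound value against its parameter description."""
    if param["type"] == "enum":
        if value not in param["values"]:
            raise ValueError(f"{param['name']}: {value!r} is not an allowed value")
    elif param["type"] == "integer":
        if isinstance(value, bool) or not isinstance(value, int):
            raise ValueError(f"{param['name']}: expected an integer")
        if "min" in param and value < param["min"]:
            raise ValueError(f"{param['name']}: {value} is below the minimum")
    return str(value)


def build_url(template, parameters, bindings):
    """Expand every {name} in the template with a validated, bound value."""
    by_name = {p["name"]: p for p in parameters}

    def substitute(match):
        name = match.group(1)
        if name not in by_name:
            raise KeyError(f"template variable {name} has no parameter description")
        if name not in bindings:
            raise KeyError(f"no value bound to parameter {name}")
        # Keep ':' readable so CRS identifiers such as EPSG:3857 stay intact.
        return quote(check_value(by_name[name], bindings[name]), safe=":")

    return re.sub(r"\{(\w+)\}", substitute, template)


url = build_url(ACCESS_URL, PARAMETERS, {
    "Style": "default",
    "TileMatrixSet": "EPSG:3857",
    "TileMatrix": 3,
    "TileRow": 2,
    "TileCol": 1,
})
print(url)
```

Everything the client needs (template, parameter names, types, allowed values) travels with the distribution, so no second request is required before building the URL.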

I hope this will help to move this difficult issue forward.

andrea-perego · 2018-03-25T00:46:10Z

Thanks, @fellahst , for sharing the Geoplatform.gov approach.

It seems that the solution you use is similar to the one developed in Geo/DCAT-AP (illustrated in UC18 & UC20), although there are some differences. A full example on how this is done in Geo/DCAT-AP is documented at #166 (comment) .

I think it would be nice to see if we can build on both approaches to provide a consolidated solution. I summarise below how this is done in Geo/DCAT-AP, focussing on the issues you mention.

About the similarities:

Also in GeoDCAT-AP a service is a first-class citizen, and it is described by a specific metadata record registered in a catalogue.
We also use dct:conformsTo to specify the service / API "protocol"

Services are denoted as dctype:Service's, and the specific service type (discovery, view, etc.) is specified via dct:type (i.e., soft typed). The reference code list is the one operated by the INSPIRE Registry, which provides HTTP URIs for the relevant ISO 19115 code list.

The difference is that the link to the resources (e.g., datasets) available from a service is specified by using dct:hasPart - as explained in #116.

Coming now to the issue of modelling a distribution pointing to a service, this is done (a) by (soft) typing the distribution (dct:type) to specify that the data are available from a service and (b) by specifying the service protocol with dct:conformsTo (as done with the service). For a full example, see #166 (comment) .
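
As a rough Python illustration of how a client could act on this soft typing, assuming the distribution has already been read into a plain dictionary keyed by prefixed property names. The `ex:` identifiers are hypothetical placeholders, not the actual INSPIRE code list URIs or protocol identifiers, and `service_protocol` is an illustrative helper.

```python
# Hypothetical placeholders for the service-type code list values.
SERVICE_TYPES = {"ex:discovery", "ex:view", "ex:download"}


def service_protocol(distribution):
    """Return the protocol of a distribution soft-typed as a service, else None."""
    if distribution.get("dct:type") in SERVICE_TYPES:
        return distribution.get("dct:conformsTo")
    return None


# A distribution that points to a service: soft-typed and with a protocol.
via_service = {
    "dcat:accessURL": "http://example.org/wms",
    "dct:type": "ex:view",
    "dct:conformsTo": "ex:wms",
}

# A plain file distribution: no soft type, so no protocol lookup is attempted.
as_file = {"dcat:accessURL": "http://example.org/data.zip"}

print(service_protocol(via_service))
print(service_protocol(as_file))
```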

As explained in UC18, we also had the issue you mention concerning the fact that a service may give access to more than one dataset, so the question is how to specify the relevant query parameters.

Our aim was to identify a solution able to describe the query interface and parameters for any services (not only geospatial ones) by using a standard mechanism / language, preferably widely supported. In particular, the decision was not to encode in RDF the query parameters, but to link to (or embed in RDF) a separate (XML, JSON, ...) document providing this information.

The preliminary proposal was to use an OpenSearch description document (OSDD). You can find a full description of the approach in the DCAT-AP issue tracker (DT2: Service-based data access).
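
To make the idea concrete, here is a small Python sketch of a client reading URL templates from an OSDD and filling one in. The document content, the `example.org` endpoint and the helper names are made up for illustration; only the OpenSearch 1.1 namespace, the `Url` element with its `type` and `template` attributes, and the `{name}` / `{name?}` template syntax come from the OpenSearch specification.

```python
import re
import xml.etree.ElementTree as ET
from urllib.parse import quote

# Hypothetical description document for an imaginary data service.
OSDD = """
<OpenSearchDescription xmlns="http://a9.com/-/spec/opensearch/1.1/">
  <ShortName>Example</ShortName>
  <Description>Hypothetical data service</Description>
  <Url type="application/atom+xml"
       template="http://example.org/search?q={searchTerms}&amp;page={startPage?}"/>
</OpenSearchDescription>
"""

NS = {"os": "http://a9.com/-/spec/opensearch/1.1/"}


def url_templates(osdd_xml):
    """Map each advertised media type to its URL template."""
    root = ET.fromstring(osdd_xml.strip())
    return {u.get("type"): u.get("template") for u in root.findall("os:Url", NS)}


def fill(template, values):
    """Substitute {name} (required) and {name?} (optional) template parameters."""
    def substitute(match):
        name, optional = match.group(1), match.group(2) == "?"
        if name in values:
            return quote(str(values[name]), safe="")
        if optional:
            return ""  # optional parameters may be left empty
        raise KeyError(f"missing required parameter {name}")

    return re.sub(r"\{([\w:]+)(\??)\}", substitute, template)


templates = url_templates(OSDD)
request_url = fill(templates["application/atom+xml"], {"searchTerms": "land cover"})
print(request_url)
```

Note that the client has to fetch and parse this separate XML document before it can build a request, which is the trade-off discussed in this thread.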

This idea of using OpenSearch was further discussed (a bar camp session of the SDSVoc workshop was devoted to it), and, eventually, it was considered not fit for describing all services. So, the issue was put on hold.

Looking forward to your feedback.

dr-shorthair · 2018-03-25T23:35:08Z

@andrea-perego

I think we need to take into account backward compatibility

Do you mean

backward compatibility with DCAT 1.0 vocabulary or
backward compatibility with some DCAT installations?

Looking at the latter, I understand that in the GeoDCAT-AP implementations you have implemented a work-around to allow services to be catalogued, with

individuals of type dctype:Service in the dcat:Catalog
dct:type used for soft-typing according to the high-level INSPIRE classification of (discovery, view, download) services
dct:conformsTo used to indicate the service standard (e.g. WMS, WFS, SOS, ...)
dct:hasPart to point to the datasets served.

This looks like a clear demonstration of the requirements, but a sub-optimal solution because

contrary to what was done with dcat:Dataset, for services you use an old dctype class with some additional constraints added informally
the semantics of at least one of these properties (dct:hasPart) are not very precise

That is why I propose adding an explicit class for services to DCAT, so that we have a clean platform to achieve the outcome that you describe. There is a clear and easy correspondence with the GeoDCAT-AP solution:

Catalog and DataService are both subclasses of dctype:Service
dcat:dataset, dcat:dataService, and dcat-s:servesDataset can be sub-properties of dct:hasPart
the value of dct:type is fixed to discovery for dcat:Catalog, and to download for dcat:DataService
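
A rough Python sketch of why this correspondence is easy, assuming the subclass and sub-property axioms listed above and triples held as plain tuples of prefixed names. The `ex:` resources and the `entail` helper are hypothetical, and the function applies only one step of RDFS-style inference, which is enough here because the hierarchy is one level deep.

```python
# Axioms taken from the list above.
SUBPROPERTY_OF = {
    "dcat:dataset": "dct:hasPart",
    "dcat:dataService": "dct:hasPart",
    "dcat-s:servesDataset": "dct:hasPart",
}
SUBCLASS_OF = {
    "dcat:Catalog": "dctype:Service",
    "dcat:DataService": "dctype:Service",
}


def entail(triples):
    """Add the triples implied by one step of subclass/sub-property inference."""
    inferred = set(triples)
    for s, p, o in triples:
        if p in SUBPROPERTY_OF:
            inferred.add((s, SUBPROPERTY_OF[p], o))
        if p == "rdf:type" and o in SUBCLASS_OF:
            inferred.add((s, "rdf:type", SUBCLASS_OF[o]))
    return inferred


# Hypothetical description written with the proposed explicit terms.
proposed = {
    ("ex:catalog", "rdf:type", "dcat:Catalog"),
    ("ex:catalog", "dcat:dataService", "ex:wfs"),
    ("ex:wfs", "rdf:type", "dcat:DataService"),
    ("ex:wfs", "dcat-s:servesDataset", "ex:roads"),
}

closure = entail(proposed)

# The GeoDCAT-AP style statements follow from the proposed ones.
print(("ex:wfs", "rdf:type", "dctype:Service") in closure)
print(("ex:wfs", "dct:hasPart", "ex:roads") in closure)
```

Under these assumptions, a consumer that only understands dctype:Service and dct:hasPart can still read data published with the more specific terms.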

See #172 and https://github.com/w3c/dxwg/wiki/Cataloguing-data-services

dr-shorthair · 2018-03-26T10:03:14Z

However, I'm not sure whether a 'backward compatibility' explanation is needed with respect to DCAT 1.0.

Rather, the compatibility/migration documentation is more with respect to the additional requirements and solution introduced in GeoDCAT-AP?

fellahst · 2018-03-26T15:04:12Z

@andrea-perego Thank you so much for taking the time to compare the GeoDCAT-AP approach with the Geoplatform one. I think the two approaches are very similar, which probably hints that both are on the right track (passing the Test of Independent Invention). I have a couple of comments on, and disagreements with, the references you gave.

In UC20, you model a catalog service as dcat:Catalog. This seems very weird to me. In my mind, a catalog is a collection of items, not a service. The DCAT specification defines dcat:Catalog as "a curated collection of metadata about datasets". A Catalog service provides an API that manages one or more catalogs and provides, for example, search, harvesting, and indexing functions. Conflating both notions (Catalog and CatalogService) will cause problems in modeling relationships between CatalogService and Catalogs.
An OGC Catalog Service (CSW) should be modeled as a dct:Service with dct:type Catalog and dct:conformsTo OGCCatalogSpec.

I think that using dct:Service to model service is fine. For some reason, I missed this term from DC Terms and I introduced the concept in SRIM namespace. I would agree that dct:Service is totally appropriate. I will probably update the spec to make the alignment.

To model relationships between Service and other items (Dataset, Layer, Map, etc.), we borrowed the terms used in ISO 19115 (operatesOn and servicedBy). However, we generalized the range of operatesOn to be srim:Item (which is a superclass of dcat:Dataset), so we can refer to other types of assets such as Layer or Map. I strongly recommend we introduce a superclass of dcat:Dataset to accommodate this case (maybe sioc:Item).

My last comment is about the modeling of query parameters. Personally, I do not like the suggestion of using an external document such as OSDD, because it makes the implementation of the client more complex for the following reasons:

because it requires an out of band call to get this document
because it requires parsing the document in a different format than RDF.
Simple cases, such as setting a single parameter binding (e.g. a featureId or layerId value), become cumbersome for a publisher to define and require managing separate documents.

The RDF parameter model we use in Geoplatform maps almost one-to-one to the OSDD model (we may need to add properties, such as pattern, to fill the gaps). The benefits of this approach are:

we remain in-band and clients have access to all information to perform the request to the service
the model is semantic (thus agnostic to implementation), and the information about accessURL (including the URL template), parameters, the type of service (dct:type) and the standard (dct:conformsTo) should be sufficient for a client to know how to build the appropriate query to the service.
it is more or less aligned with the ISO 19119 standard. In ISO 19119, services are abstracted as a set of Operations with Parameters and DCP (Distributed Computing Platform) verbs. This abstraction should be sufficient to model any type of service implementation.

While we should try to accommodate the different types of service implementation (REST, SOAP, RPC), we should consider that RESTful APIs constitute the majority of service APIs out there, so the spec should make it trivial and easy to access these services.

andrea-perego · 2018-03-26T16:00:07Z

@dr-shorthair asked:

I think we need to take into account backward compatibility

Do you mean

backward compatibility with DCAT 1.0 vocabulary or

backward compatibility with some DCAT installations?

My original comment was indeed related to DCAT 1.0, as it was about the idea of defining a subclass of dcat:Distribution - sorry for not having made this clear.

As I said elsewhere, I'm not a priori against the idea of defining a specific class for services, as first-class citizens (i.e., not as distributions). Of course, here there may be backward compatibility issues with Geo/DCAT-AP. But I think your proposal, @dr-shorthair , does address them.

I have nonetheless some comments on the proposal at Cataloguing data services. I'll post them to #172 .

andrea-perego · 2018-03-26T21:44:33Z

Thanks for your comments, @fellahst . Please see my replies inline.

[...] I think both approaches are very similar and probably hint that both approaches are on the right track (passing the Test of Independent Invention). [...]

I was thinking the same 😄

In UC20, you model a catalog service as dcat:Catalog. This seems very weird to me. In my mind, a catalog is a collection of items, not a service. The DCAT specification defines dcat:Catalog as "a curated collection of metadata about datasets". A Catalog service provides an API that manages one or more catalogs and provides, for example, search, harvesting, and indexing functions. Conflating both notions (Catalog and CatalogService) will cause problems in modeling relationships between CatalogService and Catalogs.
An OGC Catalog Service (CSW) should be modeled as a dct:Service with dct:type Catalog and dct:conformsTo OGCCatalogSpec.

I agree that dcat:Catalog, in DCAT 1.0, makes no reference to the notion of service - and actually I made a comment along these lines in #172 (comment) , where I mention use cases I'm aware of where it is simply used as a way of grouping datasets based on some criteria (e.g., the datasets produced by a project/activity).

However, I'm also aware of use cases (leaving aside the GeoDCAT-AP approach) where dcat:Catalog is used to denote catalogue services (I used this term in its general sense: an endpoint from which you can get a set of metadata records).

That said, using dcat:Catalog to cover both cases may be confusing, as you note. An option could be to (strongly) type a catalogue service with both dcat:Catalog and dctype:Service:

a:CatalogService a dcat:Catalog , dctype:Service .

This could also be a possible solution to the issue I raised in #172 (comment) about making dcat:Catalog a subclass of dctype:Service.

To model relationships between Service and other items (Dataset, Layer, Map, etc.), we borrowed the terms used in ISO 19115 (operatesOn and servicedBy). However, we generalized the range of operatesOn to be srim:Item (which is a superclass of dcat:Dataset), so we can refer to other types of assets such as Layer or Map. I strongly recommend we introduce a superclass of dcat:Dataset to accommodate this case (maybe sioc:Item).

I think a class more general than dcat:Dataset would indeed be useful, also for catalogues. This links to the issue about having metadata in a catalogue about resources which are not datasets - see also #116 (comment).

On a different note, I would be interested to know if you considered defining a "layer" as a distribution of a dataset, and, in such a case, why you decided otherwise. Please note that I don't have any specific position on this issue - I'm just curious.

My last comment is about the modeling of query parameters. Personally, I do not like the suggestion of using an external document such as OSDD, because it makes the implementation of the client more complex for the following reasons:

[snip]

I see your point, but an RDF-based approach may have other issues.

In Geo/DCAT-AP, the assumption was that the catalogue platform used might not necessarily be natively based on RDF. So, the rationale behind using a fit-for-purpose language (such as OpenSearch) to specify the query parameters was that such a "language" would be more easily interpreted by software agents than an ad hoc RDF representation.

That said, we might have overestimated this problem, and therefore we could revisit our resolution. I wonder whether you could share some details on how you've implemented your approach.

BTW, another reason for looking into OpenSearch was related to the work at OGC to extend it to support requirements of OGC services - see http://www.opengeospatial.org/standards/opensearchgeo and http://www.opengeospatial.org/standards/opensearch-eo

While we should try to accommodate the different types of service implementation (REST, SOAP, RPC), we should consider that RESTful APIs constitute the majority of service APIs out there, so the spec should make it trivial and easy to access these services.

This could be indeed a possible way forward. My concern is whether defining how this should be done in DCAT could be out of scope of DXWG. If we opt for an RDF-based solution, this seems like something to be addressed by a specific vocabulary.

dr-shorthair · 2018-03-26T22:56:23Z

@fellahst wrote

For some reason, I missed this term from DC Terms

dctype:Service is from the Dublin Core types vocabulary, which is supplemental to the (properties only) DC Terms namespace.

jpullmann · 2018-04-18T17:33:44Z

Comment on behalf of Øystein Åsnes on "DCAT Distribution to describe web services [ID6]"

relates to ID18
important, because of increasing proliferation of APIs
so far only hints on interpretation of dcat:accessURL via dcat:distribution dct:description
request for a common core vocabulary for describing (classifying) web services (not only for the geospatial domain)
link to machine-readable service descriptions (e.g. WSDL), don't overload dct:conformsTo
make obvious and immediately discoverable (no need for prior interaction/URL resolution):
- kind of distribution (file, webservice, feed, landingpage)
- web service protocol (SOAP, REST)
- Quality of Service parameters ("under development", "deprecated")
- login/user registration keys etc. (see ID17)

dr-shorthair · 2018-04-19T22:21:33Z

Also see example from @andrea-perego #166 (comment)

dr-shorthair · 2018-04-29T12:58:17Z

Proposed RDF implementation in https://github.com/w3c/dxwg/blob/dcat-service-simon/dcat/rdf/dcat-service.ttl

dr-shorthair · 2018-07-09T06:25:19Z

Resolved with #241

jpullmann added dcat distribution requirement service labels Jan 18, 2018

dr-shorthair added the dcat:Distribution label Feb 1, 2018

davebrowning mentioned this issue Feb 7, 2018

Provide guidance in DCAT2 on how to extend Distribution #106

Closed

rob-metalinkage mentioned this issue Mar 1, 2018

dcat:accessURL - check constraints #124

Closed

dr-shorthair mentioned this issue Mar 15, 2018

Add property to link from Distribution -> Dataset (inverse of dcat:distribution) #166

Closed

dr-shorthair added this to the Data services milestone Mar 16, 2018

dr-shorthair mentioned this issue Mar 22, 2018

Clarify scope of DCAT - Datasets, Distributions, Services? #172

Closed

dr-shorthair mentioned this issue Apr 19, 2018

Proposal to generalise property dcat:dataset #116

Closed

stijngoedertier mentioned this issue Apr 24, 2018

Use Case: interoperability between metadata standards describing resources of various types #223

Closed

andrea-perego mentioned this issue May 18, 2018

Example of catalogued service #237

Closed

dr-shorthair mentioned this issue May 25, 2018

Extending DCAT for services #241

Merged

aisaac removed distribution labels May 29, 2018

jakubklimek mentioned this issue Jun 11, 2018

Enable cataloguing of data access services datagov-cz/nkod#10

Closed

dr-shorthair closed this as completed Jul 9, 2018

dr-shorthair mentioned this issue Mar 10, 2019

Distribution vs DataDistributionService #809

Closed
