Distribution service [RDISV] #56

Closed
jpullmann opened this issue Jan 18, 2018 · 19 comments

@jpullmann

Distribution service [RDISV]

Ability 1) to describe the type of distribution and 2) provide information about the type of service

Such a description may be provided through a suitable profile identifier that defines a profile of the relevant service type.


Related requirements: Profiles listing [RPFL] Distribution schema [RDIS] 
Related use cases: DCAT Distribution to describe web services [ID6] Modeling service-based data access [ID18] Machine actionable link for a mapping client [ID21] 
@makxdekkers
Contributor

This requirement is very similar to #55. Should they be combined?

@dr-shorthair
Contributor

Also depends on #52

@dr-shorthair
Contributor

dr-shorthair commented Mar 15, 2018

@makxdekkers I don't think so. This issue is at a coarser level than #55. As @andrea-perego notes in https://www.w3.org/TR/dcat-ucr/#ID18 a set of subclasses of dcat:Distribution were originally proposed for DCAT, but did not make it into recommendation. This issue invites us to revisit that.

@dr-shorthair
Contributor

dr-shorthair commented Mar 16, 2018

Proposal for a DataService class to help resolve this issue, and also #166
https://github.com/w3c/dxwg/wiki/Cataloguing-data-services

@andrea-perego
Contributor

I think we need to take into account backward compatibility.

An alternative option is the one used in Geo/DCAT-AP, where the fact that the distribution points to a service is specified by soft typing the distribution (dct:type) - see UC18.

@dr-shorthair
Contributor

@andrea-perego In your response to #166 you explain that you used dctype:Service in the past.

  1. Are these service descriptions also included in a DCAT Catalog?
  2. If so, then does this prove the point, that a class for Service is required as part of the core model?
  3. Finally, should we just go with dctype:Service, or add a class to DCAT explicitly?

@andrea-perego
Contributor

@dr-shorthair, please see my replies inline:

  1. Are these service descriptions also included in a DCAT Catalog?

Yes (at least, in GeoDCAT-AP).

  2. If so, then does this prove the point, that a class for Service is required as part of the core model?

I would be in favour of that, and, more in general, of not limiting the content of a dcat:Catalog only to dcat:Dataset's - as explained in #166 (comment)

  3. Finally, should we just go with dctype:Service, or add a class to DCAT explicitly?

I'm open to either option - as explained in #166 (comment)

@fellahst

fellahst commented Mar 24, 2018

In the context of Geoplatform.gov, we have to deal with the modeling of services, whose health status is checked on a regular basis (https://statuschecker.fgdc.gov/).
For this purpose, we introduce a first-class business object, Service. A Service conforms to a standard (using dct:conformsTo) such as WMS, WMTS, or WFS. A service can operate on (srim:operatesOn) different assets (dcat:Dataset, Layer, Map, ...), and assets are serviced by (srim:servicedBy) one or more services. A service may support multiple protocol versions (srim:supportVersion).

Now, here is the caveat. A service can provide multiple assets (for example, multiple feature collections in a WFS or layers in a WMS). When you model the distribution of an asset that is served by such a service, you cannot just refer to the Service (using srim:servicedBy); you need more information about "how" to access the service. You need the set of valid parameters for calling an operation of the service. For example, a Layer can be distributed from a WMS, and its parameters need to include the layer ID. Likewise for a WFS: you need to give at least the featureType identifier to refer to the collection. My initial thought was to introduce an additional class called DataSource, which captures the standard, parameter descriptions, and bindings of parameter names to values. However, we found that extending dcat:Distribution with parameters and bindings simplifies the model, because it avoids introducing a new first-class object.

We define the parameter concept in a different namespace because we believe it can be reused in many different contexts outside DCAT.

Here is a summary of its properties:

JSON | RDFProperty | Range | Description | Obligation | Card.
name | prm:name | xsd:string | Name of the parameter | M | 1
label | rdfs:label | xsd:string | Display label of the parameter | R | 0..1
optional | prm:optional | xsd:boolean | Boolean indicating if the parameter is optional or not. | R | 0..1
description | dct:description | xsd:string | Description of the parameter | R | 0..1
type | prm:valueType | xsd:string | value type of the parameter | M | 1
default | prm:default | rdfs:Literal | default value of the parameter if applicable. | O | 0..1
min | prm:min | rdfs:Literal | minimum value of the parameter if applicable. | O | 0..1
max | prm:max | rdfs:Literal | maximum value of the parameter if applicable. | O | 0..1
values | prm:validValue | rdfs:Literal | valid values | O | 0..n
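
As a rough illustration of how a client might use the properties above, the following Python sketch mirrors the JSON keys of the table in a small class and checks a candidate value against the declared constraints. The class name `Parameter`, the method `accepts`, and the default `optional=False` are assumptions made for this sketch; they are not part of the Geoplatform model.

```python
from dataclasses import dataclass, field
from typing import Any, List, Optional


@dataclass
class Parameter:
    """One parameter description, keyed like the JSON column of the table."""
    name: str                            # prm:name (mandatory)
    type: str                            # prm:valueType (mandatory)
    label: Optional[str] = None          # rdfs:label
    optional: bool = False               # prm:optional (default is an assumption)
    description: Optional[str] = None    # dct:description
    default: Any = None                  # prm:default
    min: Any = None                      # prm:min
    max: Any = None                      # prm:max
    values: List[Any] = field(default_factory=list)  # prm:validValue, 0..n

    def accepts(self, value: Any) -> bool:
        """Return True if the value satisfies the declared constraints."""
        if self.type == "integer" and not isinstance(value, int):
            return False
        if self.values and value not in self.values:
            return False
        if self.min is not None and value < self.min:
            return False
        if self.max is not None and value > self.max:
            return False
        return True


# Two parameters taken from the WMTS example in this comment.
tile_matrix = Parameter(name="TileMatrix", type="integer", min=0)
tile_matrix_set = Parameter(
    name="TileMatrixSet",
    type="enum",
    values=["default", "EPSG:3857", "EPSG:4326"],
)
```

With these definitions, `tile_matrix.accepts(-1)` is rejected by the `min` constraint, and `tile_matrix_set.accepts("EPSG:4326")` passes the enumeration check.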

Here is an example in JSON-LD of a layer distributed from a WMTS. Note that we use a URL template when there are parameters:

{
    "_created": "2018-01-24T16:10:01.321+0000",
    "_modified": "2018-01-24T16:10:01.321+0000",
    "id": "c25e93d08c41187d557855fd8a49d472",
    "type": "Layer",
    "uri": "http://www.geoplatform.gov/id/layer/c25e93d08c41187d557855fd8a49d472",
    "label": "Layer",
    "title": "Layer",
    "distribution": [
        {
            "type": "dcat:Distribution",
            "accessURL": "http://.../tiles/0/{Style}/{TileMatrixSet}/{TileMatrix}/{TileRow}/{TileCol}.png",
            "parameters": [
                {
                    "name": "Style",
                    "type": "enum",
                    "values": [
                        "default"
                    ]
                },
                {
                    "name": "TileMatrixSet",
                    "type": "enum",
                    "values": [
                        "default",
                        "EPSG:3857",
                        "EPSG:4326"
                    ]
                },
                {
                    "name": "TileMatrix",
                    "type": "integer",
                    "min": 0
                },
                {
                    "name": "TileRow",
                    "type": "integer",
                    "min": 0
                },
                {
                    "name": "TileCol",
                    "type": "integer",
                    "min": 0
                }
            ]
        }
    ]
}
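
A client could consume a distribution like the one above roughly as follows. This is a minimal Python sketch, not the Geoplatform implementation: the function name `expand` is hypothetical, the elided host in `accessURL` is kept exactly as in the example, and no percent-encoding of values is attempted.

```python
import re

# Distribution excerpt in the shape of the example above.
distribution = {
    "accessURL": "http://.../tiles/0/{Style}/{TileMatrixSet}/{TileMatrix}/{TileRow}/{TileCol}.png",
    "parameters": [
        {"name": "Style", "type": "enum", "values": ["default"]},
        {"name": "TileMatrixSet", "type": "enum",
         "values": ["default", "EPSG:3857", "EPSG:4326"]},
        {"name": "TileMatrix", "type": "integer", "min": 0},
        {"name": "TileRow", "type": "integer", "min": 0},
        {"name": "TileCol", "type": "integer", "min": 0},
    ],
}

PLACEHOLDER = re.compile(r"\{(\w+)\}")


def expand(dist, bindings):
    """Fill the URL template of a distribution with validated bindings."""
    declared = {p["name"]: p for p in dist["parameters"]}
    for name in PLACEHOLDER.findall(dist["accessURL"]):
        if name not in declared:
            raise KeyError(f"template variable {name!r} is not declared")
        if name not in bindings:
            raise KeyError(f"no value bound to {name!r}")
        value, param = bindings[name], declared[name]
        if param.get("type") == "integer" and not isinstance(value, int):
            raise ValueError(f"{name} must be an integer")
        if "values" in param and value not in param["values"]:
            raise ValueError(f"{value!r} is not a valid value for {name}")
        if "min" in param and value < param["min"]:
            raise ValueError(f"{name} must be >= {param['min']}")
    # Substitute every placeholder with its bound value.
    return PLACEHOLDER.sub(lambda m: str(bindings[m.group(1)]), dist["accessURL"])


url = expand(distribution, {
    "Style": "default",
    "TileMatrixSet": "EPSG:3857",
    "TileMatrix": 2,
    "TileRow": 1,
    "TileCol": 3,
})
```

Everything the client needs (template, parameter names, and constraints) comes from the distribution itself, which is the in-band property argued for in this comment.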

I hope this helps to move this difficult issue forward.

@andrea-perego
Contributor

andrea-perego commented Mar 25, 2018

Thanks, @fellahst, for sharing the Geoplatform.gov approach.

It seems that the solution you use is similar to the one developed in Geo/DCAT-AP (illustrated in UC18 & UC20), although there are some differences. A full example of how this is done in Geo/DCAT-AP is documented at #166 (comment).

I think it would be nice to see if we can build on both approaches to provide a consolidated solution. I summarise below how this is done in Geo/DCAT-AP, focussing on the issues you mention.

About the similarities:

  • Also in GeoDCAT-AP a service is a first-class citizen, and it is described by a specific metadata record registered in a catalogue.
  • We also use dct:conformsTo to specify the service / API "protocol"

Services are denoted as dctype:Service's, and the specific service type (discovery, view, etc.) is specified via dct:type (i.e., soft typed). The reference code list is the one operated by the INSPIRE Registry, which provides HTTP URIs for the relevant ISO 19115 code list.

The difference is that the link to the resources (e.g., datasets) available from a service is specified by using dct:hasPart, as explained in #116.

Coming now to the issue of modelling a distribution pointing to a service, this is done (a) by (soft) typing the distribution (dct:type) to specify that the data are available from a service and (b) by specifying the service protocol with dct:conformsTo (as done with the service). For a full example, see #166 (comment) .

As explained in UC18, we also had the issue you mention concerning the fact that a service may give access to more than one dataset, so the question is how to specify the relevant query parameters.

Our aim was to identify a solution able to describe the query interface and parameters for any services (not only geospatial ones) by using a standard mechanism / language, preferably widely supported. In particular, the decision was not to encode in RDF the query parameters, but to link to (or embed in RDF) a separate (XML, JSON, ...) document providing this information.

The preliminary proposal was to use an OpenSearch description document (OSDD). You can find a full description of the approach in the DCAT-AP issue tracker (DT2: Service-based data access).

This idea of using OpenSearch was further discussed (a bar camp session of the SDSVoc workshop was devoted to it), and, eventually, it was considered not fit for describing all services. So, the issue was put on hold.

Looking forward to your feedback.

@dr-shorthair
Contributor

@andrea-perego

I think we need to take into account backward compatibility

Do you mean

  1. backward compatibility with DCAT 1.0 vocabulary or
  2. backward compatibility with some DCAT installations?

Looking at the latter, I understand that in the GeoDCAT-AP implementations you have implemented a work-around to allow services to be catalogued, with

  • individuals of type dctype:Service in the dcat:Catalog
  • dct:type used for soft-typing according to the high-level INSPIRE classification of (discovery, view, download) services
  • dct:conformsTo used to indicate the service standard (e.g. WMS, WFS, SOS, ...)
  • dct:hasPart to point to the datasets served.

This looks like a clear demonstration of the requirements, but a sub-optimal solution because

  1. contrary to what was done with dcat:Dataset, for services you use an old dctype class with some additional constraints added informally
  2. the semantics of at least one of these properties (dct:hasPart) are not very precise

That is why I propose adding an explicit class for services to DCAT, so that we have a clean platform to achieve the outcome that you describe. There is a clear and easy correspondence with the GeoDCAT-AP solution:

  • Catalog and DataService are both subclasses of dctype:Service
  • dcat:dataset, dcat:dataService, and dcat-s:servesDataset can be sub-properties of dct:hasPart
  • the value of dct:type is fixed to discovery for dcat:Catalog, and to download for dcat:DataService
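
The correspondence listed above can be checked with a small entailment sketch. The Python below keeps prefixed names as plain strings (no RDF library is assumed) and applies one level of subclass and sub-property inference, which is enough here because the proposed hierarchy is one level deep. The `ex:` resources and the function name `entail` are hypothetical.

```python
# Proposed alignments, written as child -> parent.
SUB_CLASS_OF = {
    "dcat:Catalog": "dctype:Service",
    "dcat:DataService": "dctype:Service",
}
SUB_PROPERTY_OF = {
    "dcat:dataset": "dct:hasPart",
    "dcat:dataService": "dct:hasPart",
    "dcat-s:servesDataset": "dct:hasPart",
}


def entail(triples):
    """Return the triples plus those implied by the alignments above."""
    inferred = set(triples)
    for s, p, o in triples:
        if p in SUB_PROPERTY_OF:
            inferred.add((s, SUB_PROPERTY_OF[p], o))
        if p == "rdf:type" and o in SUB_CLASS_OF:
            inferred.add((s, "rdf:type", SUB_CLASS_OF[o]))
    return inferred


# A service described with the proposed DCAT terms (hypothetical data).
data = {
    ("ex:wfs", "rdf:type", "dcat:DataService"),
    ("ex:wfs", "dcat-s:servesDataset", "ex:roads"),
}
closure = entail(data)
```

After inference, a consumer that only knows the GeoDCAT-AP pattern still finds `ex:wfs` typed as `dctype:Service` and linked to `ex:roads` through `dct:hasPart`.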

See #172 and https://github.com/w3c/dxwg/wiki/Cataloguing-data-services

@dr-shorthair
Contributor

However, I'm not sure if there is a 'backward compatibility' explanation needed with respect to DCAT 1.0.

Rather, the compatibility/migration documentation is more with respect to the additional requirements and solution introduced in GeoDCAT-AP?

@fellahst

fellahst commented Mar 26, 2018

@andrea-perego Thank you so much for taking the time to compare the GeoDCAT-AP approach with the Geoplatform one. I think both approaches are very similar and probably hint that both approaches are on the right track (passing the Test of Independent Invention). I have a couple of comments on, and disagreements with, the references you gave.

In UC20, you model a catalog service as dcat:Catalog. This seems very weird to me. In my mind, a catalog is a collection of items, not a service. The DCAT specification defines dcat:Catalog as "a curated collection of metadata about datasets". A catalog service provides an API that manages one or more catalogs and provides search, harvesting, and indexing functions, for example. Conflating both notions (Catalog and CatalogService) will cause problems in modeling relationships between CatalogService and Catalogs.
An OGC Catalog Service (CSW) should be modeled as a dct:Service with dct:type Catalog and dct:conformsTo OGCCatalogSpec.

I think that using dct:Service to model service is fine. For some reason, I missed this term from DC Terms and I introduced the concept in SRIM namespace. I would agree that dct:Service is totally appropriate. I will probably update the spec to make the alignment.

To model relationships between Service and other items (Dataset, Layer, Map, etc.), we borrowed the terms used in ISO 19115 (operatesOn and servicedBy). However, we generalized the range of operatesOn to be srim:Item (which is a superclass of dcat:Dataset), so we can refer to other types of assets such as Layer or Map. I strongly recommend we introduce a superclass of dcat:Dataset to accommodate this case (maybe sioc:Item).

My last comment is about the modeling of query parameters. Personally, I do not like the suggestion of using an external document such as OSDD, because it makes the implementation of the client more complex for the following reasons:

  1. because it requires an out of band call to get this document
  2. because it requires parsing the document in a different format than RDF.
  3. because simple cases, such as setting a single parameter binding (e.g., a featureId or layerId value), become cumbersome for a publisher to define and require managing separate documents.

The RDF parameter model we used in Geoplatform maps almost one-to-one onto the OSDD model (we may need to add further properties, such as pattern, to fill the gaps). The benefits of this approach are:

  1. we remain in-band and clients have access to all information to perform the request to the service
  2. the model is semantic (thus agnostic to implementation), and the information about accessURL (including the URL template), parameters, the type of service (dct:type) and the standard (dct:conformsTo) should be sufficient for a client to know how to build the appropriate query to the service.
  3. it is more or less aligned with the ISO 19119 standard. In ISO 19119, services are abstracted as a set of Operations with Parameters and DCP (Distributed Computing Platform) verbs. This abstraction should be sufficient to model any type of service implementation.

While we should try to accommodate the different types of service implementation (REST, SOAP, RPC), we should consider that RESTful APIs constitute the majority of service APIs out there, so the spec should make it trivial and easy to access these services.

@andrea-perego
Contributor

@dr-shorthair asked:

I think we need to take into account backward compatibility

Do you mean

  1. backward compatibility with DCAT 1.0 vocabulary or
  2. backward compatibility with some DCAT installations?

My original comment was indeed related to DCAT 1.0, as it was about the idea of defining a subclass of dcat:Distribution - sorry for not having made this clear.

As I said elsewhere, I'm not a priori against the idea of defining a specific class for services, as first-class citizens (i.e., not as distributions). Of course, here there may be backward compatibility issues with Geo/DCAT-AP. But I think your proposal, @dr-shorthair, does address them.

I nonetheless have some comments on the proposal at Cataloguing data services. I'll post them to #172.

@andrea-perego
Contributor

andrea-perego commented Mar 26, 2018

Thanks for your comments, @fellahst. Please see my replies inline.

[...] I think both approaches are very similar and probably hint that both approaches are on the right track (passing the Test of Independent Invention). [...]

I was thinking the same 😄

In UC20, you model a catalog service as dcat:Catalog. This seems very weird to me. In my mind, a catalog is a collection of items, not a service. The DCAT specification defines dcat:Catalog as "a curated collection of metadata about datasets". A catalog service provides an API that manages one or more catalogs and provides search, harvesting, and indexing functions, for example. Conflating both notions (Catalog and CatalogService) will cause problems in modeling relationships between CatalogService and Catalogs.
An OGC Catalog Service (CSW) should be modeled as a dct:Service with dct:type Catalog and dct:conformsTo OGCCatalogSpec.

I agree that dcat:Catalog, in DCAT 1.0, makes no reference to the notion of service - and actually I made a comment along these lines in #172 (comment) , where I mention use cases I'm aware of where it is simply used as a way of grouping datasets based on some criteria (e.g., the datasets produced by a project/activity).

However, I'm also aware of use cases (leaving aside the GeoDCAT-AP approach) where dcat:Catalog is used to denote catalogue services (I used this term in its general sense: an endpoint from which you can get a set of metadata records).

That said, using dcat:Catalog to cover both cases may be confusing, as you note. An option could be to (strong) type a catalogue service with both dcat:Catalog and dctype:Service:

a:CatalogService a dcat:Catalog , dctype:Service .

This could also be a possible solution to the issue I raised in #172 (comment) about making dcat:Catalog a subclass of dctype:Service.

To model relationships between Service and other items (Dataset, Layer, Map, etc.), we borrowed the terms used in ISO 19115 (operatesOn and servicedBy). However, we generalized the range of operatesOn to be srim:Item (which is a superclass of dcat:Dataset), so we can refer to other types of assets such as Layer or Map. I strongly recommend we introduce a superclass of dcat:Dataset to accommodate this case (maybe sioc:Item).

I think a class more general than dcat:Dataset would indeed be useful, also for catalogues. This links to the issue about having metadata in a catalogue about resources which are not datasets - see also #116 (comment).

On a different note, I would be interested to know if you considered defining a "layer" as a distribution of a dataset, and, in such a case, why you decided otherwise. Please note that I don't have any specific position on this issue - I'm just curious.

My last comment is about the modeling of query parameters. Personally, I do not like the suggestion of using an external document such as OSDD, because it makes the implementation of the client more complex for the following reasons:

[snip]

I see your point, but an RDF-based approach may have other issues.

In Geo/DCAT-AP, the assumption was that the catalogue platform used might not necessarily be natively based on RDF. So, the rationale behind using a fit-for-purpose language (such as OpenSearch) to specify the query parameters was based on the idea that such a "language" would be more easily interpreted by software agents, compared to an ad hoc RDF representation.

That said, we might have overestimated this problem, and therefore we could revisit our resolution. I wonder whether you could share some details on how you've implemented your approach.

BTW, another reason for looking into OpenSearch was related to the work at OGC to extend it to support requirements of OGC services - see http://www.opengeospatial.org/standards/opensearchgeo and http://www.opengeospatial.org/standards/opensearch-eo

While we should try to accommodate the different types of service implementation (REST, SOAP, RPC), we should consider that RESTful APIs constitute the majority of service APIs out there, so the spec should make it trivial and easy to access these services.

This could be indeed a possible way forward. My concern is whether defining how this should be done in DCAT could be out of scope of DXWG. If we opt for an RDF-based solution, this seems like something to be addressed by a specific vocabulary.

@dr-shorthair
Contributor

@fellahst wrote

For some reason, I missed this term from DC Terms

dctype:Service is from the Dublin Core types vocabulary, which is supplemental to the (properties only) DC Terms namespace.

@jpullmann
Author

Comment on behalf of Øystein Åsnes on "DCAT Distribution to describe web services [ID6]"

  • relates to ID18
  • important, because of increasing proliferation of APIs
  • so far only hints on interpretation of dcat:accessURL via dcat:distribution dct:description
  • request for a common core vocabulary for describing (classifying) web services (not only for the geospatial domain)
  • link to machine-readable service descriptions (e.g. WSDL), don't overload dct:conformsTo
  • make obvious and immediately discoverable (no need for prior interaction/URL resolution):
    • kind of distribution (file, webservice, feed, landingpage)
    • web service protocol (SOAP, REST)
    • Quality of Service parameters ("under development", "deprecated")
    • login/user registration keys etc. (see ID17)

@dr-shorthair
Contributor

Also see example from @andrea-perego #166 (comment)

@dr-shorthair
Contributor

Proposed RDF implementation in https://github.com/w3c/dxwg/blob/dcat-service-simon/dcat/rdf/dcat-service.ttl

@dr-shorthair
Contributor

Resolved with #241
