Change domain or create superclass of dcat:Distribution #317

Closed
rob-metalinkage opened this issue Aug 22, 2018 · 62 comments
Labels: change-proposal, dcat:Dataset, dcat:Distribution, dcat:Resource, dcat, due for closing, requires discussion

@rob-metalinkage
Contributor

dcat:distribution currently has domain dcat:Dataset

dcat:Resource needs the equivalent property, so either create a superclass of Distribution and a corresponding superproperty of dcat:distribution, or relax the domain of dcat:distribution to be dcat:Resource.

(I favour the latter, as it's not obvious what's different about the superclass...)

see #110
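
For illustration, the second option (relaxing the domain) might be written as the single axiom below. This is only a sketch of the proposal as worded in this comment, not an axiom adopted by DCAT; the range of dcat:distribution is assumed to stay dcat:Distribution.

```turtle
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

# Sketch of the "relax the domain" option: any catalogued resource,
# not only a dcat:Dataset, could then be the subject of dcat:distribution.
# Proposed in this issue; not an adopted DCAT axiom.
dcat:distribution rdfs:domain dcat:Resource .
```

One consequence of this form is that every subclass of dcat:Resource, including services, would be allowed to have dcat:Distributions.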

@dr-shorthair
Contributor

dr-shorthair commented Aug 22, 2018

I assume this is to assist in managing representations of standards/profiles?

Dataset --> Distribution is a well-known terminology in the data space, but elsewhere Resource --> Representation is more typical. So I'd favour the superclass/super-property route. How about

dcat:hasRepresentation a owl:ObjectProperty ;
    rdfs:domain dcat:Resource ; 
    rdfs:range dcat:Representation .

dcat:distribution rdfs:subPropertyOf dcat:hasRepresentation .

dcat:Representation a owl:Class . 

dcat:Distribution rdfs:subClassOf dcat:Representation .

@jakubklimek
Contributor

dcat:Resource needs the equivalent property

@rob-metalinkage Why is this? dcat:Resource is either dcat:Dataset, then it has dcat:distributions, or dcat:DataService, then it does not need a dcat:distribution. I always pictured dcat:Resource as an abstract class.

Are there any examples where you actually have an instance of dcat:Resource which is neither a dataset nor a data service and it needs to have a representation? And does this instance belong to DCAT catalogs?

@dr-shorthair
Contributor

I understand that the concern is to be able to add Profiles (and other technical standards) to a Catalog. Is a Profile a subclass of Dataset, or is it another kind of Resource?

@rob-metalinkage
Contributor Author

rob-metalinkage commented Aug 24, 2018

I don't think a profile is either a Dataset or a Service, so it's a direct subclass of Resource, hence the need to have the equivalent of a Distribution, and the suggestion to model this consistently with the Dataset/Distribution relationship. I think Services and service descriptions will need to be treated the same too, e.g. an OAS API document for a service API.

@rob-metalinkage
Contributor Author

@jakubklimek: In the Use Cases we have documented exactly this practice, where DCAT-AP profiles have been catalogued as pseudo-Datasets.

see #238

@kcoyle
Contributor

kcoyle commented Aug 24, 2018

As per #238 - It's been on the plenary agenda but we never discussed it. Should I move it up so that we cover it at an upcoming meeting? It seems to be addressing some things that are needed so we should see if the WG will accept it.

@makxdekkers
Contributor

@rob-metalinkage: I agree that a profile is different from a dcat:DataService, but I'm not so sure that a profile cannot be seen as a dcat:Dataset. In what way do you see it being different from "A collection of data, published or curated by a single agent, and available for access or download in one or more formats"?

@kcoyle
Contributor

kcoyle commented Aug 24, 2018

@makxdekkers I think it depends on how "data" has been defined. If it is defined so broadly that any file of ones and zeroes is data, then it includes an e-copy of War and Peace. I don't think that's the object. But it doesn't look like "data" is itself defined, which may be the problem here. I wouldn't consider a profile that is a PDF a dataset. I'm not clear in my own mind if I would consider a profile expressed in SHACL a dataset. Yet I can see serving SKOS vocabularies as datasets.

Summary: I think this needs to be solved by defining the term "data", of which a set is called a "dataset"

@makxdekkers
Contributor

makxdekkers commented Aug 24, 2018

@kcoyle If I remember correctly, the Government Linked Data working group that developed the 2014 version of DCAT spent hours and hours trying to define the boundaries of 'data' and 'dataset' and ended up with the current definition. Every time someone suggested a boundary, someone else brought up an example of something clearly outside of the proposed boundary that everybody agreed could be seen as a dataset. Even in your examples, we could talk endlessly about why you think a concept scheme with a SKOS/RDF expression is a dataset and a profile with a SHACL/RDF expression and a printable PDF expression isn't. It's very personal. The GLD group decided that the discussion would never lead to consensus, and settled on the view that DCAT just provided a model and a set of properties that anyone could use to describe anything that they considered to be 'curated data'. I would suggest we do not reopen that discussion and try to define 'dataset' beyond the current definition. I honestly think we're not going to achieve a 'better' definition.
And, yes, it means that someone can describe the book War and Peace as a Dataset, even if other people may think that it doesn't make sense.

@agreiner
Contributor

+1 to Makx. In any case, I think it can be useful to distinguish between a dataset and a profile. I can also think of other things that can be resources distributed with a dataset but are not generally considered datasets, such as images provided as visualizations of the data, or code lists, or written protocols. Whether or not you see any of those things as datasets, it can be useful for someone else to distinguish them.

@makxdekkers
Contributor

I think some of this was already discussed in #64, which mentions a wide range of potential dataset types. Profile could well be a genre of dataset.

@kcoyle
Contributor

kcoyle commented Aug 24, 2018

@makxdekkers That's fine with me if we accept the broader definition, which implies, although it does not state, that anything can be a dataset. In that case, a profile could be a dataset, but I'm not sure that treating it as a dataset is useful in our context. I think the criterion for deciding is "what functionality do we want around profiles?" not "what is the formalism that describes them?" Everything is an rdf:Resource but we usually go on to define more specific classes for our specific uses.

@dr-shorthair
Contributor

@kcoyle a PDF or a SHACL document are both concrete representations of things, therefore they cannot be Datasets - which are conceptual things. A dataset might be represented by PDF or SHACL Distributions.

This is the core of the issue that @rob-metalinkage raised - by implication at least. i.e. that there is a separation between Profile and Profile-Description (or more generally between Technical Specification and Specification Document) which is parallel to the one that we already have between Dataset and Distribution.

@rob-metalinkage
Contributor Author

If we really cared (and I'm not convinced we should), then we would need to define both "data" and "set". To my mind, what distinguishes a dataset from the more general Resource is that it's a set of things, and we can make statements about both the types of things in the set and the membership of the set. I don't see documents fitting that very well, as there is not much useful to say other than that 1s and 0s are ordered members of the document.

Because we would want to qualify the relationship between concrete representations and the abstract thing (i.e. a SHACL document expresses constraints against an RDF vocabulary, vs a document containing guidance) - modelling Profiles is similar to modelling Datasets and their possible distributions, or Services.

We do not need to axiomatise disjointness between subclasses of Resource, but separate models do seem to be useful, according to decisions taken already in DCAT group, so this issue is just a natural consequence of that.

@jakubklimek
Contributor

jakubklimek commented Aug 27, 2018

@rob-metalinkage What is the actual motivation behind the original issue? If it is to accommodate profiles, I would just create a dcat:Profile (or prof:Profile) as a subclass of dcat:Resource, create the dcat:distribution equivalent there and leave dcat:Resource as an "abstract" class.

Specifying the rdfs:domain of dcat:distribution as dcat:Resource seems odd to me, as it would mean that suddenly dcat:DataServices have dcat:Distributions, which were coupled quite tightly with dcat:Datasets.

If we are looking for modeling something like frbr:Work, frbr:Manifestation and frbr:Expression, then I think we would have to have a new superproperty with domain dcat:Resource and its subproperties, where one of those would be dcat:distribution for dcat:Datasets, and other subproperties for dcat:DataServices and prof:Profiles.
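
The superproperty-with-subproperties idea in the last paragraph could be sketched roughly as follows. All terms in the ex: namespace are hypothetical names invented for this illustration, and treating prof:Profile as a subclass of dcat:Resource is the assumption made in this comment, not something DCAT states.

```turtle
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix ex:   <http://example.org/ns#> .
@prefix owl:  <http://www.w3.org/2002/07/owl#> .
@prefix prof: <http://www.w3.org/ns/dx/prof/> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

# Assumption from the comment above: profiles are catalogued resources.
prof:Profile rdfs:subClassOf dcat:Resource .

# Hypothetical superproperty shared by every kind of catalogued resource.
ex:hasManifestation a owl:ObjectProperty ;
    rdfs:domain dcat:Resource .

# The existing DCAT property stays specific to datasets.
dcat:distribution rdfs:subPropertyOf ex:hasManifestation .

# Hypothetical sibling properties for services and profiles.
ex:serviceDescription a owl:ObjectProperty ;
    rdfs:subPropertyOf ex:hasManifestation ;
    rdfs:domain dcat:DataService .

ex:profileDocument a owl:ObjectProperty ;
    rdfs:subPropertyOf ex:hasManifestation ;
    rdfs:domain prof:Profile .
```

With this shape, dcat:distribution keeps its dataset-only domain, so services never acquire dcat:Distributions by inference.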

@agbeltran agbeltran added the dcat label Aug 28, 2018
@dr-shorthair
Contributor

dr-shorthair commented Aug 30, 2018

If we need to allow for representations of things other than Datasets then I think we need a super-class of dcat:Distribution and a corresponding super-property associated with dcat:Resource - perhaps like this:

[Diagram: Resource + Representation]

We already have

dcat:Dataset rdfs:subClassOf dcat:Resource .

This would just add the complementary

dcat:Distribution rdfs:subClassOf dcat:Representation .
dcat:distribution rdfs:subPropertyOf dcat:hasRepresentation .

Then if any new type of thing is also catalogued as an individual of a subclass of dcat:Resource (such as a profile) then a representation of this can be associated without forcing everything into the Dataset and Distribution boxes.
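
As a rough illustration of how catalogue data might look under this proposal, the sketch below describes a profile with two representations. ex:Profile and the ex: individuals are hypothetical, and dcat:hasRepresentation and dcat:Representation are the terms proposed in this thread, not adopted DCAT terms.

```turtle
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dct:  <http://purl.org/dc/terms/> .
@prefix ex:   <http://example.org/> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

# Hypothetical class for catalogued profiles; not a DCAT term.
ex:Profile rdfs:subClassOf dcat:Resource .

# A profile with a PDF and a Turtle representation, linked with the
# proposed (not adopted) property dcat:hasRepresentation.
ex:my-profile a ex:Profile ;
    dct:title "Example application profile" ;
    dcat:hasRepresentation [
        a dcat:Representation ;
        dct:format "application/pdf" ;
    ] , [
        a dcat:Representation ;
        dct:format "text/turtle" ;
    ] .

# A dataset keeps using the existing terms. Under the proposed
# sub-property axiom, an RDFS reasoner would also infer
#   ex:my-dataset dcat:hasRepresentation ex:my-dataset-csv .
ex:my-dataset a dcat:Dataset ;
    dcat:distribution ex:my-dataset-csv .

ex:my-dataset-csv a dcat:Distribution .
```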

@jakubklimek
Contributor

@dr-shorthair This seems clean and reasonable.

  1. What do you think of designating the dcat:Resource and dcat:Representation abstract, so that if anything else pops out (like a Profile), a subclass of Resource has to be created for that, so that it is clear what it is - i.e. discourage creating instances of just dcat:Resource and dcat:Representation?
  2. Doesn't this align with the FRBR approach? Shouldn't it be reused in a more explicit way?

@dr-shorthair
Contributor

  1. Yes. The class names are in italics in the diagram which is the UML convention for 'Abstract'

And there is a usage note on dcat:Resource which says

It is strongly recommended to use a more specific sub-class when available.

A similar recommendation would be made on dcat:Representation.

  2. Absolutely. In a different thread somewhere I already made this point too ... though FRBR has more layers (though arguably by sticking to just four it might not be expressive enough for some cases). However, I don't think FRBR is that widely known outside of library circles, and even there it is somewhat controversial (@kcoyle wrote a book on how it has hindered more than helped!)

@kcoyle
Contributor

kcoyle commented Aug 31, 2018

I do get a FRBR-ish vibe from this, something like Resource being FRBR:Expression, but that would require the subC's of Resource to have a link to Representation (hasRepresentation) rather than a direct dcat:Distribution. That's one of the down sides of FRBR as a data structure, its linearity.

Will there be any properties associated with dcat:Representation? It does seem that many of the properties of dcat:Resource would be appropriate. That could imply that there is a need for a class of "everything" with properties appropriate to both. But that would disrupt the current dcat:Resource.

@dr-shorthair
Contributor

dr-shorthair commented Sep 2, 2018

  1. @kcoyle note the proposed subclass and subproperty relations above. The term 'Representation' is taken from Fielding, where it is used for the concrete realization of all resource types. I understood that 'Distribution' is essentially just a special word for a 'Dataset-representation'. The proposed model above says: if the resource is a Dataset, then its representation is a Distribution and the link between them is dcat:distribution. If the resource is an individual from some other sub-class of Resource then its representation is Representation or a sub-class, and the link between them is dcat:hasRepresentation or a sub-property. I tried to make the terminology consistent with both Fielding and the DCAT-2014 legacy.

  2. Concerning potential properties of dcat:Representation: first we should consider if any of the properties of dcat:Distribution should be promoted to a superclass. We might also look for overlaps with the properties of dcat:Resource, but I'm disinclined to start looking for a higher superclass, unless you can think of a useful scenario where this is needed to support some reasoning (i.e. what's the use case?). owl:Thing will do for me (which is entailed by these all being instances of owl:Class anyway).

@dr-shorthair
Contributor

Oh, and regarding the 'FRBR-ish vibe' - to me that seemed intrinsic to the original DCAT backbone model, with Dataset and associated Distributions, also consistent with Fielding's 2-layer REST model.

The proposed model above merely responds to the decision to add the generalization Resource by adding a parallel generalization of Distribution, here called Representation.

@dr-shorthair
Contributor

... and which would allow us to provide a description of the related artefact, without implying that it is a distribution of a dataset, i.e. promote most of the properties of dcat:Distribution up to a more general dcat:Representation.

@dr-shorthair
Contributor

dr-shorthair commented Feb 4, 2019

Here's an example to illustrate the issue. This is one of my own datasets. This DCAT description is adapted from an earlier example that I used for the bag-of-files use-case.

The first four blank nodes are rdf:type dcat:Distribution because they are informationally-equivalent representations of the actual dataset.

The next three blank nodes are shown un-typed. They are supporting resources, not representations of the dataset. However, it is useful to provide descriptive properties taken from the set that is associated with dcat:Distribution since these are files or representations, but not of the actual dataset.

I guess it is fine from an RDF point of view to leave these nodes un-typed, but maybe they deserve a type? The 'model' is the same as dcat:Distribution, but they are not distributions-of-the-dataset.

dap:d33937
  rdf:type dcat:Dataset ;
  dct:description "A set of RDF graphs representing the International [Chrono]stratigraphic Chart, comprising Turtle serializations of data from the 2017-02 version, ..." ;
  dct:identifier "https://doi.org/10.25919/5b4d2b83cbf2d" ;
  dct:issued "2018-07-07"^^xsd:date ;
  dct:license <https://creativecommons.org/licenses/by/4.0/> ;
  dct:publisher <http://www.csiro.au> ;
  dcat:distribution [
      rdf:type dcat:Distribution ;
      dct:identifier "isc2017.jsonld" ;
      dcat:mediaType "application/ld+json" ;
    ] ;
  dcat:distribution [
      rdf:type dcat:Distribution ;
      dct:identifier "isc2017.nt" ;
      dcat:mediaType "application/n-triples" ;
    ] ;
  dcat:distribution [
      rdf:type dcat:Distribution ;
      dct:identifier "isc2017.rdf" ;
      dcat:mediaType "application/rdf+xml" ;
    ] ;
  dcat:distribution [
      rdf:type dcat:Distribution ;
      dct:identifier "isc2017.ttl" ;
      dcat:mediaType "text/turtle" ;
    ] ;
  dct:relation [
      dcat:downloadURL <http://stratigraphy.org/ICSchart/ChronostratChart2017-02.jpg> ;
      dcat:mediaType "image/jpeg" ;
      dct:description "Coloured image representation of the International Chronostratigraphic Chart" ;
      dct:issued "2017-02-01"^^xsd:date ;
      dct:title "International Chronostratigraphic Chart" ;
    ] ;
  dct:relation [
      dcat:downloadURL <http://stratigraphy.org/ICSchart/ChronostratChart2017-02.pdf> ;
      dcat:mediaType "application/pdf" ;
      dct:description "Coloured image representation of the International Chronostratigraphic Chart" ;
      dct:issued "2017-02-01"^^xsd:date ;
      dct:title "International Chronostratigraphic Chart" ;
    ] ;
  dct:relation [
      dcat:downloadURL <http://resource.geosciml.org/ontology/timescale/gts> ;
      dcat:mediaType "text/turtle" ;
      dct:conformsTo <https://www.w3.org/TR/owl2-overview/> ;
      dct:description "This is an RDF/OWL representation of the GeoSciML Geologic Timescale model, which has been adapted from the model described in Cox, S.J.D, & Richard, S.M. (2005) A formal model for the geologic timescale and GSSP, compatible with geospatial information transfer standards, Geosphere, Geological Society of America 1/3, 119–137." ;
      dct:issued "2011-01-01"^^xsd:date ;
      dct:issued "2017-04-28"^^xsd:date ;
      dct:title "Geologic Timescale model" ;
    ] ;
  dcat:landingPage <https://data.csiro.au/dap/landingpage?pid=csiro:33937> ;
.

@makxdekkers
Contributor

@dr-shorthair I think you are complicating the issue more than necessary. Your explanation that things that are not representations of the dataset are representations of something else got my head spinning. You are using the notion of 'representation' which is already used in the revision (both the current version and 2PWD) and suggested as a synonym of distribution - but I think that term is confusing. For example the definition of dcat:Distribution now states: "represents an accessible form or representation of a dataset ..." so it "represents ... a representation". Not helpful. I think a dcat:Distribution is just "represents an accessible form of a dataset ...".
Is there really a need to make it more complicated by creating a new class dcat:Representation?
In your example, the two images can be modelled as foaf:Document with dct:type Image, and the third one could be modelled as an adms:Asset with dct:type http://purl.org/adms/assettype/DomainModel and one or more adms:AssetDistributions.

@smrgeoinfo
Contributor

I agree that the distribution should represent an accessible form of a dataset; as such, shouldn't the distributions have either accessURL or downloadURL?
As far as the related resources (the dct:relation items) are concerned, I think @makxdekkers's typing suggestion is a good solution: the relation is to a resource that has a type, and that resource has its own distributions (which I guess can be represented by adms:AssetDistribution). The only thing I think would be useful to add is a way to specify what the relation is about (documentation, guidance, examples, critique, annotation...).
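
To make the first point concrete, a distribution carrying an explicit location might look like the sketch below. The ex: dataset IRI and the download URL are hypothetical placeholders; the thread does not give the real file locations.

```turtle
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dct:  <http://purl.org/dc/terms/> .
@prefix ex:   <http://example.org/> .

ex:dataset-1 a dcat:Dataset ;
    dcat:distribution [
        a dcat:Distribution ;
        dct:identifier "isc2017.ttl" ;
        dcat:mediaType "text/turtle" ;
        # Hypothetical location, used only to show where the URL would go.
        dcat:downloadURL <http://example.org/files/isc2017.ttl> ;
    ] .
```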

@dr-shorthair
Contributor

Thanks @makxdekkers - I think you are suggesting that there are more than enough classes available in existing vocabularies to enable suitable typing of the target of dct:relation properties. That is true, though it does require users to be familiar with the options. I guess I was trying to gather them together in the scope of DCAT but I'll stop now.

Regarding this phrasing

the definition of dcat:Distribution now states: "represents an accessible form or representation of a dataset ..." so it "represents ... a representation".

Yes, that is unfortunate. The noun 'representation' in this sentence is a nod to Fielding's 'REST' principle, where it contributes the 'R'. You will also recall that an earlier iteration of the definition avoided the repetition as it read "describes an accessible form or representation of a dataset ...", but then either you or @kcoyle requested the change! But probably not necessary to introduce this in the overview - it is clarified in the normative clause anyway.

@dr-shorthair
Contributor

So this more complete example could be added to Appendix D?

dap:d33937
  rdf:type dcat:Dataset ;
  dct:description "A set of RDF graphs representing the International [Chrono]stratigraphic Chart, ..." ;
  dct:conformsTo <http://resource.geosciml.org/ontology/timescale/gts> ;
  dct:identifier "https://doi.org/10.25919/5b4d2b83cbf2d" ;
  dct:issued "2018-07-07"^^xsd:date ;
  dct:license <https://creativecommons.org/licenses/by/4.0/> ;
  dct:publisher <http://www.csiro.au> ;
  dcat:distribution [
      rdf:type dcat:Distribution ;
      dct:identifier "isc2017.jsonld" ;
      dcat:mediaType "application/ld+json" ;
    ] ;
  dcat:distribution [
      rdf:type dcat:Distribution ;
      dct:identifier "isc2017.nt" ;
      dcat:mediaType "application/n-triples" ;
    ] ;
  dcat:distribution [
      rdf:type dcat:Distribution ;
      dct:identifier "isc2017.rdf" ;
      dcat:mediaType "application/rdf+xml" ;
    ] ;
  dcat:distribution [
      rdf:type dcat:Distribution ;
      dct:identifier "isc2017.ttl" ;
      dcat:mediaType "text/turtle" ;
    ] ;
  dct:relation [
      rdf:type foaf:Document ;
      dct:type <http://purl.org/dc/dcmitype/Image> ;
      dcat:downloadURL <http://stratigraphy.org/ICSchart/ChronostratChart2017-02.jpg> ;
      dcat:mediaType "image/jpeg" ;
      dct:description "Coloured image representation of the International Chronostratigraphic Chart" ;
      dct:issued "2017-02-01"^^xsd:date ;
      dct:title "International Chronostratigraphic Chart" ;
    ] ;
  dct:relation [
      rdf:type foaf:Document ;
      dct:type <http://purl.org/dc/dcmitype/Image> ;
      dcat:downloadURL <http://stratigraphy.org/ICSchart/ChronostratChart2017-02.pdf> ;
      dcat:mediaType "application/pdf" ;
      dct:description "Coloured image representation of the International Chronostratigraphic Chart" ;
      dct:issued "2017-02-01"^^xsd:date ;
      dct:title "International Chronostratigraphic Chart" ;
    ] ;
  dct:relation [
      rdf:type adms:Asset ;
      dct:type <http://purl.org/adms/assettype/DomainModel> ;
      dcat:downloadURL <http://resource.geosciml.org/ontology/timescale/gts> ;
      dcat:landingPage <http://resource.geosciml.org/ontology/timescale/gts> ;
      dcat:mediaType "text/turtle" ;
      dct:conformsTo <https://www.w3.org/TR/owl2-overview/> ;
      dct:description "This is an RDF/OWL representation of the GeoSciML Geologic Timescale model ..." ;
      dct:issued "2011-01-01"^^xsd:date ;
      dct:issued "2017-04-28"^^xsd:date ;
      dct:title "Geologic Timescale model" ;
    ] ;
  dcat:landingPage <https://data.csiro.au/dap/landingpage?pid=csiro:33937> ;
.

@dr-shorthair
Contributor

@rob-metalinkage I think we moved away from the concern that originally motivated you to create this issue. But I think the sense of the DCAT team is that your suggestions would be best dealt with outside the core DCAT vocabulary, e.g. in a profile of DCAT for profiles, if you like.

@agreiner
Contributor

agreiner commented Feb 4, 2019

One solution to the concern about saying that a distro "represents ... a representation" would be to remove the first use of "representation" in the definition. The word "represents" at the beginning of nearly all the definitions has bothered me, too. A dcat:Distribution doesn't represent a distribution, it is a distribution.
A dcat:Catalog is a dataset in which each individual item...
A dcat:Resource is an individual item in a catalog...
A dcat:Dataset is a dataset in a catalog...
A dcat:Distribution is an accessible form or representation of a dataset...

@andrea-perego
Contributor

I don't want to unduly broaden this issue, but the discussion on dcat:distribution / dct:relation brings to my mind a couple of considerations:

  1. Do we already have UCs and/or implementation evidence on the use of specific subproperties of dct:relation which can be used to model some specific resource types? If so, should we provide guidance on when to use them? E.g., in the requirements for data to be documented in the JRC Data Catalogue we identified 3 resource types that should be supported: distributions (dcat:distribution), publications about a dataset (dct:isReferencedBy), and those not fitting in either of the previous categories (dct:relation).

  2. @dr-shorthair 's example includes documents that could be seen as dataset "visualisations". Should they be considered or not as distributions? Or, at least, should this be something to be left to the decision of the data provider?

I'll create separate issues for these 2 points.
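
The three link types mentioned in point 1 could be used side by side as in this minimal sketch. All ex: IRIs are hypothetical placeholders, not records from the JRC Data Catalogue.

```turtle
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dct:  <http://purl.org/dc/terms/> .
@prefix ex:   <http://example.org/> .

ex:dataset-1 a dcat:Dataset ;
    # An accessible form of the dataset itself.
    dcat:distribution ex:dataset-1-csv ;
    # A publication about the dataset.
    dct:isReferencedBy ex:paper-1 ;
    # Anything that fits neither of the two categories above.
    dct:relation ex:readme-1 .

ex:dataset-1-csv a dcat:Distribution .
```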

@agbeltran
Member

@andrea-perego about this:

1. Do we already have UCs and/or implementation evidence on the use of specific subproperties of `dct:relation` which can be used to model some specific resource types? If so, should we provide guidance on when to use them? E.g., in the requirements for data to be documented in the JRC Data Catalogue we identified 3 resource types that should be supported: distributions (`dcat:distribution`), publications about a dataset (`dct:isReferencedBy`), and those not fitting in either of the previous categories (`dct:relation`).

We have an issue about linking datasets and publications #63, and we are using dct:relation for things that don't fit distributions or publications (as per https://w3c.github.io/dxwg/dcat/#bag-of-files). So I don't think we need a new issue for this?

@agbeltran
Member

2. @dr-shorthair 's example includes documents that could be seen as dataset "visualisations". Should they be considered or not as distributions? Or, at least, should this be something to be left to the decision of the data provider?

I do agree that this decision should be left to the data provider, but I believe @dr-shorthair was pointing out a way to show what are distributions (as informationally equivalent representations of a dataset) vs other files that are about the dataset but are not a dataset distribution. IMO, it would be up to the data provider to decide if a 'visualisation' of the dataset is an informationally equivalent representation of the dataset with respect to the other available representations.

So, I think this issue should be analysed in conjunction with #411 and possibly also #433 and #531. I created a project to see if it helps us address all these issues simultaneously: https://github.com/w3c/dxwg/projects/8

@dr-shorthair
Contributor

@andrea-perego wrote:

2. @dr-shorthair 's example includes documents that could be seen as dataset "visualisations". Should they be considered or not as distributions? Or, at least, should this be something to be left to the decision of the data provider?

They might be visualizations of the underlying data, but they are definitely not visualizations of "A set of RDF graphs representing ..."

@rob-metalinkage
Contributor Author

Ironically, the Use Case that motivated this was the one @makxdekkers referred to, where Profiles were catalogued and resources describing different aspects of them were modelled as Distributions. So it is OK to say that example is "wrong" (i.e. inconsistent with DCAT semantics).

  • The Profiles Ontology is not dependent on DCAT so it's not affected. DCAT simply has no reusable way of attaching resources to dcat:Resources in general - only a specific sense is supported for dataset distributions, and people will have to "roll their own" using other vocabularies or just keep using DCAT in catalogues in looser interpretations than the DCAT spec.

@dr-shorthair
Contributor

dr-shorthair commented Feb 6, 2019

  • DCAT simply has no reusable way of attaching resources to dcat:Resources in general

That's not correct. Use dct:relation - see https://w3c.github.io/dxwg/dcat/#Property:resource_relation

@rob-metalinkage
Contributor Author

"dct:relation SHOULD be used where the nature of the relationship between a catalogued item and related resources is not known. A more specific sub-property SHOULD be used if the nature of the relationship of the link is known. The property dcat:distribution SHOULD be used to link from a dcat:Dataset to a representation of the dataset, described as a dcat:Distribution"

"in general" includes the important case when the relation type is known, so "a more specific sub-property" is called for, but not defined by DCAT. I guess this is just a qualified relationship pattern.

@dr-shorthair
Contributor

dr-shorthair commented Feb 6, 2019

All the immediate sub-properties in the dcat: and dct: namespaces are listed right there. And immediately following is the description of the qualified-relation pattern. There might be a tweak needed, but otherwise I suggest that all the ingredients that could reasonably be provided by DCAT are already there. Maybe modify

"A resource with an unspecified relationship to the catalogued item."

to

"A resource with a general or unspecified relationship to the catalogued item."

Maybe also list sub-properties prov:wasGeneratedBy, prov:wasDerivedFrom, prov:wasAttributedTo.
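
For readers unfamiliar with the qualified-relation pattern mentioned above, a minimal sketch follows. It uses dcat:qualifiedRelation, dcat:Relationship and dcat:hadRole as they appear in the DCAT revision; the ex: dataset, target resource and role IRI are hypothetical.

```turtle
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dct:  <http://purl.org/dc/terms/> .
@prefix ex:   <http://example.org/> .

ex:dataset-1 a dcat:Dataset ;
    dcat:qualifiedRelation [
        a dcat:Relationship ;
        # The related resource.
        dct:relation ex:user-guide ;
        # The role it plays with respect to the dataset
        # (hypothetical role IRI).
        dcat:hadRole ex:documentation ;
    ] .
```

The role IRI carries the nature of the relationship, so no new sub-property of dct:relation has to be minted for each kind of link.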

@makxdekkers
Contributor

On 'visualisations', there is a vocabulary at the Publications Office of the EU for "Distribution type": https://publications.europa.eu/en/web/eu-vocabularies/at-concept-scheme/-/resource/authority/distribution-type/. This controlled vocabulary was created for DCAT-2014, where several types of non-file distributions were allowed. Using dct:type on dcat:Distribution one can indicate that the distribution is a visualisation of the dataset. This is recommended in the specification of StatDCAT-AP (https://joinup.ec.europa.eu/release/statdcat-ap-v100 section 7.2.3).
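
A sketch of that usage is shown below. The concept IRI for the "visualisation" distribution type is an assumption about how the Publications Office vocabulary identifies that entry and should be checked against the published list; the ex: IRIs are hypothetical.

```turtle
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dct:  <http://purl.org/dc/terms/> .
@prefix ex:   <http://example.org/> .

ex:dataset-1 a dcat:Dataset ;
    dcat:distribution [
        a dcat:Distribution ;
        dcat:mediaType "application/pdf" ;
        # Assumed concept IRI from the EU "Distribution type" vocabulary;
        # verify it against the published concept scheme before use.
        dct:type <http://publications.europa.eu/resource/authority/distribution-type/VISUALIZATION> ;
    ] .
```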

@davebrowning
Contributor

The key phrase there is

one can indicate that the distribution is a visualisation of the dataset.

@dr-shorthair's reaction above shows that he doesn't see those pictures as a visualisation of the dataset (though they probably are of the underlying information). Other publishers of this might make a different choice (and would probably use different language to describe the dataset as well). I think we decided that the right positioning for the DCAT vocab (as opposed to any design guidance/training) was to stay silent on that choice.

@davebrowning
Contributor

(Noting this has been closed at least once before, though there has been quite a bit of discussion in #482 that touches on who makes the choice of what can be a distribution or just some other 'thing'. That other issue drove some significant clarification of the concepts, and also of where DCAT was deliberately remaining silent.)

Suggest we either close this or mark it as future work. Views, particularly from @rob-metalinkage, @dr-shorthair (or indeed anyone with a strong view)

[In the absence of a countervailing view my instinct is to close it as the spec has moved on and discussion of the issue covered quite a bit of ground that has since been modified]

@dr-shorthair
Contributor

dr-shorthair commented Sep 23, 2019

While I stand by the model sketched above in principle, there is probably an overriding principle here: that if an abstract superclass only has one concrete sub-class, then it is probably redundant and confusing. While I understand that it might be convenient for the Profiles vocabulary to be tied into a generalized Resource --> Representation pattern, that requirement is external to DCAT, so I don't think we have an obligation to meet it here.
