Change domain or create superclass of dcat:Distribution #317
Comments
I assume this is to assist in managing representations of standards/profiles? Dataset --> Distribution is a well-known terminology in the data space, but elsewhere Resource --> Representation is more typical. So I'd favour the superclass/super-property route. How about
|
@rob-metalinkage Why is this? Are there any examples where you actually have an instance of |
I understand that the concern is to be able to add Profiles (and other technical standards) to a Catalog. Is a Profile a subclass of Dataset, or is it another kind of Resource? |
I don't think a profile is either a Dataset or a Service, so it's a direct subclass of Resource; hence the need for the equivalent of a Distribution, and the suggestion to model this consistently with the Dataset/Distribution relationship. I think Services and service descriptions will need to be treated the same way too, e.g. an OAS API document for a service API. |
@jakubklimek : In the Use Cases we have documented exactly this practice, where DCAT-AP profiles have been catalogued as pseudo-Datasets. See #238 |
As per #238 - It's been on the plenary agenda but we never discussed it. Should I move it up so that we cover it at an upcoming meeting? It seems to be addressing some things that are needed so we should see if the WG will accept it. |
@rob-metalinkage: I agree that a profile is different from a |
@makxdekkers I think it depends on how "data" has been defined. If it is defined so broadly that any file of ones and zeroes is data, then it includes an e-copy of War and Peace. I don't think that's the object. But it doesn't look like "data" is itself defined, which may be the problem here. I wouldn't consider a profile that is a PDF a dataset. I'm not clear in my own mind if I would consider a profile expressed in SHACL a dataset. Yet I can see serving SKOS vocabularies as datasets. Summary: I think this needs to be solved by defining the term "data", of which a set is called a "dataset" |
@kcoyle If I remember correctly, the Government Linked Data working group that developed the 2014 version of DCAT spent hours and hours trying to define the boundaries of 'data' and 'dataset' and ended up with the current definition. Every time someone suggested a boundary, someone else brought up an example of something clearly outside the proposed boundary that everybody agreed could be seen as a dataset. Even in your examples, we could talk endlessly about why you think a concept scheme with a SKOS/RDF expression is a dataset and a profile with a SHACL/RDF expression and a printable PDF expression isn't. It's very personal. The GLD group decided that the discussion would never lead to consensus, and settled on the view that DCAT just provided a model and a set of properties that anyone could use to describe anything that they considered to be 'curated data'. I would suggest we do not reopen that discussion or try to define 'dataset' beyond the current definition. I honestly think we're not going to achieve a 'better' definition. |
+1 to Makx. In any case, I think it can be useful to distinguish between a dataset and a profile. I can also think of other things that can be resources distributed with a dataset but are not generally considered datasets, such as images provided as visualizations of the data, or code lists, or written protocols. Whether or not you see any of those things as datasets, it can be useful for someone else to distinguish them. |
I think some of this was already discussed in #64, which mentions a wide range of potential dataset types. Profile could well be a genre of dataset. |
@makxdekkers That's fine with me if we accept the broader definition, which implies, although it does not state, that anything can be a dataset. In that case, a profile could be a dataset, but I'm not sure that treating it as a dataset is useful in our context. I think the criterion for deciding is "what functionality do we want around profiles?", not "what is the formalism that describes them?" Everything is an rdf:Resource, but we usually go on to define more specific classes for our specific uses. |
@kcoyle a PDF or a SHACL document are both concrete representations of things, therefore they cannot be This is the core of the issue that @rob-metalinkage raised - by implication at least. i.e. that there is a separation between |
If we really cared (and I'm not convinced we should), then we would need to define both data and set. To my mind, what distinguishes a dataset from the more general Resource is that it's a set of things, and we can make statements about both the types of things in the set and the membership of the set. I don't see documents fitting that very well, as there is not much useful to say other than that 1s and 0s are ordered members of the document. Because we would want to qualify the relationship between concrete representations and the abstract thing (i.e. a SHACL document expresses constraints against an RDF vocabulary, vs a document containing guidance), modelling Profiles is similar to modelling Datasets and their possible distributions, or Services. We do not need to axiomatise disjointness between subclasses of Resource, but separate models do seem to be useful, according to decisions already taken in the DCAT group, so this issue is just a natural consequence of that. |
@rob-metalinkage What is the actual motivation behind the original issue? If it is to accommodate for profiles, I would just create a Specifying If we are looking for modeling something like |
If we need to allow for representations of things other than We already have
This would just add the complementary
Then if any new type of thing is also catalogued as an individual of a subclass of |
@dr-shorthair This seems clean and reasonable.
|
And there is a usage note on
A similar recommendation would be made on
|
I do get a FRBR-ish vibe from this, something like Resource being FRBR:Expression, but that would require the subclasses of Resource to have a link to Representation (hasRepresentation) rather than a direct dcat:Distribution. That's one of the downsides of FRBR as a data structure: its linearity. Will there be any properties associated with dcat:Representation? It does seem that many of the properties of dcat:Resource would be appropriate. That could imply that there is a need for a class of "everything" with properties appropriate to both. But that would disrupt the current dcat:Resource. |
|
Oh, and regarding the 'FRBR-ish vibe' - to me that seemed intrinsic to the original DCAT backbone model, with The proposed model above merely responds to the decision to add the generalization |
... and which would allow us to provide a description of the related artefact, without implying that it is a distribution of a dataset. i.e. promote most of the properties of dcat:Description up to a more general dcat:Representation. |
Here's an example to illustrate the issue. This is one of my own datasets. This DCAT description is adapted from an earlier example that I used for the bag-of-files use-case. The first four blank nodes are The next three blank nodes are shown un-typed. They are supporting resources, not representations of the dataset. However, it is useful to provide descriptive properties taken from the set that is associated with I guess it is fine from an RDF point of view to leave these nodes un-typed, but maybe they deserve a type? The 'model' is the same as dcat:Distribution, but they are not distributions-of-the-dataset.
|
@dr-shorthair I think you are complicating the issue more than necessary. Your explanation that things that are not representations of the dataset are representations of something else got my head spinning. You are using the notion of 'representation' which is already used in the revision (both the current version and 2PWD) and suggested as a synonym of distribution - but I think that term is confusing. For example the definition of |
I agree that the distribution should represent an accessible form of a dataset; as such shouldn't the distributions have either accessURL or downloadURL |
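The check suggested in the comment above can be sketched roughly as follows. This is only an illustration: triples are plain Python tuples rather than a real RDF graph, and the `ex:` identifiers and the helper name `distributions_missing_url` are hypothetical.

```python
# Minimal sketch: find distributions that have neither dcat:accessURL nor dcat:downloadURL.
# Triples are (subject, predicate, object) tuples; no RDF library is assumed.

URL_PROPERTIES = ("dcat:accessURL", "dcat:downloadURL")


def distributions_missing_url(triples):
    """Return the targets of dcat:distribution that carry no access or download URL."""
    distributions = {o for s, p, o in triples if p == "dcat:distribution"}
    with_url = {s for s, p, o in triples if p in URL_PROPERTIES}
    return sorted(distributions - with_url)


# Hypothetical example data, not taken from the thread.
example = [
    ("ex:dataset", "dcat:distribution", "ex:dist-csv"),
    ("ex:dataset", "dcat:distribution", "ex:dist-nourl"),
    ("ex:dist-csv", "dcat:downloadURL", "https://example.org/data.csv"),
]

missing = distributions_missing_url(example)
```

Here `missing` lists the nodes that are linked as distributions but give no way to reach an accessible form of the dataset.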
Thanks @makxdekkers - I think you are suggesting that there are more than enough classes available in existing vocabularies to enable suitable typing of the target of Regarding this phrasing
Yes, that is unfortunate. The noun 'representation' in this sentence is a nod to Fielding's 'REST' principle, where it contributes the 'R'. You will also recall that an earlier iteration of the definition avoided the repetition as it read "describes an accessible form or representation of a dataset ...", but then either you or @kcoyle requested the change! But probably not necessary to introduce this in the overview - it is clarified in the normative clause anyway. |
So this more complete example could be added to Appendix D?
|
@rob-metalinkage I think we moved away from the concern that originally motivated you to create this issue. But I think the sense of the DCAT team is that your suggestions would be best dealt with outside the core DCAT vocabulary, e.g. in a profile of DCAT for profiles, if you like. |
One solution to the concern about saying that a distro "represents ... a representation" would be to remove the first use of "representation" in the definition. The word "represents" at the beginning of nearly all the definitions has bothered me, too. A dcat:Distribution doesn't represent a distribution, it is a distribution. |
I don't want to unduly broaden this issue, but the discussion on
I'll create separate issues for these 2 points. |
@andrea-perego about this:
We have an issue about linking datasets and publications #63, and we are using |
I do agree that this decision should be left to the data provider, but I believe @dr-shorthair was pointing out a way to show what are distributions (as informationally equivalent representations of a dataset) vs other files that are about the dataset but are not a dataset distribution. IMO, it would be up to the data provider to decide if a 'visualisation' of the dataset is an informationally equivalent representation of the dataset with respect to the other available representations. So, I think this issue should be analysed in conjunction with #411 and possibly also #433 and #531. I created a project to see if it helps us address all these issues simultaneously: https://github.com/w3c/dxwg/projects/8 |
@andrea-perego wrote:
They might be visualizations of the underlying data, but they are definitely not visualizations of "A set of RDF graphs representing ..." |
Ironically, the Use Case that motivated this was the one @makxdekkers referred to, where Profiles were cataloged and resources describing different aspects of them were modelled as Distributions. So OK to say that example is "wrong" (i.e. inconsistent with DCAT semantics)
|
That's not correct. Use |
"dct:relation SHOULD be used where the nature of the relationship between a catalogued item and related resources is not known. A more specific sub-property SHOULD be used if the nature of the relationship of the link is known. The property dcat:distribution SHOULD be used to link from a dcat:Dataset to a representation of the dataset, described as a dcat:Distribution" "in general" includes the important case when the relation type is known, so "a more specific sub-property" is called for, but not defined by DCAT. I guess this is just a qualified relationship pattern. |
All the immediate sub-properties in the
→ Maybe also list sub-properties |
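Read as a decision rule, the usage note quoted earlier in this thread might be sketched like this. The function name and its parameters are hypothetical, and `dct:references` is used only as one example of a more specific sub-property of `dct:relation`.

```python
# Minimal sketch of the quoted usage note as a decision rule.
# All names here are illustrative; DCAT itself defines no such function.

def choose_link_property(source_is_dataset, target_is_representation, specific_property=None):
    """Pick the property linking a catalogued item to a related resource."""
    # dcat:distribution links a dcat:Dataset to a representation of that dataset.
    if source_is_dataset and target_is_representation:
        return "dcat:distribution"
    # If the nature of the relationship is known, use the more specific sub-property.
    if specific_property is not None:
        return specific_property
    # Otherwise fall back to the generic relation.
    return "dct:relation"


dataset_to_csv = choose_link_property(True, True)
dataset_to_paper = choose_link_property(True, False, "dct:references")
profile_to_file = choose_link_property(False, True)
```

Under the current domain, the third case (a non-dataset resource and its representation) falls through to `dct:relation`, which is the gap this issue is about.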
On 'visualisations', there is a vocabulary at the Publications Office of the EU for "Distribution type": https://publications.europa.eu/en/web/eu-vocabularies/at-concept-scheme/-/resource/authority/distribution-type/. This controlled vocabulary was created for DCAT-2014, where several types of non-file distributions were allowed. Using |
The key phrase there is
@dr-shorthair's reaction above shows that he doesn't see those pictures as visualisations of the dataset (though they probably are of the underlying information). Other publishers of this might make a different choice (and would probably use different language to describe the dataset as well). I think we decided that the right positioning for the DCAT vocab (as opposed to any design guidance/training) was to stay silent on that choice. |
(Noting this has been closed at least once before, though there has been quite a bit of discussion in #482 that touches on who makes the choice of what can be a distribution or just some other 'thing'. That other issue drove some significant clarification of the concepts also of where DCAT was deliberately remaining silent) Suggest we either close this or mark it as future work. Views, particularly from @rob-metalinkage, @dr-shorthair (or indeed anyone with a strong view) [In the absence of a countervailing view my instinct is to close it as the spec has moved on and discussion of the issue covered quite a bit of ground that has since been modified] |
While I stand by the model sketched above in principle, there is probably an overriding principle here: that if an abstract superclass only has one concrete sub-class, then it is probably redundant and confusing. I understand that it might be convenient for the Profiles vocabulary to be tied into a generalized |
dcat:distribution currently has domain dcat:Dataset
dcat:Resource needs the equivalent property, so either create a superclass of Distribution and a corresponding superproperty of dcat:distribution, or relax the domain of dcat:distribution to be dcat:Resource
(I favour the latter, as it's not obvious what's different about the superclass...)
see #110
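To make the consequence of the domain concrete, here is a minimal sketch of the RDFS domain rule applied to a profile that uses dcat:distribution. It uses plain tuples instead of an RDF library, and the `ex:` identifiers are hypothetical.

```python
# Minimal sketch of the RDFS domain rule: if (s, p, o) holds and p has domain C,
# then s is inferred to be an instance of C. No reasoner or RDF library is assumed.

RDF_TYPE = "rdf:type"


def infer_types(triples, domains):
    """Return the rdf:type triples entailed by the given property domains."""
    return {(s, RDF_TYPE, domains[p]) for s, p, o in triples if p in domains}


# Hypothetical data: a profile linked to a SHACL file via dcat:distribution.
data = [
    ("ex:profile", RDF_TYPE, "dcat:Resource"),
    ("ex:profile", "dcat:distribution", "ex:profile-shacl"),
]

# Current domain: the profile is inferred to be a dcat:Dataset.
current = infer_types(data, {"dcat:distribution": "dcat:Dataset"})

# Relaxed domain: only the already-asserted dcat:Resource type follows.
relaxed = infer_types(data, {"dcat:distribution": "dcat:Resource"})
```

With the current domain, any resource that uses dcat:distribution is entailed to be a dcat:Dataset; relaxing the domain removes that entailment without introducing a new class.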