Need for a common approach to modeling dataset series in DCAT-AP #155

aidig · 2020-08-07T12:40:57Z

The need for a common approach to modeling dataset series has already been identified as a significant outstanding issue in DCAT (w3c/dxwg#868), and it will hopefully be addressed in DCAT 3.

However, in the meantime, various national and domain-specific profiles of DCAT-AP 2.0 already propose structures to handle dataset series, even though neither DCAT nor DCAT-AP offers the necessary properties/classes or specific guidelines for this directly in the specification documents. Several approaches seem to favour dct:hasPart/dct:isPartOf, although other proposals have also emerged.

It would be beneficial if this issue could be prioritised in DCAT-AP future work.

jakubklimek · 2020-08-07T12:44:01Z

I can confirm that the Czech National Open Data Catalog (https://data.gov.cz) will soon be implementing dataset series through dcterms:hasPart/dcterms:isPartOf.

aidig · 2020-08-07T12:48:43Z

Also, the article - DCAT-AP: How to model Dataset series? - has previously been published but the document is 4 years old and the status is unclear.
Link: https://joinup.ec.europa.eu/release/dcat-ap-how-model-dataset-series
It states "switch to the latest release" and redirects to https://joinup.ec.europa.eu/release/which-processes-and-tools-could-be-used-manage-quality-metadata/10

aidig · 2020-08-07T12:58:57Z

Related references:

DCAT-AP-NO: uses hasPart/isPartOf: [behov] Beskrive tidsserier og samlinger av datasett ("[need] Describe time series and collections of datasets") Informasjonsforvaltning/dcat-ap-no#21
DCAT-AP-SE: uses dcat:downloadURL - Recommendation 4: https://diggsweden.github.io/DCAT-AP-SE/docs/recommendations.html#4-vissa-datam%C3%A4ngder-delas-med-f%C3%B6rdel-upp-i-flera-filer and Förtydliga om hur man lägger upp flera filer till en distribution ("Clarify how to add multiple files to a distribution") diggsweden/DCAT-AP-SE#17
DCAT-AP-CZ: uses hasPart/isPartOf: https://ofn.gov.cz/rozhran%C3%AD-katalog%C5%AF-otev%C5%99en%C3%BDch-dat/draft/#polo%C5%BEky-datov%C3%A1-sada-je-sou%C4%8D%C3%A1st%C3%AD
bregDCAT-AP: uses hasPart/isPartOf: https://joinup.ec.europa.eu/collection/access-base-registries/solution/abr-specification-registry-registries/release/200
geoDCAT-AP: use of dct:type http://inspire.ec.europa.eu/metadata-codelist/ResourceType/series
See "II.3 Resource type" https://joinup.ec.europa.eu/solution/geodcat-application-profile-data-portals-europe/distribution/geodcat-ap-101-docx
DCAT-AP: How to model Dataset series? https://joinup.ec.europa.eu/release/dcat-ap-how-model-dataset-series
DCAT regarding dcat:Distribution (see note) https://www.w3.org/TR/vocab-dcat-2/#Class:Distribution
w3c/dxwg discussion - Dataset series: w3c/dxwg#868
w3c/dxwg discussion - Examples for dataset series: w3c/dxwg#806

bertvannuffelen · 2020-08-28T08:38:05Z

@aidig thanks for the good overview. Let's work towards a clearer proposal.

aidig · 2020-09-30T09:01:23Z

There are several examples of an approach not mentioned in the above list, namely specifying the annual 'versions' as dcat:Distributions.

For instance:
https://data.europa.eu/euodp/da/data/dataset/DAT-105-enta/dataset/transparency-register
https://www.europeandataportal.eu/data/datasets/e6d7b3ac-1ef1-476f-aed2-15645ba60248?locale=en

This approach does not seem to take into consideration DCAT's note on the use of dcat:Distribution, namely that "all distributions of one dataset should broadly contain the same data." DCAT does also state that "the distributions might have different levels of fidelity to the underlying data" and that the interpretation is 'application specific', but such use seems problematic, and advice and recommendations from the DCAT Application Profile are still required.
Ref: https://www.w3.org/TR/vocab-dcat-2/#Class:Distribution

It would be great if the DCAT-AP's proposal for guidelines on this topic could address this too.
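As an illustration of the pattern described above, a catalogue maintainer could flag datasets whose distributions look like annual instalments rather than alternative representations of the same data. The following is only a heuristic sketch: the function name, the titles and the rule itself (titles that are identical apart from a year) are assumptions made for this example and are not defined by DCAT or DCAT-AP.

```python
import re

# Matches a four-digit year such as 1998 or 2020 (assumption: years are written this way).
YEAR = re.compile(r"\b(19|20)\d{2}\b")


def looks_like_series(distribution_titles):
    """Heuristic: True if all titles are identical apart from a distinct year."""
    if len(distribution_titles) < 2:
        return False
    years = set()
    stems = set()
    for title in distribution_titles:
        match = YEAR.search(title)
        if match is None:
            return False
        years.add(match.group(0))
        # Remove the year(s) so that only the shared part of the title remains.
        stems.add(YEAR.sub("", title).strip().lower())
    return len(stems) == 1 and len(years) == len(distribution_titles)


# Hypothetical distribution titles, invented for illustration.
annual = ["Annual report 2018", "Annual report 2019", "Annual report 2020"]
formats = ["Annual report 2020 (CSV)", "Annual report 2020 (XML)"]

print(looks_like_series(annual))
print(looks_like_series(formats))
```

A positive result is only a hint that the dataset may be better modelled as a series; whether the distributions "broadly contain the same data" remains an application-specific judgement.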

aidig · 2020-10-19T06:49:48Z

In addition, note that JoinUp uses dct:isVersionOf in an ADMS-AP to link solutions (e.g. a vocabulary modelled as a dcat:Dataset) to a release (e.g. a versioned vocabulary modelled as a dcat:Dataset). This is a different scenario from time series, but relevant for scoping the properties needed.

See related issue: Modelling three-level structures with DCAT/ADMS #149

aidig · 2020-10-19T07:15:11Z

The DCAT Application Profile for Base Registries (bregDCAT-AP) has, as noted above, already decided to model relationships in which datasets are contained in other datasets (that is, a dataset is a subset of another) using dct:hasPart/dct:isPartOf, and states that any similar mechanism adopted in the future should be based on these Dublin Core terms.

To ensure interoperability, please ensure close coordination and collaboration between DCAT-AP and bregDCAT-AP.

Generally, there is a need for modelling a dataset that is part of another dataset, and one can only hope that the various profiles of DCAT take the same approach in modelling this relationship.

andrea-perego · 2020-10-19T09:44:08Z

To complement your survey, @aidig , DCAT-AP_IT (the Italian profile of DCAT-AP) provides guidelines on the use of dct:hasPart , dct:isPartOf , dct:hasVersion , dct:isVersionOf :

https://docs.italia.it/italia/daf/lg-patrimonio-pubblico/it/stabile/modellometadati.html#come-gestire-le-relazioni-tra-dataset

I include below the (automatic) English translation:

How to manage relationships between datasets
The European vocabulary DCAT treats the main conceptual dataset entity as independent, seen only in relation to the catalog and its distributions. However, in practice, more complex relationships emerge between datasets, as in the case of datasets (eg, time series), versionings, portions of a larger dataset, or collections (i.e. datasets that belong to a general topic but are based on different dimensions, also based on specific use cases; an example is the case of the election results datasets). This current lack of the DCAT vocabulary also affects the European DCAT-AP profile which in any case provides recommendations for possible implementations in the presence of these complex relationships.
NOTE
In the context of these guidelines, the relevant European recommendations are adopted.
In particular, although administrations are encouraged, where possible, to limit the proliferation of datasets, in order to model their inter-relationships, some representation methods are listed below:
in the case of versioning, the current Italian profile DCAT-AP_IT already provides for the use of the Dublin Core dct:isVersionOf property; administrations can also use the reverse property dct:hasVersion in addition to create a relationship between two different versions of the data. However, it is not recommended to create new datasets for small data changes. Instead, it is recommended to define new datasets only in the presence of significant changes compared to previous versions (e.g. new elements included, significant adaptations of some elements, etc.);
in the case of data series, views on datasets and collections it is recommended to adopt the following solution:
Emphasize the series, view or collection itself, creating a single dataset for it whose members are different distributions of the created dataset.
However, where such a solution is difficult to apply, it is also possible to emphasize the individual elements of the series, views or collections. In this case, however, it is advisable to proceed as follows:
create a series-type dataset, using the Dublin Core dct:type property, which takes as its value <http://inspire.ec.europa.eu/metadata-codelist/ResourceType/series>;
create for this dataset many members, in turn datasets, specified by the Dublin Core property dct:hasPart;
individual datasets that are members of the series will have a Dublin Core dct:isPartOf property that binds them to the initial series dataset.
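The three steps of the second option can be sketched with plain subject/predicate/object tuples. This is a minimal illustration, not an implementation taken from DCAT-AP_IT: the example.org IRIs and both function names are hypothetical, while the property and type IRIs are the Dublin Core, DCAT and INSPIRE ones named above.

```python
DCT = "http://purl.org/dc/terms/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
DCAT_DATASET = "http://www.w3.org/ns/dcat#Dataset"
SERIES_TYPE = "http://inspire.ec.europa.eu/metadata-codelist/ResourceType/series"


def build_series(series_iri, member_iris):
    """Return the set of triples for a series-type dataset and its members."""
    triples = {
        (series_iri, RDF_TYPE, DCAT_DATASET),
        (series_iri, DCT + "type", SERIES_TYPE),  # step 1: series-type dataset
    }
    for member in member_iris:
        triples.add((member, RDF_TYPE, DCAT_DATASET))
        triples.add((series_iri, DCT + "hasPart", member))   # step 2
        triples.add((member, DCT + "isPartOf", series_iri))  # step 3
    return triples


def missing_inverses(triples):
    """Return the inverse hasPart/isPartOf statements absent from the graph."""
    missing = set()
    for s, p, o in triples:
        if p == DCT + "hasPart" and (o, DCT + "isPartOf", s) not in triples:
            missing.add((o, DCT + "isPartOf", s))
        if p == DCT + "isPartOf" and (o, DCT + "hasPart", s) not in triples:
            missing.add((o, DCT + "hasPart", s))
    return missing


# Hypothetical IRIs, for illustration only.
series = "https://example.org/dataset/budget"
m2019 = "https://example.org/dataset/budget-2019"
m2020 = "https://example.org/dataset/budget-2020"

graph = build_series(series, [m2019, m2020])
broken = graph - {(m2019, DCT + "isPartOf", series)}
```

The check in `missing_inverses` reflects one possible reading of the guideline, namely that both directions are stated explicitly; a profile could equally decide that one direction is sufficient.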

aidig · 2020-10-19T10:20:18Z

Many thanks for the info @andrea-perego! Much appreciated :-)

How does the solution below align with the semantics of dcat:Distribution and the corresponding W3C note: "all distributions of one dataset should broadly contain the same data"? DCAT also states, though, that "the question of whether different representations can be understood to be distributions of the same dataset, or distributions of different datasets, is application specific."

in the case of data series, views on datasets and collections it is recommended to adopt the following solution:

Emphasize the series, view or collection itself, creating a single dataset for it whose members are different distributions of the created dataset.

andrea-perego · 2020-10-19T11:01:49Z

I think it complies with it. The dcat:Distribution NOTE in DCAT2 was included following requests for guidance - and it is not prescriptive. It gives an indication about the default approach to be used, but it recognises (as stated in the sentence you cite) that alternative solutions are applicable as well, depending on the requirements of the application scenario.

pebran · 2020-10-21T09:53:29Z

I can confirm that the Czech National Open Data Catalog (https://data.gov.cz) will soon be implementing dataset series through dcterms:hasPart/dcterms:isPartOf.

@jakubklimek will that be with direct use of the DC properties or by creation of more specific subproperties?

jakubklimek · 2020-10-21T11:15:43Z

@pebran It will be the direct use of dcterms:isPartOf.
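If only dcterms:isPartOf is stated on the member datasets, a consumer has to derive the members of each series by following the property in reverse. A minimal sketch of that lookup, assuming triples are available as plain tuples (the function name and the example.org IRIs are hypothetical):

```python
from collections import defaultdict

IS_PART_OF = "http://purl.org/dc/terms/isPartOf"


def members_by_series(triples):
    """Group member datasets under the series they point to via isPartOf."""
    members = defaultdict(list)
    for subject, predicate, obj in triples:
        if predicate == IS_PART_OF:
            members[obj].append(subject)
    # Sort for a stable, reproducible result.
    return {series: sorted(parts) for series, parts in members.items()}


# Hypothetical data, for illustration only.
triples = [
    ("https://example.org/d/2020", IS_PART_OF, "https://example.org/series"),
    ("https://example.org/d/2019", IS_PART_OF, "https://example.org/series"),
    ("https://example.org/d/2019", "http://purl.org/dc/terms/title", "2019"),
]

grouped = members_by_series(triples)
```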

init-dcat-ap-de · 2023-07-21T14:14:05Z

So we now have two types of Datasets, those inSeries and normal Datasets.

The Dataset member of a Dataset Series has only 6 properties. title, description and frequency are also part of the "normal" Dataset. One could think that these are the only 6 properties we want to see in a Dataset member of a Dataset Series. But I think that's wrong. As far as I can see, these three properties have different usage texts than their "normal Dataset" counterparts.

I think this should be better explained.

bertvannuffelen · 2023-08-23T09:32:02Z

Indeed, there are 2 types.

Observe that the type InSeries Dataset is a subclass of a normal Dataset.
That means that all properties of a normal Dataset apply to those of an InSeries Dataset.
That is the nature of a subclass.

Only the properties that require special attention for an InSeries Dataset are included for that class.
These are the mandatory normal Dataset ones, those with their updated usage guidelines and constraints, and those that are unique for this scope. That allows readers to have a focused view.

So we rely on users understanding the notion of a subclass as: "all rules and constraints of the superclass apply to me".

We could add to the class usage note an additional sentence such as "This class is a subclass of Dataset and therefore all its properties, with their constraints, apply to this class. For readability purposes these are not copied to this class."
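The reading of "subclass" described here can be sketched as a merge of rule tables: the member class lists only what differs, and everything else is inherited. The property names, cardinalities and notes below are illustrative assumptions and are not copied from the DCAT-AP specification; `effective_rules` is a hypothetical helper.

```python
# Illustrative rules for a "normal" Dataset (values are assumptions).
DATASET_RULES = {
    "title": {"min": 1, "note": "name of the dataset"},
    "description": {"min": 1, "note": "free-text account of the dataset"},
    "frequency": {"min": 0, "note": "update frequency of the dataset"},
    "keyword": {"min": 0, "note": "keyword or tag"},
}

# Only what differs for a Dataset that is a member of a Dataset Series.
IN_SERIES_OVERRIDES = {
    "frequency": {"note": "update frequency of this member within the series"},
    "in series": {"min": 1, "note": "the dataset series this dataset belongs to"},
}


def effective_rules(base, overrides):
    """Return the superclass rules with the subclass-specific changes applied."""
    merged = {name: dict(rule) for name, rule in base.items()}  # copy, do not mutate
    for name, rule in overrides.items():
        merged.setdefault(name, {}).update(rule)
    return merged


member_rules = effective_rules(DATASET_RULES, IN_SERIES_OVERRIDES)
```

In this sketch `keyword` is still available on a member even though the member class never lists it, which is the behaviour the proposed usage note would spell out.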

Note that a similar general statement w.r.t. DCAT is mentioned in the last paragraph of https://semiceu.github.io/DCAT-AP/releases/3.0.0/#specoverview.

init-dcat-ap-de · 2023-08-24T14:25:47Z

I think the subclass relationship is difficult because it's technically not a subclass. It uses the same URI as the "normal" Dataset.

My suggestion would be to remove the subclass relationship and adjust the usage note to something like this:

If a Dataset is used as part of a DatasetSeries, the properties listed here can be used additionally, or slightly differently to those listed for the Dataset outside of a DatasetSeries.
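Because both kinds of dataset share the dcat:Dataset URI, a consumer cannot tell them apart by rdf:type alone. One possible way to separate them, assuming membership is expressed with DCAT 3's dcat:inSeries property, is to test for that property. The function name and example.org IRIs below are hypothetical.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
DCAT = "http://www.w3.org/ns/dcat#"


def split_datasets(triples):
    """Return (series members, other datasets) among resources typed dcat:Dataset."""
    datasets = {s for s, p, o in triples if p == RDF_TYPE and o == DCAT + "Dataset"}
    members = {s for s, p, o in triples if p == DCAT + "inSeries" and s in datasets}
    return members, datasets - members


# Hypothetical data, for illustration only.
triples = [
    ("https://example.org/d/2020", RDF_TYPE, DCAT + "Dataset"),
    ("https://example.org/d/2020", DCAT + "inSeries", "https://example.org/series"),
    ("https://example.org/d/standalone", RDF_TYPE, DCAT + "Dataset"),
    ("https://example.org/series", RDF_TYPE, DCAT + "DatasetSeries"),
]

in_series, standalone = split_datasets(triples)
```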

aidig mentioned this issue Aug 7, 2020

Vejledning til modellering af tidsserier ("Guidance on modelling time series") digst/DCAT-AP-DK#24

Open

andrea-perego mentioned this issue Oct 19, 2020

Dataset series w3c/dxwg#868

Closed

bertvannuffelen added the future-work this topic will be dealt in the future label Jul 8, 2021

bertvannuffelen added the status:waitingForDecisionW3C The issue is handled at W3C label Sep 13, 2021

aidig mentioned this issue Oct 26, 2021

usage guide on dataset - distribution - data service #204

Closed

Juan-Juan-1 mentioned this issue Nov 4, 2021

dcat:Distributions Usage note should not encourage the usage of Distributions as Dataseries opendata-swiss/dcat_ap_ch#112

Open

bertvannuffelen added the next-webinar label Nov 8, 2022

init-dcat-ap-de mentioned this issue Sep 6, 2023

Clarify chapter on dataset members in a dataset series #278

Closed

bertvannuffelen added the status:fixed This issue has been fixed in a draft. label Jan 31, 2024

bertvannuffelen added the release:3.0.0-jun2024 label Jun 17, 2024

bertvannuffelen closed this as completed Jun 17, 2024

bertvannuffelen removed status:fixed This issue has been fixed in a draft. release:3.0.0 https://semiceu.github.io/DCAT-AP/releases/3.0.0 labels Jun 17, 2024
