Dataset series #868
Or the spatial coverage changes, as is the case with maps in a series |
Fully agree that there would be great value in some story, though I'm a little unsure that there is a single pattern that can be recommended. Worth trying, anyway, now we seem to have a stable definition of the qualities of The spatial series seems a good entry point here: in that case would each map be a For the temporal case, we might be able to do the same kind of pattern, but I suspect that there are naturally more things at play there, or possibly other use cases. Versioning - or more specifically change through the progress of time - seems to me to become part of the picture very quickly, whereas for spatial and other kinds of series it would be a bit more orthogonal. Perhaps the temporal kind of series is better looked at via some kind of service paradigm. That might all just be my narrow/limited world view, though. 😄 |
As far as I am concerned, I don't think we're going to resolve this issue before finalising the CR. There are several dimensions to this, and I've been involved in long discussions that did not lead to a resolution after many months. Maybe we should move it to the list for version 3, and set up a sprint for it after publication of the CR? |
+1 to Makx's timing suggestion - I was assuming so, actually. I think there is some value in having such things in the Future Priority backlog - it may solicit further input or use cases. |
Yes, sorry, I did not make it clear at the top of the issue that I intended this for the backlog. Definitely too late to do anything reasonable for DCAT 2. I just wanted to have a discrete issue to track. |
The Interest Group on Data Discovery Paradigms of the Research Data Alliance has a task force on Data/metadata granularity, which is considering a taxonomy of data aggregation. @andrea-perego and @dr-shorthair are in contact and will likely bring more detailed requirements that can form the basis of developing DCAT patterns for this. |
Sorry to be late with this comment, I haven't been following this work. As an outsider to this WG, can I reinforce the importance of this issue? This has been, and continues to be, a substantial pain point in our attempts to use DCAT. Sad to hear that it won't be addressed for DCAT 2. In our experience with public sector datasets, it is relatively rare for a dataset to be a unitary thing that can be downloaded in its entirety. More typically, the non-realtime datasets we see comprise a series of updates (annual, quarterly, monthly, etc., as determined by some release cycle). Where possible we provide data services and dumps for the whole series. However, both users and publishers want to see the updates explicitly as individual elements they can download separately, while still regarding the collection of those updates as a single dataset with common metadata, and they want the data, and the presentation of it, to reflect that. Possible approaches to this include:
Even if you can't recommend a specific pattern for DCAT 2 would you be able to give some indication of the likely direction of travel (as a guide to those of us who need to work around the limitation in the meantime)? |
@der My opinion is that, if we want to support dataset series in DCAT -- and I agree it is a very common request -- the best approach might be to define a new class (subclass of |
Thanks @makxdekkers. That would work for me. In our case we would likely want to treat each of our resources as an instance of both Do you think there's a chance of squeezing a non-normative indication of this as a possible future pattern into the doc? Or at least a comment that use of I mention this because https://www.w3.org/TR/vocab-dcat-2/#Property:catalog_has_part implies that you have domain/range declarations for |
@der I was just expressing my personal opinion, and the group might not agree. |
For future reference (v3?): I agree DatasetSeries should be a separate subclass of dcat:Resource. Noting that a series would have all the properties that are specific to a dataset, from a modeling perspective it might instead be treated as a subclass of Dataset, with the addition of a mandatory (2..N) 'hasPart' relationship, and properties indicating how the 'granules' in the collection are defined (time, space...). |
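To make the shape of that proposal concrete, here is a minimal sketch in plain Python that writes out Turtle for a series. It is an illustration under stated assumptions, not an agreed pattern: `ex:DatasetSeries` and `ex:granuleDimension` are hypothetical terms in a made-up `ex:` namespace (neither is defined in DCAT 2), and the example IRIs are invented. `dct:hasPart` and `dct:title` are existing Dublin Core terms. The sketch follows the "subclass of Dataset" reading by typing the series as both `dcat:Dataset` and the hypothetical series class.

```python
from dataclasses import dataclass, field

# ASSUMPTION: the "ex:" namespace and its terms are hypothetical,
# used only for illustration; they are not part of DCAT 2.
PREFIXES = (
    "@prefix dcat: <http://www.w3.org/ns/dcat#> .\n"
    "@prefix dct: <http://purl.org/dc/terms/> .\n"
    "@prefix ex: <http://example.org/ns#> .\n"
)


@dataclass
class DatasetSeries:
    iri: str                # IRI of the series itself
    title: str              # metadata shared by every member
    granule_dimension: str  # how members differ, e.g. "time" or "space"
    parts: list = field(default_factory=list)  # IRIs of the member datasets

    def to_turtle(self) -> str:
        # Enforce the 2..N cardinality on hasPart suggested above.
        if len(self.parts) < 2:
            raise ValueError("a series needs at least two member datasets")
        members = " ,\n        ".join(f"<{p}>" for p in self.parts)
        # Note: the title is not escaped, so it must not contain quotes.
        return (
            f"{PREFIXES}\n"
            f"<{self.iri}> a dcat:Dataset, ex:DatasetSeries ;\n"
            f'    dct:title "{self.title}" ;\n'
            f'    ex:granuleDimension "{self.granule_dimension}" ;\n'
            f"    dct:hasPart {members} .\n"
        )


budget = DatasetSeries(
    iri="http://example.org/budget",
    title="Budget",
    granule_dimension="time",
    parts=["http://example.org/budget/2019", "http://example.org/budget/2020"],
)
print(budget.to_turtle())
```

Each member would still be described as an ordinary `dcat:Dataset` with its own `dct:temporal` or `dct:spatial`; only the shared metadata sits on the series.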
Yes @smrgeoinfo that is my thinking as well. Richer treatment of relations between resources (esp. datasets) is one of the features that has been added in DCAT2, so we have the platform already. |
A very simple solution is to point to multiple resources by repeating dcat:downloadURL from a single distribution. If needed, additional properties like dct:title, dct:temporal and dct:spatial can be provided on these resources. We do it like this in EntryScape, allowing people to upload or point to multiple resources. |
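A rough sketch of what that could look like, again as plain Python emitting Turtle. This is not EntryScape's actual output; the function name `distribution_turtle` and the example URLs are made up for illustration. Only `dcat:distribution`, `dcat:downloadURL` and `dct:title` are real vocabulary terms.

```python
def distribution_turtle(dataset_iri, files):
    """Emit one distribution with one dcat:downloadURL per file.

    files: mapping of download URL -> title for that file (hypothetical input shape).
    """
    lines = [
        "@prefix dcat: <http://www.w3.org/ns/dcat#> .",
        "@prefix dct: <http://purl.org/dc/terms/> .",
        "",
        f"<{dataset_iri}> a dcat:Dataset ;",
        "    dcat:distribution [",
        "        a dcat:Distribution ;",
    ]
    # Repeat the property by listing every URL as an object of dcat:downloadURL.
    urls = " ,\n            ".join(f"<{u}>" for u in files)
    lines.append(f"        dcat:downloadURL {urls}")
    lines.append("    ] .")
    lines.append("")
    # Optional per-file metadata, attached to the downloadable resources themselves.
    for url, title in files.items():
        lines.append(f'<{url}> dct:title "{title}" .')
    return "\n".join(lines) + "\n"


example = distribution_turtle(
    "http://example.org/budget",
    {
        "http://example.org/files/2019.csv": "Budget 2019",
        "http://example.org/files/2020.csv": "Budget 2020",
    },
)
print(example)
```

The trade-off is that the individual files are not datasets in their own right, so anything that only looks at `dcat:Dataset` descriptions will not see them as separate members.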
@matthiaspalmer If you repeat |
@der said 'In our case we would likely want to treat each of our resources as an instance of both I would propose - if a new class is needed at all - that |
@dr-shorthair said:
GeoDCAT-AP uses the latter approach - see the section on resource types and related example:
## Resource type for series
[] a dcat:Dataset ;
  dct:type <http://inspire.ec.europa.eu/metadata-codelist/ResourceType/series> . |
That depends on how these things are defined. The way I think about it is something like this: Time series: a group of datasets that are related along a time dimension, for example a dataset with the budget for 2019 and another dataset with the budget for 2020; so two datasets that contain the same type of data for a different time period Evolution: a single dataset that is updated 'in situ' over time with additional or modified data, for example a dataset with year-to-date expenditure data; so a single dataset that changes over time There are cases where you could model data either way; for example, in the case of YTD information, you could publish a snapshot every time it changes as a dataset with timestamp, or add additional data in the same dataset. It's up to the publisher to decide which one fits the needs of the users. I know a case where a YTD is updated in situ but then published as a snapshot every six months. |
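The difference between the two readings can be sketched with plain Python dictionaries keyed by property name. This is only an illustration: the helper names `as_series` and `as_evolution`, the calendar-year periods and the IRIs are assumptions, and treating the end of the temporal coverage as the latest update date is one possible choice for year-to-date data rather than something stated above.

```python
from datetime import date


def as_series(base_iri, years):
    """One dataset per period: same kind of data, different temporal coverage."""
    return [
        {
            "iri": f"{base_iri}/{year}",
            "dct:temporal": (date(year, 1, 1), date(year, 12, 31)),
        }
        for year in years
    ]


def as_evolution(iri, start, update_dates):
    """One dataset updated in situ: same IRI, dct:modified moves forward."""
    latest = max(update_dates)
    return {
        "iri": iri,
        # ASSUMPTION: year-to-date coverage runs up to the latest update.
        "dct:temporal": (start, latest),
        "dct:modified": latest,
    }


budgets = as_series("http://example.org/budget", [2019, 2020])
ytd = as_evolution(
    "http://example.org/expenditure-ytd",
    date(2020, 1, 1),
    [date(2020, 3, 31), date(2020, 6, 30)],
)
print(len(budgets), ytd["dct:modified"])
```

In the first case a catalogue holds several records that need some link between them; in the second it holds one record whose description changes, which is where versioning questions come in.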
Hm, typical usage in my own circles is that a time series is a dataset that has time as one variable within that one dataset. I would suggest avoiding using the term to talk about a series of datasets, to avoid confusion. |
Series v Evolution - just to give some support to this, library data recognizes series as: "Serial: Bibliographic item issued in successive parts bearing numerical or chronological designations and intended to be continued indefinitely. Includes periodicals; newspapers; annuals (reports, yearbooks, etc.); the journals, memoirs, proceedings, transactions, etc., of societies; and numbered monographic series, etc. " Basically, issued serially over time; a succession of parts or entries or files. "Integrating resource [kc: terrible name, but like Makx's "evolution"]: Bibliographic resource that is added to or changed by means of updates that do not remain discrete and are integrated into the whole. Examples include updating loose-leafs and updating Web sites. Integrating resources may be finite or continuing." I think "serial" and "updated" / "integrated" are pretty common patterns. The difficulty is in giving them clear names and definitions. And of course there will be some materials that are a bit of both, and I have no idea how to handle those in a user-friendly way. |
Discussion on this topic is also going on in the framework of DCAT-AP. The following posts provide a survey on how dataset series (and versions) are dealt with in DCAT-AP extensions: |
I have prepared a wiki page with a starting example depicting dataset time series. Do not hesitate to add other alternative examples, and to amend the page if I have overlooked any key points from the discussion above. I hope that having common examples to reason upon might help to stabilize a solution in the next DCAT call. |
Thanks, @riccardoAlbertoni . I've added some examples, and made a few editorial changes. |
This issue was automatically closed by the last PR merge. |
Should this issue also be referenced in the Editors' note stating "The creation of a specific class for dataset series is under discussion", or should we rather open a specific issue for that discussion? |
I would suggest creating a new GitHub Issue, in which we can reprise the discussion quoting the views already expressed in the existing GitHub issue. |
@riccardoAlbertoni said:
I'm actually more in favour of closing this one, and creating a new issue once feedback has been submitted. |
In that case, should we get rid of the issue mentioned in the FPWD? Or do we plan to leave the mention of closed issues to provide context? |
I suggest we decide about this during our next call. |
A section about the Dataset series is included in the DCAT FPWD. |
A significant outstanding issue that keeps coming up as the sideline to other conversations [1][2] is the need to have a recommended pattern for cataloguing dataset series. Budget data, satellite imagery, ...
Usually most of the description (metadata) is the same, but the temporal coverage changes between members of a series.
[1] #789
[2] #806