Provenance information [RPIF] #76
Comments
About agent roles, I have a specific proposal which is about relaxing the domain of This issue popped up during the development of GeoDCAT-AP, since all the agent roles ( Besides this, the "contact point" is probably the most important role for data consumers, not only for datasets. For instance, for a I wonder whether this (and similar revisions to DCAT) requires the creation of a separate use case.
|
Re-represented as RDA Prov Patterns WG Use Case 41: http://patterns.promsns.org/usecase/41 |
Several patterns for "providing a way to link to structured information about the provenance of a dataset" are given both in PROV and in patterns by the RDA Prov Patterns WG, such as http://patterns.promsns.org/pattern/12. We should reuse these. |
I propose to untag "quality" from this issue, as it is more related to provenance than to quality. Clearly, provenance might influence quality, but given that we already have the tag "provenance", I think we can remove "quality". |
A placeholder section/sub-section or proposal for the DCAT document would be appreciated - to alert the community when we release the FPWD. |
Regarding the first of the three items in the description of this requirement:
Proposal
Assuming an established
Pattern 1: store provenance in a different document/service to the Dataset metadata and link with either
This is appropriate when potentially detailed provenance information cannot be well catered for within the standard DCAT document. This will be the case in purpose-built systems that cater for DCAT but not all the possibilities of PROV, even for Dataset/Dataset (Entity/Entity) relations.
Example: Dataset X was derived from Dataset Y and Dataset Z:
Within the DCAT document:
Instead of a provenance document, a dataset could be linked to a provenance query service using
Pattern 2: link datasets directly to others with PROV-O relations
Example: Dataset X was derived from Dataset Y and Dataset Z:
Within the DCAT document:
or, qualified forms (see https://www.w3.org/TR/prov-o/#qualifiedDerivation):
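A minimal sketch of both forms for Pattern 2, assuming hypothetical dataset IRIs `ex:X`, `ex:Y` and `ex:Z` under an example namespace (these names are illustrative and not taken from the original comment), might look like this. It uses only `prov:wasDerivedFrom` for the direct form and `prov:qualifiedDerivation` with `prov:Derivation` and `prov:entity` for the qualified form, as defined in PROV-O:

```turtle
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix prov: <http://www.w3.org/ns/prov#> .
@prefix ex:   <http://example.org/dataset/> .   # hypothetical namespace

# Simple form: direct Entity-to-Entity relations between datasets
ex:X a dcat:Dataset ;
    prov:wasDerivedFrom ex:Y , ex:Z .

# Qualified form: each derivation becomes a node that can carry extra detail
ex:X prov:qualifiedDerivation
    [ a prov:Derivation ; prov:entity ex:Y ] ,
    [ a prov:Derivation ; prov:entity ex:Z ] .
```

The qualified form is more verbose, but each `prov:Derivation` node gives a place to attach further statements (for example an activity or a role) that the direct property cannot hold.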
|
Regarding the second of the three items in the description of this requirement:
Proposal
Interpret software as a specialised form of
Example: Dataset X was derived from Dataset Y and the derivation was made using Software Z
As long as the specific instance of software that was used can be recorded (i.e. not the URI of the GitHub repo but of the specific commit that was used), the above can simply be recorded as:
where the derivation from Software Z is understood to be a derivation by instruction due to
|
Regarding the third of the three items in the description of this requirement:
Proposal
For the general case of role or other qualifications, see Qualified forms [RQF] #79, where a proposal for qualified forms is made with agent roles as an example. |
@nicholascar wrote:
I don't know if that's possible: Usually software is considered a |
@nicholascar , some time ago I added examples of provenance patterns in the wiki, and some of them relate to yours: Would you mind having a check, and see if you think they should be revised/extended? |
This gets messier with things like Shacl and Spin where the software is data. Software is an entity, an instance of running software is an agent? This fits with software being subject to processes such as automated testing. |
@rob-metalinkage Can you expound on this a bit? "This gets messier with things like Shacl and Spin where the software is data" Which software is data? I read Shacl as taking instance data as input, so I'm not sure which software you mean. But I may be thinking of something other than what you meant. |
@rob-metalinkage @larsgsvensson we have a long-standing precedent of instances of software being I have run this pattern of the instance of software used being modelled as a The pattern is generalisable to include methods other than software, such as scientific methods. I will re-document that pattern for the RDA WG in more detail shortly. |
@nicholascar It seems that we need to define exactly what we mean by software... I'd say that we need to differentiate between the sequence of commands being executed (aka a programme), the execution of that programme (aka a process) and any input passed to that process (let's call it input). If we look at the case of a SHACL engine validating a piece of RDF using a SHACL file it seems to me that the execution of the validation is a If that's what you mean, I fully agree. And we need a better word than "software". |
@larsgsvensson the differentiation you describe is how I describe things so I agree with your general characterisations. I do agree that defining a process can be tricky but if we stick to the "provenance that we want", not a "provenance that could be modelled" then we can usually do something sensible. In the example you give of a servlet validating something I would model it thus:
So this modelling will allow someone to see when ( |
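One possible way to express the validation scenario discussed above in PROV-O is sketched here. It is an illustration under stated assumptions, not the modelling given in the comment: all `ex:` IRIs and the timestamps are hypothetical, and typing the engine as `prov:SoftwareAgent` is one choice among those debated in this thread.

```turtle
@prefix prov: <http://www.w3.org/ns/prov#> .
@prefix xsd:  <http://www.w3.org/2001/XMLSchema#> .
@prefix ex:   <http://example.org/> .   # hypothetical namespace

# The execution of the validation is the Activity
ex:validation-1 a prov:Activity ;
    prov:used ex:input-data , ex:shapes-file ;        # the RDF being validated and the SHACL file
    prov:wasAssociatedWith ex:shacl-engine ;          # the running software
    prov:startedAtTime "2018-01-01T10:00:00Z"^^xsd:dateTime ;   # illustrative timestamps
    prov:endedAtTime   "2018-01-01T10:00:05Z"^^xsd:dateTime .

ex:input-data   a prov:Entity .
ex:shapes-file  a prov:Entity .
ex:shacl-engine a prov:SoftwareAgent .                # assumption: engine modelled as an agent

# The result of the validation is an Entity generated by the Activity
ex:validation-report a prov:Entity ;
    prov:wasGeneratedBy ex:validation-1 .
```

A description along these lines records when the validation ran, what it consumed, which software carried it out, and what it produced.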
Could someone clarify the relevance of |
Provenance information should probably be available at the level of representations ( (Does this need a new Issue?) |
I think we should consider here cases where "provenance" is expressed in a discursive way - e.g., when describing the dataset lineage (as mentioned in UC9). This is quite a common practice for scientific data and in some domains, such as the geospatial one. In most cases, these lineage descriptions are such that they cannot be easily converted into a machine-actionable representation. In DCAT-AP, this is done by using It may be worth considering its inclusion in DCAT. |
As there has been no further discussion on this issue, I propose to close it. |
Noting no objections, I am closing this issue. |
Provenance information [RPIF]
Provide a way to link to structured information about the provenance of a dataset including:
dct:creator, dct:publisher, etc. are special cases that require guidance; further roles may be defined in provenance or other richer models. The requirement is to establish an extensible mechanism, and for profiles to specify canonical equivalents for the special-case properties of dcat:Dataset.
Related use cases: Common requirements for scientific data [ID9], Modeling data lineage [ID12], Modeling agent roles [ID13], Modeling funding sources [ID31]