license and rights apply to Dataset as well as Distribution #104
perhaps "license and rights apply to Dataset not Distribution". The sense here being that the license etc. applies to the Dataset as delivered by the Distribution, not the Distribution itself. If a Distribution is anything other than a "Distribution", it is an action or method (of distributing the Dataset), not a thing, therefore can't have a license. |
There are examples today in the financial data exchange market of licences and rights/obligations varying according to mechanism and/or 'encoding' of a data set - so the licence would be different if the distribution pointed at a pdf file rather than an rdf or xml file even though the information is the same. [That's distinct from providing access by different means, if you see what I mean - which may again have different licence terms. This links to issue #55 about distribution services, I think. ] |
One of the objections against moving rights and licence 'upwards' from Distribution to Dataset is that there is no formal way to impose inheritance (as far as I know). If the properties are moved to Dataset (with the implicit meaning that the permissions and obligations apply to all Distributions), an application that receives DCAT metadata will need to look in two places: first look in the description of the Distribution, if nothing there, look in the description of the Dataset. Associating the information with the Dataset may gain some space (reducing duplication in descriptions of Distributions) but makes processing more complex. |
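To make the "look in two places" processing concrete, here is a minimal sketch in Python. It assumes the DCAT descriptions have already been parsed into plain dictionaries keyed by property name; the function name `effective_license` and the example.org licence URIs are hypothetical, used only for illustration.

```python
from typing import Optional


def effective_license(distribution: dict, dataset: dict) -> Optional[str]:
    """Two-step lookup: Distribution first, then fall back to the Dataset.

    Hypothetical helper; neither DCAT 1.0 nor this thread defines such a rule.
    """
    return distribution.get("dct:license") or dataset.get("dct:license")


# Illustrative records with made-up licence URIs.
dataset = {"dct:license": "https://example.org/licence/open"}
dist_with_licence = {"dct:license": "https://example.org/licence/restricted"}
dist_without_licence = {}

print(effective_license(dist_with_licence, dataset))
print(effective_license(dist_without_licence, dataset))
```

The extra branch is small, but every consuming application would have to implement it consistently, which is the added processing complexity described above.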
Just to note that it may be worth reviewing the rationale behind the choice made by the GLD WG. As far as I can see, the relevant issue is: PS: Actually, I think this review should be done programmatically for any revision to DCAT we are considering. |
I have reviewed the logic from the GLD WG - thanks Andrea! I note also the follow-up message from Makx that this was implemented in ADMS as well. I would like to argue this one out! I realise I'm strongly against precedent here (from DCAT and ADMS) but I think it makes far more sense for the license to apply to the Dataset and then either that licensing flows to Distributions as-is or perhaps they are able to implement additional restrictions (not relaxations). This to me seems to capture sense that the Dataset metadata is about the main item of concern and any Distribution metadata is secondary. As long as there are mechanisms to provide for the case identified by Dave above, this would be better! Licensing the encoding of a dataset is indeed different from licensing a dataset and, if that's a real scenario, it needs to be handled. |
@nicholascar in my mind, we need to be careful and not re-open discussions that were concluded in the past, unless there are strong reasons to do so. Strong reasons could be that many implementers report that the existing situation is unworkable, or that many implementations violate the rules. I am currently not aware of such complaints or large-scale violations; on the contrary, I know that there is a large installed base of DCAT applications that work with the current approach. So what do you see as the pressing need to make the change? |
This group probably doesn't want to enter into a long discussion about what inheritance of rights from Dataset to Distribution could mean. There can be many exceptions whereby a Distribution acquires rights beyond those of the Dataset, or even loses some. These may be justified or not; it depends on jurisdictions and their interpretation. In the meantime I think we need to keep the possibility of, and the encouragement for, expressing rights at the Distribution level. That's not to say that I'm against rights at the Dataset level. I'd favor both in fact, so that implementers are not discouraged from handling whatever weird situation they encounter. I.e. they can assign various rights at the level(s) where these rights are best expressed (and that, I think, was the original intention of this issue, although some later comments hinted that it wasn't optimal to have rights at the Distribution level). |
@aisaac I would be very much against creating more flexibility on licensing. The current approach is very clear: metadata creators should put licensing information in the metadata of the Distribution; receivers of metadata will find the information there and nowhere else. |
There are two separate issues here - semantics and entailment. By making a statement that licences (and indeed other metadata such as data quality etc ) specified for Datasets apply to Distributions, unless overridden, (i.e. the standard object/method inheritance pattern) then it is possible to entail these as properties of a Distribution, and clients still only have to look there. This simply becomes a functional requirement for catalogs of DCAT records. It makes it far easier for publishers, and has greater informational content, than repeating properties for multiple distributions. (the semantics could be specified in the inverse direction - which is that if a Dataset has no such properties, then these are entailed from unique properties from the set of Distributions). |
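As a rough illustration of the entailment idea, the sketch below materialises Dataset-level properties onto each Distribution unless the Distribution overrides them, so that clients still only need to look at the Distribution. It assumes records held as plain Python dictionaries; the `INHERITABLE` list and the `materialize` function are hypothetical names, and nothing here is a rule that DCAT itself defines.

```python
import copy

# Properties assumed, for this sketch only, to flow from Dataset to Distribution.
INHERITABLE = ("dct:license", "dct:rights")


def materialize(dataset: dict) -> dict:
    """Return a copy of the dataset record in which every distribution
    carries the inheritable properties, unless it already sets its own."""
    result = copy.deepcopy(dataset)
    for dist in result.get("dcat:distribution", []):
        for prop in INHERITABLE:
            if prop in result and prop not in dist:
                dist[prop] = result[prop]
    return result


record = {
    "dct:license": "https://example.org/licence/open",
    "dcat:distribution": [
        {"dct:format": "text/turtle"},
        {"dct:format": "application/pdf",
         "dct:license": "https://example.org/licence/restricted"},
    ],
}

for dist in materialize(record)["dcat:distribution"]:
    print(dist["dct:format"], dist["dct:license"])
```

A catalog could run such a step at publication time, which is what would make it a functional requirement for catalogs rather than for clients.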
@rob-metalinkage I am not saying we can't revisit thinking but in my opinion we should only do that if there are clear use cases that cannot be handled with the current approach. Maybe I am misreading this but it sounds to me that something that "simply becomes a functional requirement for catalogs of DCAT records" might mean that existing catalogues need to be upgraded if this constitutes a new requirement that didn't exist before. |
@makxdekkers I'm not suggesting more flexibility: in my view the possibility to add rights info on the Dataset level shouldn't replace the rule to put the required rights info on the Distribution. I was just supporting the option to put some info on the Dataset level. |
@aisaac I guess the only thing that I am arguing is that we need to have a strong reason to make such a change, and we need to be clear about what people should be doing in specific situations. As I argued earlier, the message currently is very clear: licensing info goes into Distribution. If we also put licensing information in Dataset, implementers are surely going to ask "where should I put it?" and then someone needs to write a guidance document to describe situations in which you'd do one or the other, or both. If it is necessary, based on real use cases that cannot be handled in the current approach, then it is fine with me, but I still need to be convinced that this is really the case. I just want to be careful not to add unnecessary complexity. |
My two cents in this discussion,
There are a number of metadata standards that assign license and rights at
the dataset level. For example Project Open Data uses dct:license and
dct:rights at the Dataset level. (
https://project-open-data.cio.gov/v1.1/schema/). ISO 19115 standard
(including its serializations in ISO 19139 and ISO 19115-3) defines
resource constraints to capture legal constraints
(rights, accessRights) and security constraints.
In Geoplatform profile (which is based on OGC SRIM and DCAT), we allow
license and rights on the Dataset , so we can preserve the intended
semantics of the resource constraints of the ISO documents and POD
information that we import. We have extended the dct:RightsStatement with
dct:type to classify the type of statements (Trademark, Patent,...) modeled
as skos:Concept. We have also extended the model to accommodate security
constraints by introducing a new class sim:SecurityConstraints which deals
with classification and handling of information (however this may not be
relevant for Open Data). Most of the distributions have to deal with access
rights and licenses, and can overwrite the license/rights assigned at the
dataset level. Keeping these properties on distribution will make the model
backward compatible with DCAT 1.0, however, I truly believe we should allow
license and rights at the dataset level.
…--
Stephane Fellah
Chief Knowledge Scientist
Image Matters LLC
Office: +(703) 669 5510
Cell: 703 431 9420
|
@fellahst I would never argue that associating licence information with Distribution is the only valid way of doing it, and I am aware that other initiatives and standards take a different approach. My argument is that a decision was taken in the development of DCAT and changing that decision should be based, in my view, on a conclusion that the current approach does not work in all cases. I don't know if we already have reached that conclusion. |
I am not an expert on CKAN installations, but what I know from the implementation in Australia (https://data.gov.au/dataset) is that license can be attached at the distribution level and the dataset level and it does not have to be the same, i.e. when translated to RDF (which is done on the data.gov.au platform, e.g. https://data.gov.au/dataset/900143f6-6582-49c5-bfd4-0838901d99c8.rdf) you want to be able to do both. |
If we want inheritance, Dataset would need to become a subclass of Distribution. If global domain restrictions are removed, that may actually be a possibility, but maybe not desirable. |
@arminhaller Your comments actually strengthen me in my opinion that specifying |
@makxdekkers, if there is a conflict between licenses in the real world, we had better allow it to be surfaced at the metadata level. Just because conflicts are not modelled doesn't mean they are not there. If datasets from different sources are packaged together in a Distribution, the fact that the datasets come with their licenses expressed in the metadata will surface these incompatibilities so that they can be actioned. |
@arminhaller I think there are two ways to handle conflicting licences. The first is, as you propose, to expose the conflict to the world. This lets the publisher off the hook and moves the problem to the consumer to sort out -- maybe by contacting the publisher to find out what the intention and legal situation is. The second approach could be to force the publisher to resolve the conflict during the publication process, i.e. making sure that in the published metadata there is no conflict. This puts the problem where I think it belongs, namely at the publisher. It is much easier for publishers to resolve the issue -- they would decide which of the conflicting licences applies to a particular distribution and put that in the metadata of the distribution; no need for the consumer to try and sort it out. I do not entirely understand the situation you describe where multiple datasets are packaged into a single Distribution -- this is a situation that I don't think is part of the current DCAT model. Is it? |
@makxdekkers - that would make the Catalog play a gatekeeper role - is that your intention? That there be a validation step prior to acceptance of a resource into the catalog, when the licensing would be checked? The FAIR community is looking at something similar: making the Repository the gatekeeper of dataset FAIRness. There would be a validation step before a dataset is accepted into the repo, with only FAIR data allowed in. |
Thanks, I meant to say packaged together in a Catalog. That problem is already present: the license can be defined at the Distribution level and at the Catalog level (and nothing prevents them from being different), but not at the Dataset level. I agree, the onus should be on the publisher to make sure there is a validation step that checks for incompatibility of licenses. This can be managed by external licensing ontologies and I think @nicholascar has been working on a schema for licenses already. |
@dr-shorthair Yes, it could be said that the (publisher of the) Catalog imposes rules on the description of the datasets and distributions in the Catalog. In practice, publishers of Catalogs often provide validation tools to the contributors of dataset descriptions to make sure that the contributed descriptions conform to the profile and also meet other quality requirements. |
@makxdekkers are there any constraints that prevent the use of license and rights at the Dataset level? If not then I would just recommend leaving things as they are now in DCAT, i.e. recommend that there is a license/rights on Distribution and say nothing about Dataset. When this thread started I was in favour of recommending that they could be used at the Dataset level next to being used at the Distribution level, even if that would potentially have created some redundancy. Now that I see where the issue has led the group in the course of the discussion over the past two weeks, I am changing my mind. I am not in favour of us trying to handle inference or conflicts between the different levels, or trying to tell publishers how to handle them. This would introduce too much complexity. |
Feedback from some people at the Flemish public administration ‘Agentschap Informatie Vlaanderen’: they also have a preference to keep recommending the use of the dcterms:license property on dcat:Distribution only and not on dcat:Dataset. Although differentiated licensing on the basis of the format / encoding of a distribution currently rarely occurs, there may be a need for this in the future. Similarly, dcterms:rights information may vary in function of differentiated quality-of-service, which also can be more easily related to a distribution than a dataset. |
However, from the discussion here ckan/ckanext-dcat#42 it looks like the DCAT is out of tune with CKAN and Open Data Hub, and that this is causing difficulties for these major metadata platforms. @aisaac you correctly point out that there is nothing prohibiting the use of @makxdekkers you asserted that there is "no formal way to impose inheritance" - well, in the OWA there is no formal way to impose much! But it is relatively easy to write rules for this kind of inheritance, which can be implemented in SPARQL and embedded in an RDF representation using SPIN. For example, I recently proposed something along these lines for part of an extension to SSN - see https://w3c.github.io/sdw/proposals/ssn-extensions/#container-property-rule Of course, running the other way is more tricky in the case where different licenses apply to different distributions of the same dataset, without some kind of license/rights algebra! |
Oh - if anyone here can improve my SPARQL, I'd be grateful ;-) |
I too would be against allowing licenses to be attached to Datasets. An example of where different licenses/rights would definitely come into play, if you need one, is if support for Web Service distributions comes into DCAT(-AP) (there already is an approach for this in StatDCAT-AP): LOD datasets will have one distribution as an RDF dump (free to download) and one distribution as a SPARQL endpoint, where there could be some sort of web service contract described, such as 1 query per minute or so. In this case, any inheritance of licenses or rights from Dataset to Distributions would be confusing. Yes, CKAN is not fully compatible with DCAT, even with its DCAT extension, but that can be helped. What we use in Czechia is my simple extension to CKAN which allows publishers to specify licenses at the distribution level (https://github.com/jakubklimek/ckanext-odczdataset). We also have that extension for DKAN, another popular data catalog (https://github.com/jakubklimek/dkanext-nkod), and we are working on a native DCAT(-AP) viewer (https://github.com/linkedpipes/dcat-ap-viewer), deployed e.g. here: https://nkod.opendata.cz, which also solves some of the other issues caused by trying to have the DCAT model loaded in CKAN, such as a potentially high number of distributions per dataset. |
As far as I see it, there are two general cases:
It is fairly easy for a publisher to convert from case 1 to case 2, using an extension, like the one from @jakubklimek; in the other direction it is not so easy, as @dr-shorthair writes in his last sentence. |
OK - then one thing we should definitely do is to push back on CKAN, and explain the need for different licenses on different distributions. It is an easy argument to make. @jakubklimek gives a nice example above. If CKAN is OK with extending their model a little to meet that requirement, I still suggest we reciprocate, but more importantly meet the community request that has been expressed, and add some verbiage around the use of @nicholascar perhaps we could prepare a proposal, then the discussion could be more concrete. |
I'm happier with the way things are (licenses on Distribution, not Dataset) now that I've seen people's explanations as to what they are doing. Yes we need to include notes for use so others can understand this too. |
@dr-shorthair dixit:
Just to complement @jakubklimek 's point, it is worth mentioning that there are several examples of CKAN extensions addressing this issue (including the one we developed for the JRC Data Catalogue). A notable one is the European Data Portal (EDP) - the EDP extension is on GitLab (see the relevant line). There's therefore a clear requirement from (at least part of) the community of CKAN users.
On the CKAN implementation side, this could be harmlessly addressed in different ways - e.g., by revising the CKAN schema to support licences on both datasets and distributions, or by having a config parameter determining whether the licence is on the distribution or dataset level. This way, the CKAN DCAT extension could be aligned with DCAT without being in conflict with the (revised) CKAN schema. |
I am not in favour of linking a decision in this group to whether a software vendor -- even an important one like CKAN -- agrees to change its model. |
I agree with @makxdekkers . |
My comment about 'reciprocation' was not meant to imply any strict linkage, more that we should respect the fact that there are multiple alternative use-cases here, and a community of interest to play a part in. But I shouldn't have implied any mutual obligation or reciprocity. Too casual, sorry. Looking at your proposal:
They are referred to in the first sentence in the Normative sections of the rec here https://w3c.github.io/dxwg/dcat/#class-catalog and here https://w3c.github.io/dxwg/dcat/#class-distribution - "are recommended for use on this class". According to https://tools.ietf.org/html/rfc2119 the word 'recommended' means the same as 'SHOULD', which means that it is required unless there is a good reason not to. There is no normative statement about the use of I see three alternatives:
I generally favour being more- rather than less- explicit. An issue has been spotted, we know that some parts of the community that we do respect and care about have hit it, we have had a discussion, and believe it is best to retain the status quo. But (unless we go with my option 2.) there remains a "grey" option to put a license or a rights statement on a dataset, so I think it behooves us to make it clear that we are aware of this option, and generally don't recommend it, and explain the risks with using it in that way. |
@dr-shorthair It seems to me that your option 3 is similar to my proposal in my comment above. An additional explanation is of course OK, as long as it clearly states that licences should be applied to distributions, and that use on datasets could be in addition to that but should not be an alternative. |
I believe @dr-shorthair came up with option 3 because @makxdekkers 's option 2 says "not with datasets" and that can be read as "DCAT forbids the use of licensing/rights properties on dataset". I had understood it the same way until I read @makxdekkers 's last comment. |
Here is a draft text to be included in the Usage notes for dct:license and dct:rights: "Information about licences and rights SHOULD be provided on the level of Distribution. Information about licences and rights MAY be provided for a Dataset in addition to but not instead of the information provided for the Distributions of that Dataset. Providing licence or rights information for a Dataset that is different from information provided for a Distribution of that Dataset should be avoided as this may create legal conflicts." |
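For catalog publishers who want to check contributed descriptions against the draft usage note, something like the following Python sketch might serve as a starting point. It assumes records are plain dictionaries and only looks at dct:license; the function name `check_licence_usage` and the wording of the messages are hypothetical, not part of the proposed text.

```python
from typing import List


def check_licence_usage(dataset: dict) -> List[str]:
    """Report departures from the draft usage note for dct:license.

    Hypothetical validator: flags distributions without a licence, licences
    given on the Dataset instead of the Distribution, and differing values.
    """
    problems = []
    dataset_licence = dataset.get("dct:license")
    for index, dist in enumerate(dataset.get("dcat:distribution", [])):
        dist_licence = dist.get("dct:license")
        if dist_licence is None and dataset_licence is not None:
            problems.append(
                f"distribution {index}: licence given on the Dataset "
                "instead of the Distribution")
        elif dist_licence is None:
            problems.append(
                f"distribution {index}: no licence (SHOULD be provided)")
        elif dataset_licence is not None and dist_licence != dataset_licence:
            problems.append(
                f"distribution {index}: licence differs from the Dataset "
                "licence (should be avoided)")
    return problems


record = {
    "dct:license": "https://example.org/licence/open",
    "dcat:distribution": [
        {"dct:license": "https://example.org/licence/open"},
        {"dct:license": "https://example.org/licence/restricted"},
        {},
    ],
}

for message in check_licence_usage(record):
    print(message)
```

Comparing licence URIs for equality is a simplification; deciding whether two different licences actually conflict would need the kind of licence/rights algebra mentioned earlier in the thread.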
Text for licences and rights as proposed in issue 104 #104 (comment)
This issue was resolved by #193 |
In DCAT 1.0 the dct properties license and rights are 'recommended for use' on dcat:Distribution, but not on dcat:Dataset. Suggest that licenses and rights can apply to datasets, and be inherited by distributions unless specifically overridden.
A meta-issue here is that DCAT v1 does not use a formal mechanism to associate properties from external vocabularies (primarily DC) with DCAT classes, just an informal 'recommendation'. Should we consider OWL Restrictions to provide constraints on the use of specific properties in the context of the DCAT classes? See also #103