license and rights apply to Dataset as well as Distribution #104

Closed
dr-shorthair opened this issue Feb 7, 2018 · 38 comments

Comments

@dr-shorthair
Contributor

In DCAT 1.0 the dct properties license and rights are 'recommended for use' on dcat:Distribution, but not on dcat:Dataset. Suggest that licenses and rights can apply to datasets, and be inherited by distributions unless specifically overridden.

A meta-issue here is that DCAT v1 does not use a formal mechanism to associate properties from external vocabularies (primarily DC) with DCAT classes, just an informal 'recommendation'. Should we consider OWL Restrictions to provide constraints on the use of specific properties in the context of the DCAT classes? See also #103.

@nicholascar
Contributor

Perhaps "license and rights apply to Dataset not Distribution". The sense here being that the license etc. applies to the Dataset as delivered by the Distribution, not the Distribution itself. If a Distribution is anything other than a "Distribution", it is an action or method (of distributing the Dataset), not a thing, and therefore can't have a license.

@davebrowning
Contributor

davebrowning commented Feb 7, 2018

There are examples today in the financial data exchange market of licences and rights/obligations varying according to mechanism and/or 'encoding' of a data set - so the licence would be different if the distribution pointed at a pdf file rather than an rdf or xml file even though the information is the same. [That's distinct from providing access by different means, if you see what I mean - which may again have different licence terms. This links to issue #55 about distribution services, I think.]

@makxdekkers
Contributor

One of the objections against moving rights and licence 'upwards' from Distribution to Dataset is that there is no formal way to impose inheritance (as far as I know). If the properties are moved to Dataset (with the implicit meaning that the permissions and obligations apply to all Distributions), an application that receives DCAT metadata will need to look in two places: first look in the description of the Distribution, if nothing there, look in the description of the Dataset. Associating the information with the Dataset may gain some space (reducing duplication in descriptions of Distributions) but makes processing more complex.
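
The two-place lookup described in this comment might look like the following minimal sketch. It assumes the metadata has already been parsed into plain dictionaries; the `license` key and the `effective_license` helper are hypothetical names for illustration, not part of DCAT.

```python
from typing import Optional


def effective_license(distribution: dict, dataset: dict) -> Optional[str]:
    """Return the licence that applies to a distribution.

    Looks at the distribution first and falls back to the dataset,
    which is the extra step a consumer would need if licences were
    allowed on both levels.
    """
    # Hypothetical key: "license" holds a licence IRI, or is absent.
    return distribution.get("license") or dataset.get("license")


dataset = {"license": "https://creativecommons.org/licenses/by/4.0/"}
csv_dist = {"format": "text/csv"}  # no licence of its own
pdf_dist = {
    "format": "application/pdf",
    "license": "https://creativecommons.org/licenses/by-nc/4.0/",
}

print(effective_license(csv_dist, dataset))  # falls back to the dataset
print(effective_license(pdf_dist, dataset))  # the distribution value wins
```

With licences on Distribution only, the fallback branch disappears and the consumer reads a single property.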

@andrea-perego
Contributor

Just to note that it may be worth reviewing the rationale behind the choice made by the GLD WG. As far as I can see, the relevant issue is:

https://www.w3.org/2011/gld/track/issues/53

PS: Actually, I think this review should be done programmatically for any revision to DCAT we are considering.

@nicholascar
Contributor

I have reviewed the logic from the GLD WG - thanks Andrea! I note also the follow-up message from Makx that this was implemented in ADMS as well.

I would like to argue this one out! I realise I'm strongly against precedent here (from DCAT and ADMS) but I think it makes far more sense for the license to apply to the Dataset and then either that licensing flows to Distributions as-is or perhaps they are able to implement additional restrictions (not relaxations). This to me seems to capture the sense that the Dataset metadata is about the main item of concern and any Distribution metadata is secondary. As long as there are mechanisms to provide for the case identified by Dave above, this would be better! Licensing the encoding of a dataset is indeed different from licensing a dataset and, if that's a real scenario, it needs to be handled.

@makxdekkers
Contributor

@nicholascar in my mind, we need to be careful and not re-open discussions that were concluded in the past, unless there are strong reasons to do so. Strong reasons could be that many implementers report that the existing situation is unworkable, or that many implementations violate the rules. I am currently not aware of such complaints or large-scale violations; on the contrary, I know that there is a large installed base of DCAT applications that work with the current approach. So what do you see as the pressing need to make the change?

@aisaac
Contributor

aisaac commented Feb 14, 2018

This group probably doesn't want to enter into a long discussion about what an inheritance of rights from Dataset to Distribution could be. There can be a lot of exceptions whereby a Distribution acquires new rights over those of a Dataset - or it may even lose some. These may be justified or not - it depends on jurisdictions or the interpretations thereof. In the meantime I think it's required to keep the possibility - and encouragement - to have rights expressed at the Distribution level.

That's not to say that I'm against rights at the Dataset level. I'd favor both in fact, so that implementers are not discouraged from handling whatever weird situation they encounter. I.e. they can assign various rights at the level(s) where these rights are best expressed (and that's I think the original intention of this issue - but some later comments hinted that it wasn't optimal to have rights at Distribution level).

@makxdekkers
Contributor

@aisaac I would be very much against creating more flexibility on licensing. The current approach is very clear: metadata creators should put licensing information in the metadata of the Distribution; receivers of metadata will find the information there and nowhere else.
I feel very strongly that we shouldn't touch issues that were resolved unless there is a clear set of actual use cases that can't be handled by the current approach.

@rob-metalinkage
Contributor

There are two separate issues here - semantics and entailment. By making a statement that licences (and indeed other metadata such as data quality etc ) specified for Datasets apply to Distributions, unless overridden, (i.e. the standard object/method inheritance pattern) then it is possible to entail these as properties of a Distribution, and clients still only have to look there. This simply becomes a functional requirement for catalogs of DCAT records. It makes it far easier for publishers, and has greater informational content, than repeating properties for multiple distributions. (the semantics could be specified in the inverse direction - which is that if a Dataset has no such properties, then these are entailed from unique properties from the set of Distributions).
The use cases that are obvious are around content negotiation, and particularly by profile. We are increasingly seeing data accessible by multiple API distributions rather than libraries of artefacts, so it's possible the thinking needs to be revisited.

@makxdekkers
Contributor

@rob-metalinkage I am not saying we can't revisit thinking but in my opinion we should only do that if there are clear use cases that cannot be handled with the current approach. Maybe I am misreading this but it sounds to me that something that "simply becomes a functional requirement for catalogs of DCAT records" might mean that existing catalogues need to be upgraded if this constitutes a new requirement that didn't exist before.
The use cases "around content negotiation" are not immediately obvious to me, so it might help if they could be made explicit.

@aisaac
Contributor

aisaac commented Feb 14, 2018

@makxdekkers I'm not suggesting more flexibility: in my view the possibility to add rights info on the Dataset level shouldn't replace the rule to put the required rights info on the Distribution. I was just supporting the option to put some info on the Dataset level.

@makxdekkers
Contributor

makxdekkers commented Feb 14, 2018

@aisaac I guess the only thing that I am arguing is that we need to have a strong reason to make such a change, and we need to be clear about what people should be doing in specific situations. As I argued earlier, the message currently is very clear: licensing info goes into Distribution. If we also put licensing information in Dataset, implementers are surely going to ask "where should I put it?" and then someone needs to write a guidance document to describe situations in which you'd do one or the other, or both. If it is necessary, based on real use cases that cannot be handled in the current approach, then it is fine with me, but I still need to be convinced that this is really the case. I just want to be careful not to add unnecessary complexity.

@fellahst

fellahst commented Feb 14, 2018 via email

@makxdekkers
Contributor

@fellahst I would never argue that associating licence information with Distribution is the only valid way of doing it, and I am aware that other initiatives and standards take a different approach. My argument is that a decision was taken in the development of DCAT and changing that decision should be based, in my view, on a conclusion that the current approach does not work in all cases. I don't know if we already have reached that conclusion.
Other than that, I think that allowing licence information in two places and then needing to specify and implement rules for entailment and overwriting makes DCAT more complex, at least for the 'simple' implementations that don't use inference engines. And added complexity leads to additional need for guidelines and risk of multiple interpretations that may hurt interoperability.
Finally, with licences there is a legal dimension to consider. If there is a rule on the technical level that licence on distribution overwrites the licence on the dataset, does that hold legally? For example, if I encounter a CC-BY dataset where the distribution that I am interested in specifies CC-BY-NC, can I argue in court that I am not in violation if I use it for commercial purposes? I've been told that in case of conflicts, one may assume the most permissive licence takes precedence, but I am not a lawyer so I don't know if this is true. In any case, we need to be careful with this.

@dr-shorthair changed the title from "license and rights apply to Dataset not only Distribution" to "license and rights apply to Dataset as well as Distribution" on Feb 19, 2018
@arminhaller

I am not an expert on CKAN installations, but what I know from the implementation in Australia (https://data.gov.au/dataset) is that license can be attached at the distribution level and the dataset level and it does not have to be the same, i.e. when translated to RDF (which is done on the data.gov.au platform, e.g. https://data.gov.au/dataset/900143f6-6582-49c5-bfd4-0838901d99c8.rdf) you want to be able to do both.

@arminhaller

If we want inheritance, Dataset would need to become a subclass of Distribution. If global domain restrictions are removed, that may actually be a possibility, but maybe not desirable.

@makxdekkers
Contributor

@arminhaller Your comments actually strengthen me in my opinion that specifying dct:license for dcat:Dataset in addition to dcat:Distribution is not a good idea. As you point out, there could be different licences on the two levels, which may create a legal conflict. A need to declare Dataset to be a subclass of Distribution sounds like making things even worse. Let's not go there.

@dr-shorthair added this to the Licenses and rights milestone on Mar 16, 2018
@arminhaller

@makxdekkers, if there are conflicts between licenses in the real world, we had better allow them to surface at the metadata level. Just because they are not modelled doesn't mean they are not there. If datasets from different sources are packaged together in a Distribution, the fact that the datasets come with their license expressed in the metadata will surface these incompatibilities and they can be actioned.

@makxdekkers
Contributor

@arminhaller I think there are two ways to handle conflicting licences.

The first is, as you propose, to expose the conflict to the world. This lets the publisher off the hook and moves the problem to the consumer to sort out -- maybe by contacting the publisher to find out what the intention and legal situation is.

The second approach could be to force the publisher to resolve the conflict during the publication process, i.e. making sure that in the published metadata there is no conflict. This puts the problem where I think it belongs, namely at the publisher. It is much easier for publishers to resolve the issue -- they would decide which of the conflicting licences applies to a particular distribution and put that in the metadata of the distribution; no need for the consumer to try and sort it out.

I do not entirely understand the situation you describe where multiple datasets are packaged into a single Distribution -- this is a situation that I don't think is part of the current DCAT model. Is it?

@dr-shorthair
Contributor Author

@makxdekkers - that would make the Catalog play a gatekeeper role - is that your intention? That there be a validation step prior to acceptance of a resource into the catalog, when the licensing would be checked?

The FAIR community is looking at something similar: making the Repository the gatekeeper of dataset FAIRness. There would be a validation step before a dataset is accepted into the repo, with only FAIR data allowed in.

@arminhaller

Thanks, I meant to say packaged together in a Catalog. There is already the problem that the license can be defined on the Distribution level and the Catalog level (and nothing prevents them from being different), but not on the Dataset level. I agree, the onus should be on the publisher to make sure there is a validation step involved that checks for incompatibility of licenses. This can be managed by external licensing ontologies and I think @nicholascar has been working on a schema for licenses already.

@makxdekkers
Contributor

@dr-shorthair Yes, it could be said that the (publisher of the) Catalog imposes rules on the description of the datasets and distributions in the Catalog. In practice, publishers of Catalogs often provide validation tools to the contributors of dataset descriptions to make sure that the contributed descriptions conform to the profile and also meet other quality requirements.

@aisaac
Contributor

aisaac commented Mar 20, 2018

@makxdekkers are there any constraints that prevent the use of license and rights on the Dataset level? If not then I would just recommend leaving stuff as it is now in DCAT. I.e. recommend that there is a license/rights on Distribution and say nothing about Dataset.

When this thread started I was in favour of recommending that they could be used at the Dataset level next to being used at the Distribution level - even if that would have potentially created some redundancy.

Now that I see where the issue has led the group in the course of the discussion over the past two weeks, I am changing my mind. I am not in favour of us trying to handle inference or conflicts between the different levels, or to try and tell publishers how to handle them. This would introduce too much complexity.

@stijngoedertier

Feedback from some people at the Flemish public administration ‘Agentschap Informatie Vlaanderen’: they also have a preference to keep recommending the use of the dcterms:license property on dcat:Distribution only and not on dcat:Dataset. Although differentiated licensing on the basis of the format / encoding of a distribution currently rarely occurs, there may be a need for this in the future. Similarly, dcterms:rights information may vary as a function of differentiated quality-of-service, which also can be more easily related to a distribution than a dataset.

@dr-shorthair
Contributor Author

However, from the discussion here ckan/ckanext-dcat#42 it looks like DCAT is out of tune with CKAN and Open Data Hub, and that this is causing difficulties for these major metadata platforms.

@aisaac you correctly point out that there is nothing prohibiting the use of dct:license and dct:rights on a dcat:Dataset, but clearly from the discussion in these user communities there is uncertainty about how this works, and probably a reluctance to use properties out of the context described in the recommendation. To resolve this I don't think leaving the documentation as is would be satisfactory. Can we allow for use of licensing and rights information at both the Dataset and Distribution level, and then leave it up to profiles or repositories to recommend their best practice?

@makxdekkers you asserted that there is "no formal way to impose inheritance" - well, in the OWA there is no formal way to impose much! But it is relatively easy to write rules for this kind of inheritance, which can be implemented in SPARQL and embedded in an RDF representation using SPIN. For example, I recently proposed something along these lines for part of an extension to SSN - see https://w3c.github.io/sdw/proposals/ssn-extensions/#container-property-rule
The presumption there is that the local value will always override the value 'inherited' from a container.
The same pattern would apply on a dcat:Distribution inheriting values from the associated dcat:Dataset.

Of course, running the other way is more tricky in the case where different licenses apply to different distributions of the same dataset, without some kind of license/rights algebra!
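
Why the reverse direction is harder can be seen in a small sketch. A dataset-level value can only be derived safely when every distribution carries the same licence. The dictionary layout and the `dataset_license_from_distributions` helper below are assumptions made for illustration, not anything defined by DCAT or by the SSN proposal linked above.

```python
from typing import Iterable, Optional


def dataset_license_from_distributions(distributions: Iterable[dict]) -> Optional[str]:
    """Infer a dataset-level licence only when all distributions agree."""
    # Collect the distinct licence values; a missing licence shows up as None.
    licenses = {d.get("license") for d in distributions}
    if len(licenses) == 1 and None not in licenses:
        return next(iter(licenses))
    # No distributions, a missing licence, or disagreement: nothing can be
    # inferred without some kind of licence/rights algebra.
    return None


BY = "https://creativecommons.org/licenses/by/4.0/"
BY_NC = "https://creativecommons.org/licenses/by-nc/4.0/"

print(dataset_license_from_distributions([{"license": BY}, {"license": BY}]))
print(dataset_license_from_distributions([{"license": BY}, {"license": BY_NC}]))
```

The second call returns `None` because the two distributions disagree, which is exactly the case that would need a rule for combining licences.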

@dr-shorthair
Contributor Author

Oh - if anyone here can improve my SPARQL, I'd be grateful ;-)

@jakubklimek
Contributor

I too would be against allowing licenses to be attached to Datasets.
There is nothing stopping publishers from linking to the same description from all the distributions of their datasets, but there are definitely cases where each distribution needs its own license/rights description.

An example of where different licenses/rights would definitely come into play, if you need one, is if support for Web Service distributions comes into DCAT(-AP) (there already is an approach for this in StatDCAT-AP), and LOD datasets will have one distribution as an RDF dump (free to download) and one distribution in a SPARQL endpoint, where there could be some sort of web service contract described, such as 1 query per minute or so. In this case, any inheritance of licenses or rights from Dataset to Distributions would be confusing.

Yes, CKAN is not fully compatible with DCAT, even with its DCAT extension, but that can be helped. What we use in Czechia is my simple extension to CKAN which allows publishers to specify licenses at the distribution level (https://github.com/jakubklimek/ckanext-odczdataset). We also have that extension for DKAN, another popular data catalog (https://github.com/jakubklimek/dkanext-nkod) and we are working on a native DCAT(-AP) viewer (https://github.com/linkedpipes/dcat-ap-viewer), deployed e.g. here: https://nkod.opendata.cz, which also solves some of the other issues caused by trying to have the DCAT model loaded in CKAN, such as a potentially high number of distributions per dataset.

@makxdekkers
Contributor

As far as I see it, there are two general cases:

  1. all distributions of a dataset have the same licence. In that case, it is indeed easier to assign the licence to the dataset. This is how CKAN works; it does not allow licences to be different for different distributions -- I guess a CKAN user would be forced to create a separate dataset for a distribution that has a different licence?
  2. not all distributions have necessarily the same licence. In that case, it makes more sense to assign licences to distributions.

It is fairly easy for a publisher to convert from case 1 to case 2, using an extension, like the one from @jakubklimek; in the other direction it is not so easy, as @dr-shorthair writes in his last sentence.
However, my main argument is against doing both -- allowing licences both on dataset and on distribution -- because that is really the worst of both worlds.
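
Converting from case 1 to case 2 could be sketched roughly as follows. This is not the code of the extension mentioned above. The key names are assumptions modelled loosely on a CKAN-style package dictionary (`license_url` on the dataset, `resources` for its distributions), and the `license` key on each resource is hypothetical.

```python
import copy


def push_license_to_distributions(package: dict) -> dict:
    """Move a dataset-level licence onto every resource that lacks one."""
    result = copy.deepcopy(package)  # do not mutate the caller's data
    dataset_license = result.pop("license_url", None)
    if dataset_license is not None:
        for resource in result.get("resources", []):
            # A licence already set on the resource is left alone.
            resource.setdefault("license", dataset_license)
    return result


package = {
    "name": "air-quality",
    "license_url": "https://creativecommons.org/licenses/by/4.0/",
    "resources": [
        {"format": "CSV"},
        {"format": "PDF",
         "license": "https://creativecommons.org/licenses/by-nc/4.0/"},
    ],
}

converted = push_license_to_distributions(package)
for resource in converted["resources"]:
    print(resource["format"], resource["license"])
```

After the conversion the licence lives only on the resources, so a consumer has a single place to look.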

@dr-shorthair
Contributor Author

dr-shorthair commented Mar 28, 2018

OK - then one thing we should definitely do is to push back on CKAN, and explain the need for different licenses on different distributions. It is an easy argument to make. @jakubklimek gives a nice example above.

If CKAN is OK with extending their model a little to meet that requirement, I still suggest we reciprocate, but more importantly meet the community request that has been expressed, and add some verbiage around the use of dct:license and dct:rights (and other conditions-of-use properties) in the context of dcat:Dataset, fenced with warnings about potential interactions with additional or alternative conditions applied at the dcat:Distribution level.

@nicholascar perhaps we could prepare a proposal, then the discussion could be more concrete.

@nicholascar
Contributor

I'm happier with the way things are (licenses on Distribution, not Dataset) now that I've seen people's explanations as to what they are doing. Yes, we need to include notes for use so others can understand this too.

@andrea-perego
Contributor

@dr-shorthair dixit:

> OK - then one thing we should definitely do is to push back on CKAN, and explain the need for different licenses on different distributions. It is an easy argument to make. @jakubklimek gives a nice example above.

Just to complement @jakubklimek 's point, it is worth mentioning that there are several examples of CKAN extensions addressing this issue (including the one we developed for the JRC Data Catalogue). A notable one is the European Data Portal (EDP) - the EDP extension is on GitLab (see the relevant line).

There's therefore a clear requirement from (at least part of) the community of CKAN users.

> If CKAN is OK with extending their model a little to meet that requirement, I still suggest we reciprocate, but more importantly meet the community request that has been expressed, and add some verbiage around the use of dct:license and dct:rights (and other conditions-of-use properties) in the context of dcat:Dataset, fenced with warnings about potential interactions with additional or alternative conditions applied at the dcat:Distribution level.

On the CKAN implementation side, this could be harmlessly addressed in different ways - e.g., by revising the CKAN schema to support licences on both datasets and distributions, or by having a config parameter determining whether the licence is on the distribution or dataset level. This way, the CKAN DCAT extension could be aligned with DCAT without being in conflict with the (revised) CKAN schema.

@makxdekkers
Contributor

I am not in favour of linking a decision in this group to whether a software vendor -- even an important one like CKAN -- agrees to change its model.
In my mind, we seem to have established that the DCAT model in this case is more flexible than the CKAN model. There is a clear way -- and there are already tools -- for CKAN data to be converted to the DCAT model, while converting in the other direction is more complicated.
Could we try to wrap this up?
My proposal is:
(a) to make no changes to the DCAT model;
(b) to add a Usage note to https://www.w3.org/TR/vocab-dcat/#Property:distribution_license to explain why DCAT recommends licensing to be associated with Distributions and not with Datasets.

@andrea-perego
Contributor

I agree with @makxdekkers .

@dr-shorthair
Contributor Author

My comment about 'reciprocation' was not meant to imply any strict linkage, more that we should respect the fact that there are multiple alternative use-cases here, and a community of interest to play a part in. But I shouldn't have implied any mutual obligation or reciprocity. Too casual, sorry.

Looking at your proposal:

dct:license and dct:rights are not bound into DCAT axiomatically.

They are referred to in the first sentence in the Normative sections of the rec here https://w3c.github.io/dxwg/dcat/#class-catalog and here https://w3c.github.io/dxwg/dcat/#class-distribution - "are recommended for use on this class". According to https://tools.ietf.org/html/rfc2119 the word 'recommended' means the same as 'SHOULD', which means that it is required unless there is a good reason not to.

There is no normative statement about the use of dct:license and dct:rights in the context of dcat:Dataset, and there are no rdfs:domain or owl:Restriction axioms to affect the usage one way or the other. So the RDFS does not disallow licenses and rights statements on dcat:Dataset, and the rec document is silent on the issue.

I see three alternatives:

  1. change nothing. Anyone who cares to could put a license or rights statement on a dataset, but it is a grey area;
  2. specifically prohibit licenses and rights statements, either in normative text, or axiomatically with an owl:Restriction that fixes cardinality=0 for these properties on datasets;
  3. add some language explaining why it is usual to apply licenses and rights statements to distributions, and pointing out the implications for their use on datasets.

I generally favour being more- rather than less- explicit. An issue has been spotted, we know that some parts of the community that we do respect and care about have hit it, we have had a discussion, and believe it is best to retain the status quo. But (unless we go with my option 2.) there remains a "grey" option to put a license or a rights statement on a dataset, so I think it behooves us to make it clear that we are aware of this option, and generally don't recommend it, and explain the risks with using it in that way.

@makxdekkers
Contributor

@dr-shorthair It seems to me that your option 3 is similar to my proposal in my comment above. An additional explanation is of course OK, as long as it clearly states that licences should be applied to distributions, and that use on datasets could be in addition to that but should not be an alternative.

@aisaac
Contributor

aisaac commented Apr 3, 2018

I believe @dr-shorthair came up with option 3 because @makxdekkers's point (b) says "not with datasets" and that can be read as "DCAT forbids the use of licensing/rights properties on dataset". I had understood it the same way until I read @makxdekkers's last comment.
So with a clarification as Makx suggests ("use on datasets could be in addition to that but should not be an alternative") I think it is alright!

@makxdekkers
Contributor

makxdekkers commented Apr 3, 2018

Here is a draft text to be included in the Usage notes for dct:license and dct:rights:

"Information about licences and rights SHOULD be provided on the level of Distribution. Information about licences and rights MAY be provided for a Dataset in addition to but not instead of the information provided for the Distributions of that Dataset. Providing licence or rights information for a Dataset that is different from information provided for a Distribution of that Dataset should be avoided as this may create legal conflicts."

makxdekkers added a commit that referenced this issue Apr 6, 2018
Text for licences and rights as proposed in issue 104  #104 (comment)
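
The draft usage note proposed in this thread can be read as three checks that a catalogue's validation tool might run before accepting a record. The sketch below is one possible reading and not an official DCAT validator. The dictionary layout (`license`, `distributions`) and the `check_license_usage` helper are hypothetical.

```python
def check_license_usage(dataset: dict) -> list:
    """Return human-readable problems found in one dataset description."""
    problems = []
    dataset_license = dataset.get("license")
    for index, dist in enumerate(dataset.get("distributions", [])):
        dist_license = dist.get("license")
        if dist_license is None and dataset_license is not None:
            # Licence given on the Dataset instead of the Distribution.
            problems.append(f"distribution {index}: licence only on the dataset")
        elif dist_license is None:
            # The SHOULD in the usage note is not met.
            problems.append(f"distribution {index}: no licence information")
        elif dataset_license is not None and dist_license != dataset_license:
            # Differing values may create a legal conflict.
            problems.append(f"distribution {index}: licence differs from the dataset")
    return problems


BY = "https://creativecommons.org/licenses/by/4.0/"
BY_NC = "https://creativecommons.org/licenses/by-nc/4.0/"

conforming = {"license": BY, "distributions": [{"license": BY}]}
conflicting = {"license": BY, "distributions": [{"license": BY_NC}, {}]}

print(check_license_usage(conforming))
print(check_license_usage(conflicting))
```

Running such a check at publication time keeps conflict resolution with the publisher, as argued earlier in the thread.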
@dr-shorthair removed this from the Licenses and rights milestone on Jul 19, 2018
@dr-shorthair
Contributor Author

This issue was resolved by #193
