Possible relevance of FRBR for versioning #1251

Closed
andrea-perego opened this issue Sep 16, 2020 · 18 comments · Fixed by #1257

Comments

@andrea-perego
Contributor

The DCAT subgroup has created a synoptic table for vocabularies possibly relevant to versioning:

https://github.com/w3c/dxwg/wiki/Material-for-a-SPRINT-on-Versioning#11-summary-table

We also included FRBR in that table, but our analysis is incomplete, and we would like to kindly ask WG members who are more familiar with FRBR to review it, and let us know if anything is wrong and/or missing.

@kcoyle
Contributor

kcoyle commented Sep 17, 2020

Although not directly useful, perhaps, to this issue, I have been working on a number of uses of FRBR that are outside of the library context. There is a very short intro to this in my blog post. I could look at the DCAT use with that eye, which means extrapolating the semantics of WEMI but super-classed to the library semantics, and possibly sub-classed to a more general set of classes that represent created information. There are some interesting examples of uses of subclasses of FRBR/WEMI which may be relevant as models.

I haven't yet thought about whether datasets fit within the library definitions of FRBR or not. That would be step 1.

My main question for the table in your file is whether the versions are of the datasets, the metadata, or both. Thanks.

@andrea-perego
Contributor Author

Thanks, @kcoyle .

The versioning aspects in the table are not scoped to a specific type of resource. Basically, what we did so far was to review existing vocabularies, and find out properties / classes that may be related to versioning - irrespective of the type of resource they apply to.

About FRBR, the terms I found that might be relevant concern the notions of "translation", "alternate", and "revision". So, a first question I have is whether you think they are indeed related to "versioning" and/or whether there's anything else in FRBR that can be relevant.

@kcoyle
Contributor

kcoyle commented Sep 17, 2020

Andrea, many of the relationships in FRBR express some type of "version." The problem with the FRBR properties is that they are all clearly defined as to domain and range, which means that they pertain to entities that have been defined as one of the FRBR WEMI classes, and the classes are defined as disjoint. Thus, "translation" has a domain of frbr:Expression and a range of frbr:Expression, and "alternate" has a domain of frbr:Manifestation and a range of frbr:Manifestation. Each of the classes in FRBR is also defined as disjoint.
You would need to make sure that you do not "cross the streams" - that is, mix the WEMI classes when using FRBR relationship properties in a way that would violate the disjointness. Within the ones you have listed (and there could be others that are useful, it seems to me), translation can be used only with Expressions and alternate only with Manifestations. This reflects the very strict concepts of WEMI for library data, which I doubt is how anyone outside of library catalog creation can operate.
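The constraint described above can be illustrated with a minimal sketch. It is not an FRBR library or an OWL reasoner: the names `WEMI_CLASSES`, `PROPERTY_SIGNATURES` and `check_statement` are hypothetical, and only the two property signatures quoted in this comment are encoded. The idea is that using a property implies its domain and range classes, and any resource that ends up in two WEMI classes violates disjointness.

```python
# Hypothetical, simplified model of the constraints described in this comment.
# The names below are illustrative labels, not an official FRBR API.
WEMI_CLASSES = {"Work", "Expression", "Manifestation", "Item"}

# property -> (domain, range), as stated in the comment above
PROPERTY_SIGNATURES = {
    "translation": ("Expression", "Expression"),
    "alternate": ("Manifestation", "Manifestation"),
}


def check_statement(subject_types, prop, object_types):
    """Return a list of problems for `subject prop object` (empty if consistent)."""
    domain, range_ = PROPERTY_SIGNATURES[prop]
    problems = []
    # Using the property implies the subject is in its domain
    # and the object is in its range.
    inferred = (
        ("subject", set(subject_types) | {domain}),
        ("object", set(object_types) | {range_}),
    )
    for label, types in inferred:
        wemi = sorted(types & WEMI_CLASSES)
        # WEMI classes are disjoint: a resource may belong to at most one.
        if len(wemi) > 1:
            problems.append(
                f"{label} would belong to disjoint classes: {', '.join(wemi)}"
            )
    return problems


# Expression-to-Expression translation is consistent.
print(check_statement({"Expression"}, "translation", {"Expression"}))
# Declaring a Manifestation as the subject of "translation" is not.
print(check_statement({"Manifestation"}, "translation", {"Expression"}))
```

A resource with no declared WEMI class passes the check, because the domain or range is then simply inferred for it.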

It is too bad that the properties, which could be useful in other contexts, are so constrained in their definitions. It very much limits the reusability of FRBR properties. I don't know how everyone feels about "going rogue" and possibly ignoring domains, ranges and disjoint declarations, but that might be what would be needed.

I would be willing to give a general overview of possibly useful properties for versions, but ignoring for the moment the question of compatibility with the FRBR vocabulary.

@dr-shorthair
Contributor

dr-shorthair commented Sep 18, 2020

I'd be inclined to look at the FRBR conceptual model primarily for guidance, and not necessarily expect to use the implementation in the namespace http://purl.org/vocab/frbr/core#

@andrea-perego
Contributor Author

I would be willing to give a general overview of possibly useful properties for versions, but ignoring for the moment the question of compatibility with the FRBR vocabulary.

Thanks, @kcoyle . This would be very helpful.

@kcoyle
Contributor

kcoyle commented Sep 18, 2020

The bibliographic version types found in FRBR (and elsewhere in bibliographic metadata), by type:

note: modified description of successor

  1. versions that replace: has revision, is revision of (modified contents)
  2. versions that are time-based (may or may not replace): successor to; has successor
  3. derived versions that are in addition to but do not replace: summary of (a reduction); adaptation of (based on); transformation of (to a different expressive form, e.g. book to movie, but possibly could be interpreted as transformation to a different measurement system); translation (generally used with language works);

There are two other types of relationships which are harder to think of as versions but which are common relationships between things being described:

  1. Accompanying materials: supplement; appendix; index; concordance; etc.
  2. Whole/part
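As a rough illustration, the grouping above could be held in a small lookup table. This is only a sketch: the enum `RelationType` and the helper functions are hypothetical names, and the relationship labels are copied from the two lists as plain strings, not taken from any vocabulary namespace.

```python
from enum import Enum


class RelationType(Enum):
    """Relationship groups from the two lists above (labels are illustrative)."""
    REPLACES = "versions that replace"
    TIME_BASED = "time-based versions"
    DERIVED = "derived versions that do not replace"
    ACCOMPANYING = "accompanying materials"
    WHOLE_PART = "whole/part"


# Relationship label -> group, following the lists above.
# Whole/part has no named relationships in the lists, so it has no entries here.
RELATIONSHIPS = {
    "has revision": RelationType.REPLACES,
    "is revision of": RelationType.REPLACES,
    "successor to": RelationType.TIME_BASED,
    "has successor": RelationType.TIME_BASED,
    "summary of": RelationType.DERIVED,
    "adaptation of": RelationType.DERIVED,
    "transformation of": RelationType.DERIVED,
    "translation": RelationType.DERIVED,
    "supplement": RelationType.ACCOMPANYING,
    "appendix": RelationType.ACCOMPANYING,
    "index": RelationType.ACCOMPANYING,
    "concordance": RelationType.ACCOMPANYING,
}


def classify(label):
    """Return the group for a relationship label, or None if it is not listed."""
    return RELATIONSHIPS.get(label)


def labels_for(group):
    """Return the relationship labels in a group, sorted alphabetically."""
    return sorted(name for name, g in RELATIONSHIPS.items() if g is group)


print(classify("translation"))
print(labels_for(RelationType.TIME_BASED))
```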

@andrea-perego
Contributor Author

Thanks a lot, @kcoyle . I think we should use this classification in DCAT to clarify the possible different version types, and the corresponding properties. IMO, it would be beneficial to guide users in understanding which of them are applicable to their use cases.

I started revising the table in the wiki accordingly, realising I misplaced some of the properties.

I have another question: I saw that FRBR also includes the notions of "arrangement" and "reconfiguration", but I am not sure how they fit into this classification.

@kcoyle
Contributor

kcoyle commented Sep 19, 2020

@andrea-perego Arrangement is specific to music, and reconfiguration is specific to serials. Arrangement is a kind of 3 but only for music; reconfiguration has to do with how paper serials are bound for storage, so I think you can ignore that one.

@andrea-perego
Contributor Author

andrea-perego commented Sep 19, 2020

@agreiner 's reply in https://lists.w3.org/Archives/Public/public-dxwg-wg/2020Sep/0037.html:

Maybe bound for storage = collated into files?

On 9/19/20 10:22 AM, Karen Coyle via GitHub wrote:

@andrea-perego Arrangement is specific to music, and reconfiguration is specific to serials. Arrangement is a kind of 3 but only for music; reconfiguration has to do with how paper serials are bound for storage, so I think you can ignore that one.

@kcoyle 's reply in https://lists.w3.org/Archives/Public/public-dxwg-wg/2020Sep/0038.html:

Could be - here's all I know:

"For serials, reconfiguration happens when several unbound copies representing different issues are bound together to make a single new item."

kc

@andrea-perego
Contributor Author

Thanks, @kcoyle .

Coming back to the 3 version types, I wonder whether, in the context of DCAT (and, therefore, in catalogues and registries), versioning would better be limited to types 1 & 2, whereas type 3 would rather correspond to the notion of derivative work, and so should be addressed differently from versioning in DCAT (we already have some examples of this in the spec).

@kcoyle
Contributor

kcoyle commented Sep 22, 2020

@andrea-perego You might have seen that in FRBR none of these are called "versions" - they are relationships between different "things" in that bibliographic universe. I think that the term "version" is quite vague and easily misunderstood. Perhaps the best exercise would be to look at the kinds of relationships that exist in the areas where DCAT is used, and from those form an (informal) taxonomy of those relationships, rather than beginning with a less well shared notion of "version". You may actually discover relationships that are more varied than the more naive sense of "version."

@kcoyle
Contributor

kcoyle commented Sep 22, 2020

This is a bit far afield of what I think DCAT versioning is intended to be, but it is an example of defining derivative relationships between datasets. The citation is:

McCusker, J. P., Lebo, T., Chang, C., McGuinness, D. L., & da Silva, P. P. (2012). Parallel identities for managing open government data. IEEE Intelligent Systems, 27(3), 55. http://tw.rpi.edu/media/2012/02/07/d641/EX_ISSI-2011-09-0138.R1_McCusker.pdf

The diagram of relationships (in this case defined by Work, Expression, Manifestation, Item - and I'm not suggesting you should use those entities in your work) is:

[Image: frirDiagram]

But the analysis is interesting. I was describing it briefly in an article as: "In this model, the use of abstraction, expressed information, and physical sources is needed to allow members of the community to combine data from different sources and to determine at what level the sources represent the same information. The authors apply the four WEMI levels (which they refer to as the four FRBR levels of abstraction) to digital information resources. They consider exact copies of files (the same bitstream) to be items of the same manifestation. Different file formats with the same content, such as a CSV file and an Excel file, are different manifestations of the same expression. Different expressions contain the same informational content but the files may differ in having more or less content, yet they are expressions of the same work because they express the same basic data."
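The levels described in that summary can be sketched in a few lines. This is an illustration under stated assumptions, not the method of the cited paper: `FileRecord`, its `work_id` and `content_id` fields, and `shared_level` are hypothetical, and the sketch assumes someone has already judged which files carry the same basic data and the same content. Only the bitstream comparison is computed, here with a SHA-256 digest.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class FileRecord:
    work_id: str     # hypothetical: identifies the "same basic data" (work)
    content_id: str  # hypothetical: identifies one particular content (expression)
    data: bytes      # the raw bitstream of the file

    def digest(self):
        """SHA-256 of the bitstream; equal digests are treated as exact copies."""
        return hashlib.sha256(self.data).hexdigest()


def shared_level(a, b):
    """Return the most specific level at which two files coincide, or None."""
    if a.work_id != b.work_id:
        return None                # unrelated works
    if a.content_id != b.content_id:
        return "work"              # different expressions of the same work
    if a.digest() != b.digest():
        return "expression"        # different manifestations, e.g. CSV vs. Excel
    return "manifestation"         # same bitstream: items of one manifestation


csv_file = FileRecord("w1", "c1", b"a,b\n1,2\n")
csv_copy = FileRecord("w1", "c1", b"a,b\n1,2\n")
other_format = FileRecord("w1", "c1", b"<table><tr><td>1</td><td>2</td></tr></table>")
more_rows = FileRecord("w1", "c2", b"a,b\n1,2\n3,4\n")

print(shared_level(csv_file, csv_copy))
print(shared_level(csv_file, other_format))
print(shared_level(csv_file, more_rows))
```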

Given the structure of DCAT, relationships could occur at different levels of abstraction (resource, dataset, distribution). The question is: how does that affect defining relationships for DCAT? Are there different kinds of versions at different levels? Is it possible to create a small set of easily determined relationships that are the most useful ones?

Sorry for the length and especially if you've already covered this.

@dr-shorthair
Contributor

I think that the term "version" is quite vague and easily misunderstood.

Indeed. The RDA study is fundamentally undermined by conflating many relationships into 'version'.

@riccardoAlbertoni
Contributor

I think that the term "version" is quite vague and easily misunderstood.

Indeed. The RDA study is fundamentally undermined by conflating many relationships into 'version'.

@dr-shorthair: Are you suggesting we take a look at the following document, or are there other sources I might have overlooked?

Klump, J., Wyborn, L., Downs, R., Asmi, A., Wu, M., Ryder, G., & Martin, J. (2020). Principles and best practices in data versioning for all data sets big and small. Version 1.1. Research Data Alliance. DOI: 10.15497/RDA00042

@dr-shorthair
Contributor

Yes - that's the one. It has also been converted into an article submitted to DSJ (which I did a review for). But the essence comes from the RDA study.

@kcoyle
Contributor

kcoyle commented Oct 2, 2020

I had hopes for their analysis when they went with "Revision, Release, Granularity..." but I think they lost the thread somewhere along the way, mixing versions with things like Manifestation and Provenance. That's too bad. There was potential.

@andrea-perego andrea-perego linked a pull request Oct 15, 2020 that will close this issue
@andrea-perego
Contributor Author

Following this discussion thread, I have prepared with @riccardoAlbertoni an extended draft of the "Versioning" section:

https://w3c.github.io/dxwg/dcat/#dataset-versions

Although this draft does not address some of the last comments - in particular, @kcoyle 's and @dr-shorthair 's points on the risk of conflating into the notion of "version" different relationship types - the DCAT subgroup has eventually decided to merge it into the ED to have something tangible for discussion.

Looking forward to your comments.

PS: I think it would be good to bring into the discussion @makxdekkers 's point in #868 (comment)

@andrea-perego
Contributor Author

Closing this issue, as decided in the 11 Nov 2020 DCAT call and following the extension of the versioning section in the ED - see #1251 (comment).

New issues should be created for further discussion on this topic.
