Make Dataverse produce valid DDI codebook 2.5 XML #3648

@jomtov

Forwarded from the ticket:
https://help.hmdc.harvard.edu/Ticket/Display.html?id=245607


Hello,
I tried to validate two items exported to DDI from dataverse.harvard.edu against codebook.xsd (2.5) and got the same types of validation errors for both. They are described below for Item 1 (the content below the line should work as a well-formed XML file):

Item 1: https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/BAMCSI

Item 2: https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/P4JTOD

What could be done about it (other than meddling with the schema)?

Best regards,

Joakim Philipson
Research Data Analyst, Ph.D., MLIS
Stockholm University Library

Stockholm University
SE-106 91 Stockholm
Sweden

Tel: +46-8-16 29 50
Mobile: +46-72-1464702
E-mail: [email protected]
http://orcid.org/0000-0001-5699-994X

<docDscr>
<citation>
    <titlStmt>
        <titl>What’s in a name? : Sense and Reference in biodiversity information </titl>
        <IDNo agency="DOI">doi:10.7910/DVN/BAMCSI</IDNo>
    </titlStmt>
    <distStmt>
        <distrbtr>Harvard Dataverse</distrbtr>
        <distDate>2017-01-12</distDate>
    </distStmt>
    <verStmt source="DVN">
        <version date="2017-01-12" type="RELEASED">1</version>
    </verStmt>
    <biblCit>Philipson, Joakim, 2017, "What’s in a name? : Sense and Reference in
        biodiversity information", doi:10.7910/DVN/BAMCSI, Harvard Dataverse, V1</biblCit>
</citation>

<xs:attribute name="source" default="producer">
    <xs:simpleType>
        <xs:restriction base="xs:NMTOKEN">
            <xs:enumeration value="archive"/>
            <xs:enumeration value="producer"/>
        </xs:restriction>
    </xs:simpleType>
</xs:attribute>
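
As a rough illustration of this first error type, the following is a minimal Python sketch that flags `source` attribute values outside the enumeration quoted above. It is not a schema validator and not part of Dataverse; the function name `invalid_sources` and the constant `ALLOWED_SOURCES` are hypothetical. It also assumes un-namespaced tags as in the excerpt; in a full export the tag names may come back namespace-qualified.

```python
import xml.etree.ElementTree as ET

# Values permitted by the xs:enumeration in the schema excerpt above.
ALLOWED_SOURCES = {"archive", "producer"}

# Fragment copied from the exported docDscr above.
SAMPLE = """<verStmt source="DVN">
    <version date="2017-01-12" type="RELEASED">1</version>
</verStmt>"""


def invalid_sources(xml_text, allowed=ALLOWED_SOURCES):
    """Return (tag, value) pairs for elements whose `source` value is not allowed."""
    root = ET.fromstring(xml_text)
    return [
        (el.tag, el.attrib["source"])
        for el in root.iter()  # iter() includes the root element itself
        if "source" in el.attrib and el.attrib["source"] not in allowed
    ]


if __name__ == "__main__":
    print(invalid_sources(SAMPLE))  # prints [('verStmt', 'DVN')]
```

A check like this only covers one attribute; validating the export against codebook.xsd with a schema-aware tool remains the real test.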

<stdyInfo>
    <subject>
        <keyword>Medicine, Health and Life Sciences</keyword>
        <keyword>Computer and Information Science</keyword>
        <keyword vocab="casrai" URI="http://dictionary.casrai.org/Metadata"
            >Metadata</keyword>
        <keyword vocab="casrai" URI="http://dictionary.casrai.org/PID_system">PID
            system</keyword>
        <keyword vocab="wikipedia" URI="https://en.wikipedia.org/wiki/Biodiversity"
            >Biodiversity</keyword>
        <keyword vocab="smw-rda" URI="http://smw-rda.esc.rzg.mpg.de/index.php/Taxonomy"
            >Taxonomy</keyword>
    </subject>
    <abstract>"That which we call a rose by any other name would smell as sweet.”
        Shakespeare has Juliet tell her Romeo that a name is just a convention without
        meaning, what counts is the reference, the 'thing itself', to which the property of
        smelling sweet pertains alone. Frege in his classical paper “Über Sinn und
        Bedeutung” was not so sure, he assumed names can be inherently meaningful, even
        without a known reference. And Wittgenstein later in Philosophical Investigations
        (PI) seems to deny the sheer arbitrariness of names and reject looking for meaning
        out of context, by pointing to our inability to just utter some random sounds and by
        that really implying e.g. the door. The word cannot simply be separated from its
        meaning, in the same way as the money from the cow that could be bought for them (PI
        120). Scientific names of biota, in particular, are often descriptive of properties
        pertaining to the organism or species itself. On the other hand, in semantic web
        technology and Linked Open Data (LOD) there is an overall effort to replace names by
        their references, in the form of web links or Uniform Resource Identifiers (URIs).
        “Things, not strings” is the motto. But, even in view of the many "challenges with
        using names to link digital biodiversity information" that were extensively
        described in a recent paper, would it at all be possible or even desirable to
        replace scientific names of biota with URIs? Or would it be sufficient to just
        identify equivalence relationships between different variants of names of the same
        biota, having the same reference, and then just link them to the same “thing”, by
        means of a property sameAs(URI)? The Global Names Architecture (GNA) has a resolver
        of scientific names that is already doing that kind of work, linking names of biota
        such as Pinus thunbergii to global identifiers and URIs from other data sources,
        such as Encyclopedia of Life (EOL) and uBio Namebank. But there may be other
        challenges with going from a “natural language”, even from a not entirely coherent
        system of scientific names, to a semantic web ontology, a solution to some of which
        have been proposed recently by means of so called 'lexical bridges'.</abstract>
    <sumDscr/>
    <contact affiliation="Stockholm University" email="[email protected]"
        >Philipson, Joakim</contact>
    <depositr>Philipson, Joakim</depositr>
    <depDate>2017-01-12</depDate>
</stdyInfo>    
<xs:complexType name="keywordType" mixed="true">
    <xs:complexContent>
        <xs:extension base="simpleTextType">
            <xs:attribute name="vocab" type="xs:string"/>
            <xs:attribute name="vocabURI" type="xs:string"/>
        </xs:extension>
    </xs:complexContent>
</xs:complexType>
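
For the second error type, the excerpt of `keywordType` above declares `vocabURI`, while the export writes `URI`. A minimal sketch of a post-processing workaround in Python follows, assuming the inherited base types (not shown in the excerpt) do not declare a `URI` attribute, and assuming un-namespaced tags as in the excerpt. The function name `rename_keyword_uri` is hypothetical, and this is only an illustration, not how Dataverse builds its export.

```python
import xml.etree.ElementTree as ET

# Shortened fragment based on the exported stdyInfo above.
SAMPLE = """<subject>
    <keyword>Computer and Information Science</keyword>
    <keyword vocab="casrai" URI="http://dictionary.casrai.org/Metadata">Metadata</keyword>
</subject>"""


def rename_keyword_uri(xml_text):
    """Rename the `URI` attribute on <keyword> elements to `vocabURI`."""
    root = ET.fromstring(xml_text)
    for kw in root.iter("keyword"):
        # Leave the element alone if it already carries a vocabURI.
        if "URI" in kw.attrib and "vocabURI" not in kw.attrib:
            kw.set("vocabURI", kw.attrib.pop("URI"))
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(rename_keyword_uri(SAMPLE))
```

Fixing the attribute name where the export is generated would be preferable to rewriting files afterwards; the sketch only shows what the schema-conformant form of the attribute looks like.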
<sumDscr/>
<contact affiliation="Stockholm University" email="[email protected]"
    >Philipson, Joakim</contact>

<!-- In codebook: -->

<xs:complexType name="sumDscrType">
    <xs:complexContent>
        <xs:extension base="baseElementType">
            <xs:sequence>
                <xs:element ref="timePrd" minOccurs="0" maxOccurs="unbounded"/>
                <xs:element ref="collDate" minOccurs="0" maxOccurs="unbounded"/>
                <xs:element ref="nation" minOccurs="0" maxOccurs="unbounded"/>
                <xs:element ref="geogCover" minOccurs="0" maxOccurs="unbounded"/>
                <xs:element ref="geogUnit" minOccurs="0" maxOccurs="unbounded"/>
                <xs:element ref="geoBndBox" minOccurs="0"/>
                <xs:element ref="boundPoly" minOccurs="0" maxOccurs="unbounded"/>
                <xs:element ref="anlyUnit" minOccurs="0" maxOccurs="unbounded"/>
                <xs:element ref="universe" minOccurs="0" maxOccurs="unbounded"/>
                <xs:element ref="dataKind" minOccurs="0" maxOccurs="unbounded"/>
            </xs:sequence>
        </xs:extension>
    </xs:complexContent>
</xs:complexType>

<xs:element name="sumDscr" type="sumDscrType">
    <xs:annotation>
        <xs:documentation>
            <xhtml:div>
                <xhtml:h1 class="element_title">Summary Data Description</xhtml:h1>
                <xhtml:div>
                    <xhtml:h2 class="section_header">Description</xhtml:h2>
                    <xhtml:div class="description">Information about the and geographic coverage of the study and unit of analysis.</xhtml:div>
                </xhtml:div>
            </xhtml:div>
        </xs:documentation>
    </xs:annotation>
</xs:element>
<useStmt>CC0 Waiver</useStmt>

dataverse_1062_philipsonErrorTypes.txt

Labels

FY26 Sprint 4, Feature: Harvesting, Feature: Metadata, NIH OTA: 1.2.1, Size: 30, Type: Bug, User Role: Sysadmin, pm.GREI-d-1.2.1
