2001, Lecture Notes in Computer Science
Semi-structured data has become prevalent with the growth of the Internet. The data is usually stored in a traditional database system or in a specialized repository. While many information providers have presented their databases on the web as semi-structured data, other information providers are developing repositories for new applications. One such application is e-commerce, which is emerging as a major web-supported application assisting business transactions between multiple parties via the network and involving large amounts of data. Designing a "good" semi-structured database is increasingly crucial to prevent data redundancy, inconsistency and updating anomalies. In this paper, we propose a conceptual approach to designing semi-structured databases. A conceptual layer based on the popular Entity-Relationship (ER) model is employed to remove anomalies and redundancies at the semantic level. We give an algorithm that maps an ER diagram involving composite attributes, weak entity types, recursive, n-ary and ISA relationship sets, and aggregations to a semi-structured schema graph (S3-Graph), which is used to represent semi-structured data. Our study reveals similarities between the S3-Graph and the hierarchical model and nested relations in that all have limitations in modeling situations with nonhierarchical relationships given their tree-like structures.
2000
Semi-structured data has become prevalent with the growth of the Internet and other on-line information repositories. Many organizational databases are presented on the web as semi-structured data. Designing a “good” semi-structured database is increasingly crucial to prevent data redundancy, inconsistency and updating anomalies. In this paper, we define a semi-structured schema graph and identify the various anomalies that may occur
Proceedings of the Second International Conference on Web Information Systems Engineering
Semistructured data has become prevalent with the growth of the Internet. The development of new web applications that require efficient design and maintenance of large amounts of data makes it increasingly important to design "good" semistructured databases to prevent data redundancy and updating anomalies. However, it is difficult, or even impossible, for current semistructured data models to capture the semantics traditionally needed for designing databases. In this paper, we show how an Object-Relationship-Attribute model for SemiStructured data (ORA-SS) can facilitate the design of "good" semistructured databases. This is accomplished via the normalization of ORA-SS. An XML DTD or Schema generated from a normal form ORA-SS schema diagram has no undesirable redundancy, and thus no updating anomalies for the complying semistructured databases. The general design methodology and detailed steps for converting an ORA-SS schema diagram into a normal form ORA-SS schema diagram are presented. These steps can also be used as guidelines for designing semistructured databases using the ORA-SS model.
Lecture Notes in Computer Science, 2002
Semistructured data is becoming increasingly important for web applications with the development of XML and related technologies. Designing a "good" semistructured database is crucial to prevent data redundancy, inconsistency and undesirable updating anomalies. However, unlike relational databases, there is no normalization theory to facilitate the design of good semistructured databases. In this paper, we introduce the notion of a semistructured schema and identify the various anomalies that may occur in such a schema. A Normal Form for Semistructured Schemata, NF-SS, is proposed. A semistructured schema in NF-SS guarantees minimal redundancy and hence no undesirable updating anomalies for the associated semistructured databases. Furthermore, a semistructured schema in NF-SS gives a more reasonable representation of real world semantics. We develop an iterative algorithm based on a set of heuristic rules to restructure a semistructured schema into a normal form. These design methods also provide insights into the normalization task for semistructured databases.
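The redundancy and updating anomalies mentioned in the abstract above can be illustrated with a minimal Python sketch. It is not the NF-SS algorithm from the paper; the student/course data, the field names, and the helper functions `title_copies` and `stored_titles` are hypothetical and chosen only for illustration.

```python
import copy

# Hypothetical tree-shaped data: every student embeds a full copy of each course.
nested = [
    {"student": "Ann", "courses": [{"code": "CS101", "title": "Databases"}]},
    {"student": "Bob", "courses": [{"code": "CS101", "title": "Databases"}]},
]

def title_copies(students, code):
    """Count how many separate copies of one course record are stored."""
    return sum(1 for s in students for c in s["courses"] if c["code"] == code)

def stored_titles(students, code):
    """Return the set of distinct titles stored for one course code."""
    return {c["title"] for s in students for c in s["courses"] if c["code"] == code}

# Update only the first copy: the second copy now disagrees (an updating anomaly).
updated = copy.deepcopy(nested)
updated[0]["courses"][0]["title"] = "Database Systems"

# Restructured form (an assumption, not the paper's normal form):
# course facts are stored once and students hold references to them.
courses = {"CS101": {"title": "Databases"}}
enrolments = [
    {"student": "Ann", "course_refs": ["CS101"]},
    {"student": "Bob", "course_refs": ["CS101"]},
]
```

In the nested form the course title is stored once per enrolled student, so a partial update leaves two conflicting titles; in the restructured form there is a single copy to change.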
Semi-structured Data are becoming extremely popular in versatile applications including interactive web applications, protein structure analysis, 3D object representation, and personal lifetime information management. In order to meet the challenges of today's complex applications, a generic model is in demand. This paper therefore examines the Semi-structured Data Model and implementation issues for Semi-structured Data. The paper assumes that: fluidity in data structure makes it difficult to store and manage semi-structured data using conventional data models like the Relational Database model; the main advantage of fully structured data is the strong typing, which enables high performance and efficiency; unstructured and semi-structured data allow a higher degree of flexibility; graph-based models (e.g., OEM) can be used to index semi-structured data; the data modeling technique in OEM allows the data to be stored in a graph-based model; the data in a graph-based model is easier to search and index; and finally, XML allows data to be arranged in hierarchical order, which enables the data to be indexed and searched as well.
Semi-structured Databases (SSD) are becoming extremely popular in versatile applications including interactive web applications, protein structure analysis, 3D object representation, and personal lifetime information management. The list is endless. In order to meet the challenges of today's complex applications, a generic SSD model is in demand. Many works have been reported on this. In this paper, expectations from a generic SSD model are studied through a critical survey of existing models.
A Data Model is the foundation of any software application. These applications can be Retail, Accounting, Customer Relationship Management, Finance, Analysis, Decision Support, Transaction Processing or Analysis Processing to name just a few. These data models are purpose built following best practices of normalization as documented by Codd and Date. Comparison of two Data Models can become subjective since there are limited quantifiable metrics to do a comparison. Extensive research has been done exploring methods for the comparison of Graphs. Here we will present a mechanism for translating a Data Model into a Data Structure Graph that can have the best practices of Graph Theory applied to it, as well as use the metrics that are provided by Graph Theory for quantification and comparison of Data Models.
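As a rough illustration of the idea in the abstract above, the following Python sketch turns a small data model into a directed graph and computes a few standard graph metrics. It is not the paper's mechanism; the table names, the foreign-key representation, and the functions `to_graph` and `metrics` are assumptions made for this example.

```python
from collections import defaultdict

# Hypothetical data model: table name -> tables it references via foreign keys.
model = {
    "customer": [],
    "product": [],
    "order": ["customer"],
    "order_line": ["order", "product"],
}

def to_graph(model):
    """Treat each table as a node and each foreign key as a directed edge."""
    nodes = set(model)
    edges = {(src, dst) for src, targets in model.items() for dst in targets}
    return nodes, edges

def metrics(nodes, edges):
    """Compute simple, comparable metrics for a data structure graph."""
    degree = defaultdict(int)
    for src, dst in edges:
        degree[src] += 1
        degree[dst] += 1
    n = len(nodes)
    # Density of a directed graph without self-loops: |E| / (n * (n - 1)).
    density = len(edges) / (n * (n - 1)) if n > 1 else 0.0
    return {
        "nodes": n,
        "edges": len(edges),
        "max_degree": max((degree[v] for v in nodes), default=0),
        "density": density,
    }

nodes, edges = to_graph(model)
result = metrics(nodes, edges)
```

Two data models can then be compared by computing the same metrics for each; which metrics are meaningful for a given comparison is a design choice this sketch does not settle.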
Foundations of Database Design, 2003
2001
XML is increasingly being adopted for information publishing on the World Wide Web. However, the underlying data is often stored in the relational databases. Some mechanism is needed to convert the relational data into XML data. In this work, we employ a semantically rich semistructured data model, the Object-Relationship-Attribute model for semistructured data, as a middleware to support the schema conversion from semantically enriched relational schema to XML Schema. This approach allows us to handle the translation of a set of related relations and to distinguish attributes of relationship types from attributes of object classes, multivalued attributes, and different types of relationships such as binary, n-ary, recursive and ISA. The resulting XML structures are able to reflect the inherent semantics and implicit structure in the underlying relational database. We also show that the appropriate use of references is able to avoid unnecessary redundancy and the proliferation of disconnected XML elements.
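The point about references avoiding redundancy can be sketched in Python with the standard `xml.etree.ElementTree` module. This is not the ORA-SS-based conversion described in the abstract; the relations, the element names, and the `dept_ref` attribute are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical relational data: employees reference departments by key.
departments = [{"id": "d1", "name": "Research"}]
employees = [
    {"id": "e1", "name": "Ann", "dept": "d1"},
    {"id": "e2", "name": "Bob", "dept": "d1"},
]

def to_xml(departments, employees):
    """Emit each department once; employees point to it by reference."""
    root = ET.Element("company")
    for d in departments:
        dept_el = ET.SubElement(root, "department", id=d["id"])
        ET.SubElement(dept_el, "name").text = d["name"]
    for e in employees:
        # dept_ref holds the department key instead of a nested copy.
        emp_el = ET.SubElement(root, "employee", id=e["id"], dept_ref=e["dept"])
        ET.SubElement(emp_el, "name").text = e["name"]
    return root

root = to_xml(departments, employees)
xml_text = ET.tostring(root, encoding="unicode")
```

Because the department is written once and referenced by key, its name appears a single time in the output regardless of how many employees belong to it.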
1999
ABSTRACT. We extend the model for semi-structured data proposed in [BUN 97], where both databases and schemas are represented as graphs, with the possibility of expressing different types of constraints on the nodes of the graphs. We discuss how the expressive power of the constraint language may influence the complexity of checking subsumption between schemas, and devise a polynomial algorithm for an interesting class of constraints.
2020
Aggregate oriented databases [26] support collections of documents with unstructured data. The unstructured markup means that anything can be stored anywhere. Notations for conceptual modeling of entity relationship diagrams have been extended with notation for aggregation, which satisfies the nesting of structures in aggregate oriented databases [11]. Conceptual modeling of ER-diagrams [5,27] has the benefit that a generic diagram of the whole database is sufficient for moving forward to a design activity, while conceptual modeling of aggregate oriented databases has established a notation for modeling [2]. However, the model is specific and not generic. Solutions that prefer a different nesting need a separate aggregate data model. Our approach to conceptual modeling does not violate the existing relational constraints and provides an extra conceptual layer where document structure can be modeled by color coding. We will show that modeling decisions can readily be made and materialize...
Lecture Notes in Computer Science, 2015
The paper aims at the development of well-founded notions of database structure models that are specified in entity-relationship modelling languages. These notions reflect the functions a model fulfills in utilisation scenarios.
1 Utilisation Scenarios of Conceptual Models
Conceptual models are used as an artifact in many utilisation scenarios. Design science research [4] and ER schema development methodologies (e.g. [3,8,11]) have so far developed a good number of such scenarios:
Communication and negotiation scenario: The conceptual model is used for exchange of meanings through a common understanding of notations, signs and symbols within an application area. It can also be used in a back-and-forth process in which interested parties with different interests find a way to reconcile or compromise to come up with an agreement. The schema provides negotiable and debatable propositions about the understanding of the part of the reality but does not have well-developed justificatory explanations.
Conceptualisation scenario: The main application area for extended entity-relationship models is the conceptualisation of database applications. Conceptualisation is typically shuffled with discovery of phenomena of interest, analysis of main constructs and focus on relevant aspects within the application area. The specification incorporates concepts injected from the application domain.
Description scenario: In a description scenario, the model provides a specification of how the part of the reality that is of interest is perceived and in which way augmentations of current reality are targeted. The model says what the structure of an envisioned database is and what it will be.
Prescription scenario: The conceptual model is used as a blueprint for or prescription of a database application, especially for prescribing the structures and constraints in such applications. The schema proposes what the structure of a database is on the one hand and how and where to construct the realisation on the other hand. ER schemata can be translated to relational, XML or other schemata based on transformation profiles [11] that incorporate properties of the target systems.
Lecture Notes in Computer Science, 2009
We propose a mapping from the Enhanced Entity Relationship conceptual model to the W3C XML Schema Language with the following properties: information and integrity constraints are preserved, no redundancy is introduced, different hierarchical views of the conceptual information are available, the resulting XML structure is highly connected, and the design is reversible. We investigate two different ways to nest the XML structure: a maximum connectivity nesting, which minimizes the number of schema constraints used in the mapping of the conceptual schema and thereby reduces the validation overhead, and a maximum depth nesting, which keeps low the number of (expensive) join operations that are necessary to reconstruct the information at query time using the mapped schema. We propose a graph-theoretic linear-time algorithm to find a maximum connectivity nesting and show that finding a maximum depth nesting is NP-complete. We complement our investigation with an implementation of the devised translation and we embed the implemented module in a software framework for the conceptual and logical design of spatio-temporal databases.
Data & knowledge engineering, 1987
This paper presents the system ADDS, which has been developed to assist the database designer in designing a database schema. A distinction is made between the stage of information structure analysis, in which the information structure of the system is defined according to its user information needs, and the stage of database schema design, in which the record types of the database and the relationships between them are defined. In the first stage a conceptual schema is obtained, represented as an information structure diagram (ISD), and in the later stage the ISD is used to derive the database schema in the form of a data structure diagram (DSD). ADDS automatically creates the database schema out of a conceptual schema which is expressed as an ISD of the binary-relationship data model. The resulting schema consists of normalized record types, according to the relational model, along with hierarchical/set relationships between ‘owner’ and ‘member’ record types. ADDS applies algorithms to convert the conceptual schema into the relational database schema.
Citeseer
Semi-structured data is becoming increasingly important with the introduction of XML and related languages and technologies. The recent shift from DTDs (document type definitions) to XML-Schema for XML data highlights the importance of a schema definition for semi-structured data ...
PhD Thesis, The University of Hong Kong, 2009
The nature of software applications has evolved very quickly in the past decade, since the World Wide Web was popularized. Some web applications are required to process large datasets which do not have well-defined structures. This has been challenging conventional data engineering methods. A conventional data engineering method typically requires that a system architect have prior knowledge of what data are processed in an application, and how, so as to design a good database schema that optimizes data computations and storage. However, for a web application processing large-scale semi-structured and unstructured data, schema design tasks cannot always be handled entirely by humans, and need to be automated by software tools. In this thesis, I study the problems of schema computations for semi-structured XML data and unstructured RDF data. This thesis consists of two parts. In the first part, I investigate the XML data interoperability problem of web services. To address this problem, I develop a formal model for XML schemas called Schema Automaton, and derive the computational techniques for schema compatibility testing and subschema extraction. In the second part, I investigate different types of databases for RDF data. For one particular database type called property tables, I propose a new data mining technique, namely Attribute Clustering by Table Load, to automate the schema design for the database based on the underlying data patterns.
2000
Abstract. Recently, there have been several proposals of formalisms for modeling semistructured data, which is data that is neither raw, nor strictly typed as in conventional database systems. Semistructured data models are graph-based models, where graphs are used to represent both databases and schemas.
ACM Transactions on Database Systems, 1992
A common approach to database design is to describe the structures and constraints of the database application in terms of a semantic data model, and then represent the resulting schema using the data model of a commercial database management system. Often, in practice, Extended Entity-Relationship (EER) schemas are translated into equivalent relational schemas. This translation involves different aspects: representing the
… Information Processing and Management, Vol 29 …, 2004