2005
10 pages
Semistructured data has become prevalent with the growth of the Internet. The development of new web applications that require efficient design and maintenance of large amounts of data makes it increasingly important to design "good" semistructured databases to prevent data redundancy and updating anomalies. However, it is difficult, and sometimes impossible, for current semistructured data models to capture the semantics traditionally needed for designing databases. In this paper, we show how the Object-Relationship-Attribute model for SemiStructured data (ORA-SS) can facilitate the design of "good" semistructured databases. This is accomplished via the normalization of ORA-SS. An XML DTD or Schema generated from a normal form ORA-SS schema diagram has no undesirable redundancy, and thus no updating anomalies for the complying semistructured databases. The general design methodology and detailed steps for converting an ORA-SS schema diagram into a normal form ORA-SS schema diagram are presented. These steps can also be used as guidelines for designing semistructured databases using the ORA-SS model.
Lecture Notes in Computer Science, 2001
Semi-structured data has become prevalent with the growth of the Internet. The data is usually stored in a traditional database system or in a specialized repository. While many information providers have presented their databases on the web as semi-structured data, other information providers are developing repositories for new applications. One such application is e-commerce, which is emerging as a major web-supported application assisting business transactions between multiple parties via the network and involving large amounts of data. Designing a "good" semi-structured database is increasingly crucial to prevent data redundancy, inconsistency and updating anomalies. In this paper, we propose a conceptual approach to designing semi-structured databases. A conceptual layer based on the popular Entity-Relationship (ER) model is employed to remove anomalies and redundancies at the semantic level. We give an algorithm that maps an ER diagram involving composite attributes, weak entity types, recursive, n-ary and ISA relationship sets, and aggregations to a semi-structured schema graph (S3-Graph) used to represent semi-structured data. Our study reveals similarities between the S3-Graph, the hierarchical model and nested relations: because of their tree-like structures, all three have limitations in modeling situations with nonhierarchical relationships.
Lecture Notes in Computer Science, 2002
Semistructured data is becoming increasingly important for web applications with the development of XML and related technologies. Designing a "good" semistructured database is crucial to prevent data redundancy, inconsistency and undesirable updating anomalies. However, unlike relational databases, there is no normalization theory to facilitate the design of good semistructured databases. In this paper, we introduce the notion of a semistructured schema and identify the various anomalies that may occur in such a schema. A Normal Form for Semistructured Schemata, NF-SS, is proposed. A semistructured schema in NF-SS guarantees minimal redundancy and hence no undesirable updating anomalies for the associated semistructured databases. Furthermore, a semistructured schema in NF-SS gives a more reasonable representation of real world semantics. We develop an iterative algorithm based on a set of heuristic rules to restructure a semistructured schema into a normal form. These design methods also provide insights into the normalization task for semistructured databases.
Citeseer
Semi-structured data is becoming increasingly important with the introduction of XML and related languages and technologies. The recent shift from DTDs (document type definitions) to XML-Schema for XML data highlights the importance of a schema definition for semi-structured data ...
Semi-structured data is becoming extremely popular in a wide range of applications, including interactive web applications, protein structure analysis, 3D object representation, and personal lifetime information management. To meet the challenges of today's complex applications, a generic model is needed. This paper therefore examines the semi-structured data model and the implementation issues for semi-structured data. The paper assumes that: fluidity in data structure makes it difficult to store and manage semi-structured data using conventional data models such as the relational model; the main advantage of fully structured data is strong typing, which enables high performance and efficiency; unstructured and semi-structured data allow a higher degree of flexibility; graph-based models (e.g., OEM) can be used to index semi-structured data; the data modeling technique in OEM allows the data to be stored in a graph-based model; data in a graph-based model is easier to search and index; and finally, XML allows data to be arranged in hierarchical order, which enables the data to be indexed and searched as well.
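As a rough illustration of the graph-based view mentioned in this abstract, the following minimal Python sketch models semi-structured data as a labeled graph in the spirit of OEM and answers a simple path query. It is not taken from the paper: the class name `OemObject`, the helper `follow_path`, and the sample data are all hypothetical and serve only to show how irregular records (here, one person without an email) fit in such a graph.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple, Union

Atomic = Union[str, int, float]


@dataclass
class OemObject:
    """Hypothetical OEM-style node: atomic (value set) or complex (labeled edges)."""
    oid: str
    value: Optional[Atomic] = None
    edges: List[Tuple[str, "OemObject"]] = field(default_factory=list)

    def add(self, label: str, child: "OemObject") -> "OemObject":
        """Attach a child under the given edge label and return the child."""
        self.edges.append((label, child))
        return child


def follow_path(root: OemObject, labels: List[str]) -> List[OemObject]:
    """Return every object reachable from root by following the labels in order."""
    frontier = [root]
    for label in labels:
        frontier = [child
                    for node in frontier
                    for (edge_label, child) in node.edges
                    if edge_label == label]
    return frontier


# Illustrative data: the two person objects do not share the same structure.
root = OemObject("&root")
ann = root.add("person", OemObject("&p1"))
ann.add("name", OemObject("&n1", "Ann"))
ann.add("email", OemObject("&e1", "ann@example.org"))
bob = root.add("person", OemObject("&p2"))
bob.add("name", OemObject("&n2", "Bob"))  # Bob has no email edge

names = [obj.value for obj in follow_path(root, ["person", "name"])]
emails = [obj.value for obj in follow_path(root, ["person", "email"])]
```

The path query simply skips objects that lack a requested edge, which is one way to read the flexibility the abstract attributes to graph-based models.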
Lecture Notes in Computer Science, 2006
There has been a rapid growth in the use of semistructured data in both web applications and database systems. Consequently, the design of a good semistructured data model is essential. In the relational database community, algorithms have been defined to transform a relational schema from one normal form to a more suitable normal form. These algorithms have been shown to preserve certain semantics during the transformation. The work presented in this paper is the first step towards representing such algorithms for semistructured data, namely formally defining the semantics necessary for achieving this goal. Formal semantics and automated reasoning tools enable us to reveal the inconsistencies in a semistructured data model and its instances. The Object Relationship Attribute model for Semistructured data (ORA-SS) is a graphical notation for designing and representing semistructured data. This paper presents a methodology of encoding the semantics of the ORA-SS notation into the Web Ontology Language (OWL) and automatically verifying the semistructured data design using the OWL reasoning tools. Our methodology provides automated consistency checking of an ORA-SS data model at both the schema and instance levels.
PhD Thesis, The University of Hong Kong, 2009
The nature of software applications has evolved very quickly over the past decade, since the World Wide Web became popular. Some web applications are required to process large datasets which do not have well-defined structures. This challenges conventional data engineering methods. A conventional data engineering method typically requires a system architect to have prior knowledge of what data are processed in an application, and how, in order to design a good database schema that optimizes data computations and storage. However, for a web application processing large-scale semi-structured and unstructured data, schema design tasks cannot always be handled entirely by humans and need to be automated by software tools. In this thesis, I study the problems of schema computations for semi-structured XML data and unstructured RDF data. This thesis consists of two parts. In the first part, I investigate the XML data interoperability problem of web services. To address this problem, I develop a formal model for XML schemas called Schema Automaton, and derive the computational techniques for schema compatibility testing and subschema extraction. In the second part, I investigate different types of databases for RDF data. For one particular database type called property tables, I propose a new data mining technique called Attribute Clustering by Table Load to automate the schema design for the database based on the underlying data patterns.
2001
XML is increasingly being adopted for information publishing on the World Wide Web. However, the underlying data is often stored in relational databases, so some mechanism is needed to convert the relational data into XML data. In this work, we employ a semantically rich semistructured data model, the Object-Relationship-Attribute model for semistructured data, as a middleware to support the schema conversion from a semantically enriched relational schema to XML Schema. This approach allows us to handle the translation of a set of related relations and to distinguish attributes of relationship types from attributes of object classes, multivalued attributes, and different types of relationships such as binary, n-ary, recursive and ISA. The resulting XML structures reflect the inherent semantics and implicit structure in the underlying relational database. We also show that the appropriate use of references avoids unnecessary redundancy and the proliferation of disconnected XML elements.
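The closing point about references can be illustrated with a small Python sketch that uses only the standard `xml.etree.ElementTree` module. It is not the conversion algorithm of the paper: the element names (`db`, `student`, `course`, `course_ref`), the sample rows, and both builder functions are assumptions made purely for illustration. The sketch builds the same enrolment data twice, once by nesting the course under each student and once by storing each course once and pointing to it by reference.

```python
import xml.etree.ElementTree as ET

# Hypothetical relational rows: (student id, student name, course id)
enrolments = [("s1", "Ann", "c1"), ("s2", "Bob", "c1")]
courses = {"c1": "Databases"}


def build_nested() -> ET.Element:
    """Nest a full copy of the course under every student (redundant titles)."""
    root = ET.Element("db")
    for sid, name, cid in enrolments:
        student = ET.SubElement(root, "student", id=sid)
        ET.SubElement(student, "name").text = name
        course = ET.SubElement(student, "course", id=cid)
        ET.SubElement(course, "title").text = courses[cid]
    return root


def build_referenced() -> ET.Element:
    """Store each course once; students point to it through a ref attribute."""
    root = ET.Element("db")
    for cid, title in courses.items():
        course = ET.SubElement(root, "course", id=cid)
        ET.SubElement(course, "title").text = title
    for sid, name, cid in enrolments:
        student = ET.SubElement(root, "student", id=sid)
        ET.SubElement(student, "name").text = name
        ET.SubElement(student, "course_ref", ref=cid)
    return root


def count_titles(root: ET.Element) -> int:
    """Count how many times a course title is stored anywhere in the document."""
    return len(root.findall(".//title"))


nested_titles = count_titles(build_nested())
referenced_titles = count_titles(build_referenced())
```

In the nested version, renaming the course means updating one copy per enrolled student, which is the kind of updating anomaly the abstracts on this page describe; in the referenced version the title is stored in a single place.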
2000
Semi-structured data has become prevalent with the growth of the Internet and other on-line information repositories. Many organizational databases are presented on the web as semi-structured data. Designing a “good” semi-structured database is increasingly crucial to prevent data redundancy, inconsistency and updating anomalies. In this paper, we define a semi-structured schema graph and identify the various anomalies that may occur