2018, Springer eBooks
This book addresses the challenges of designing semi-structured databases, emphasizing the development of algorithms and tools to mitigate update anomalies caused by poorly designed structures. A systematic approach is outlined that initially resists redundancy but subsequently allows it for improved query performance. The content is relevant for researchers, practitioners, and students interested in semi-structured data management.
Proceedings of the Second International Conference on Web Information Systems Engineering
Semistructured data has become prevalent with the growth of the Internet. The development of new web applications that require efficient design and maintenance of large amounts of data makes it increasingly important to design "good" semistructured databases to prevent data redundancy and updating anomalies. However, it is difficult, if not impossible, for current semistructured data models to capture the semantics traditionally needed for designing databases. In this paper, we show how an Object-Relationship-Attribute model for Semistructured data (ORA-SS) can facilitate the design of "good" semistructured databases. This is accomplished via the normalization of ORA-SS. An XML DTD or Schema generated from a normal form ORA-SS schema diagram has no undesirable redundancy, and thus no updating anomalies for the complying semistructured databases. The general design methodology and detailed steps for converting an ORA-SS schema diagram into a normal form ORA-SS schema diagram are presented. These steps can also be used as guidelines for designing semistructured databases using the ORA-SS model.
Lecture Notes in Computer Science, 2001
Semi-structured data has become prevalent with the growth of the Internet. The data is usually stored in a traditional database system or in a specialized repository. While many information providers have presented their databases on the web as semi-structured data, other information providers are developing repositories for new applications. One such application is e-commerce, which is emerging as a major web-supported application assisting business transactions between multiple parties via the network and involving large amounts of data. Designing a "good" semi-structured database is increasingly crucial to prevent data redundancy, inconsistency and updating anomalies. In this paper, we propose a conceptual approach to designing semi-structured databases. A conceptual layer based on the popular Entity-Relationship (ER) model is employed to remove anomalies and redundancies at the semantic level. We give an algorithm that maps an ER diagram involving composite attributes, weak entity types, recursive, n-ary and ISA relationship sets, and aggregations to a semi-structured schema graph (S3-Graph) used to represent semi-structured data. Our study reveals similarities between the S3-Graph and the hierarchical model and nested relations, in that all have limitations in modeling situations with nonhierarchical relationships given their tree-like structures.
Semi-structured data is becoming extremely popular in diverse applications, including interactive web applications, protein structure analysis, 3D object representation, and personal lifetime information management. To meet the challenges of today's complex applications, a generic model is in demand. This paper therefore examines the semi-structured data model and implementation issues for semi-structured data. The paper assumes that: fluidity in data structure makes it difficult to store and manage semi-structured data using conventional data models such as the relational database model; the main advantage of fully structured data is strong typing, which enables high performance and efficiency; unstructured and semi-structured data allow a higher degree of flexibility; graph-based models (e.g., OEM) can be used to index semi-structured data; the data modeling technique in OEM allows the data to be stored in a graph-based model; data in a graph-based model is easier to search and index; and finally, XML allows data to be arranged in hierarchical order, which enables the data to be indexed and searched as well.
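As a rough illustration of the graph-based modeling mentioned in the abstract above, the following sketch stores data as an OEM-style labeled graph and looks objects up by a label path. It is not taken from the paper: the names `OEMObject` and `follow`, and the sample records, are hypothetical and chosen only for this example.

```python
from dataclasses import dataclass, field


@dataclass
class OEMObject:
    """One node of an OEM-style graph (hypothetical minimal form)."""
    oid: int
    value: object = None                        # atomic value; None for complex objects
    edges: list = field(default_factory=list)   # (label, child OEMObject) pairs

    def add(self, label, child):
        """Attach child under the given edge label and return the child."""
        self.edges.append((label, child))
        return child


def follow(root, path):
    """Return every object reachable from root along the sequence of labels."""
    frontier = [root]
    for label in path:
        frontier = [child for obj in frontier
                    for (edge_label, child) in obj.edges
                    if edge_label == label]
    return frontier


# Two "person" objects with different structure: only the second has an email.
root = OEMObject(1)
ann = root.add("person", OEMObject(2))
ann.add("name", OEMObject(3, "Ann"))
bob = root.add("person", OEMObject(4))
bob.add("name", OEMObject(5, "Bob"))
bob.add("email", OEMObject(6, "bob@example.org"))

names = [o.value for o in follow(root, ["person", "name"])]
emails = [o.value for o in follow(root, ["person", "email"])]
```

The two `person` objects do not share a fixed set of fields, which is the flexibility the abstract attributes to semi-structured data; a path lookup simply returns fewer results where a label is absent.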
9th International Database Engineering & Application Symposium (IDEAS'05)
In this paper we describe an approach and system for managing enterprise semi-structured data that is high-throughput, nimble, and scalable. We present the NETMARK system, which provides a "schemaless" way of managing semi-structured documents. We describe in particular detail the unique underlying data storage approach and the efficient query processing mechanisms built on this storage system. We present an extensive benchmark evaluation of the NETMARK system and also compare it with related XML management systems. At the heart of the approach is the philosophy of focusing on the most common data management requirements in the enterprise, and not burdening users and application developers with unnecessary complexity and formal schemas.
IAEME PUBLICATION, 2020
Multidimensional analysis, or On-Line Analytical Processing (OLAP), has generally been used for the analysis of structured data in the context of data warehouses. However, these techniques are not appropriate for managing or analyzing semi-structured data such as XML (eXtensible Markup Language) documents. In this paper, we propose a meta-model of heterogeneous semi-structured data (structure and content) and a multidimensional model based on the galaxy model in order to analyze the content of this data type from several perspectives.
World Wide Web, 2001
World Wide Web, 4: 79-99 (2001). © 2001 Kluwer Academic Publishers. Modelling and Manipulating Multidimensional Data in Semistructured Databases. Raymond K. Wong and Franky Lam, School of Computer ...
… Information Processing and Management, Vol 29 …, 2004
2000
Abstract. Recently, there have been several proposals of formalisms for modeling semistructured data, which is data that is neither raw, nor strictly typed as in conventional database systems. Semistructured data models are graph-based models, where graphs are used to represent both databases and schemas.
The most promising and dominant format for processing and representing data on the Internet is the semistructured data form termed XML. XML data has no fixed schema; it evolves and is self-describing, which results in management difficulties compared to, for example, relational data. XML queries differ from relational queries in that the former are expressed as path expressions. The efficient handling of structural relationships has become a key factor in XML query processing. It is therefore a major challenge for the database community to design query processing techniques and storage methods that can manage semistructured data efficiently. The main contribution of this paper is querying semistructured data using a bitmap to represent path-value relationships, and compressing the bitmap to save space. The presented bitmap indexing and querying scheme, termed BIQS, stores the element path, the token of the word, the attribute and the document number in a dynamically created matrix structure. We use word, attribute and path dictionaries for the construction of a bitmap structure. This paper describes an algorithm to query semistructured data in a more time-efficient way than is provided by other relational and semistructured query processing techniques. The presented BIQS structure provides storage and query performance improvements due to the compression of semistructured data.
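The general idea of a bitmap over path-value relationships, as described in the abstract above, can be sketched as follows. This is a toy illustration only and not the BIQS structure itself: the class name `PathValueBitmapIndex`, its methods, and the sample documents are hypothetical, Python integers stand in for bit vectors, and the dictionaries and compression step that the abstract mentions are omitted.

```python
from collections import defaultdict


class PathValueBitmapIndex:
    """Toy index: one bitmap per (path, token); bit d is set when
    document d contains that token under that element path."""

    def __init__(self):
        self.bitmaps = defaultdict(int)   # (path, token) -> int used as a bit vector

    def add(self, doc_id, path, text):
        """Index every whitespace-separated token of text under path."""
        for token in text.lower().split():
            self.bitmaps[(path, token)] |= 1 << doc_id

    def query(self, *conditions):
        """Return sorted document ids that satisfy all (path, token) conditions."""
        if not conditions:
            return []
        result = -1                       # all bits set; narrowed by each AND
        for path, token in conditions:
            result &= self.bitmaps.get((path, token.lower()), 0)
        return [d for d in range(result.bit_length()) if (result >> d) & 1]


index = PathValueBitmapIndex()
index.add(0, "/book/title", "Semistructured Database Design")
index.add(1, "/book/title", "Relational Database Theory")
index.add(2, "/article/title", "Semistructured Data Models")

books_on_databases = index.query(("/book/title", "database"))
semistructured_books = index.query(("/book/title", "database"),
                                   ("/book/title", "semistructured"))
```

A conjunctive query reduces to a bitwise AND over the bitmaps of its conditions, which is why this kind of index can answer path-value lookups without traversing the documents themselves.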
2005
Abstract. The growing importance of XML calls for easier access to data management technologies, in order to provide domain experts who are inexperienced in database technologies with the possibility to directly query and transform domain specific data. Intuitiveness and simplicity are gained with the use of a graphical representation. The former is obtained by depicting the hierarchical XML data model as tree structures; the latter consists in considering only elements, attributes, and un-typed textual data.
Citeseer
Semi-structured data is becoming increasingly important with the introduction of XML and related languages and technologies. The recent shift from DTDs (document type definitions) to XML-Schema for XML data highlights the importance of a schema definition for semi-structured data ...
Semi-structured Databases (SSD) are becoming extremely popular in diverse applications, including interactive web applications, protein structure analysis, 3D object representation, and personal lifetime information management. The list is endless. To meet the challenges of today's complex applications, a generic SSD model is in demand. Many works have been reported on this. In this paper, the expectations for a generic SSD model are studied through a critical survey of existing models.
2002
XML has been quickly emerging as a dominant standard for data representation and exchange on the World Wide Web because of its many good features, such as well-formed structure and semantic support. Research on semistructured data over the last several years has focused on data models, query languages, and systems where the database is modeled as some form of labeled, directed graph. Processing sophisticated queries on semistructured data is not easy because of the complexity of the graph structure and the lack of corresponding schemata associated with it. To deal with such problems, the paper proposes an approach to processing semistructured data with XML. Although there are many similarities between semistructured data and XML, there are some differences. A key difference is that the current XML DOM only supports tree structures and does not directly support graph structures. To deal with such differences, the paper proposes two approaches that treat an XML document as a semantic graph and as a literal tree, which are the foundation for transforming semistructured data into XML documents for processing. For this purpose, several algorithms are designed to transform semistructured data into XML documents and an XML-Schema document based on the schema tree extracted from the original semistructured data. To ensure that semistructured data can be reconstructed from XML documents, this transformation must be lossless. Finally, the paper also presents an algorithm for reconstructing semistructured data.
Lecture Notes in Computer Science, 2002
Semistructured data is becoming increasingly important for web applications with the development of XML and related technologies. Designing a "good" semistructured database is crucial to prevent data redundancy, inconsistency and undesirable updating anomalies. However, unlike relational databases, there is no normalization theory to facilitate the design of good semistructured databases. In this paper, we introduce the notion of a semistructured schema and identify the various anomalies that may occur in such a schema. A Normal Form for Semistructured Schemata, NF-SS, is proposed. A semistructured schema in NF-SS guarantees minimal redundancy and hence no undesirable updating anomalies for the associated semistructured databases. Furthermore, a semistructured schema in NF-SS gives a more reasonable representation of real world semantics. We develop an iterative algorithm based on a set of heuristic rules to restructure a semistructured schema into a normal form. These design methods also provide insights into the normalization task for semistructured databases.
2000
Semi-structured data has become prevalent with the growth of the Internet and other on-line information repositories. Many organizational databases are presented on the web as semi-structured data. Designing a “good” semi-structured database is increasingly crucial to prevent data redundancy, inconsistency and updating anomalies. In this paper, we define a semi-structured schema graph and identify the various anomalies that may occur
Very Large Data Bases, 1984
Information Systems, 2004
Multidimensional semistructured data (MSSD) are semistructured data that present different facets under different contexts. Context represents alternative worlds, and is expressed by assigning values to a set of user-defined variables called dimensions. The notion of context has been incorporated in the Object Exchange Model (OEM), and the extended model is called Multidimensional OEM (MOEM), a graph model for MSSD. In this paper, we explain in detail how MOEM can represent the history of an OEM database. We discuss how MOEM properties are applied in the case of
Proceedings of the International Workshop on Knowledge Representation and Databases (KRDB-98) at ACM-SIGMOD, 1998
There is much activity in the database research community on managing semistructured data but little experience to date in applying this research to substantial problems. This paper describes an effort to assess the applicability of this technology to organizations with large quantities of semistructured (and structured) information. We consider the issue of appropriate data models for semistructured data and then describe an evaluation framework currently under construction.