2008, 13th IEEE International Conference on Engineering of Complex Computer Systems (iceccs 2008)
The dramatic expansion of semistructured data has led to the development of database systems for manipulating the data. Despite its huge potential, there is still a lack of formality and verification support in the design of good semistructured databases. Like traditional database systems, semistructured database systems should contain minimal redundancy and no update anomalies in order to store and manage data effectively. Several normalization algorithms have been proposed to satisfy these needs by transforming the schema of the semistructured data into a better form. It is essential to ensure that the normalized schema remains semantically equivalent to its original form. In this paper, we present tool support for reasoning about the correctness of semistructured data normalization. The proposed approach uses the ORA-SS data modeling notation and defines its correctness criteria and rules in the PVS formal language. It then uses the PVS theorem prover to automatically check the normalized schema, verifying that functional dependencies are preserved, no data is lost, and no spurious data is created. In summary, our approach not only investigates the characteristics of semistructured data normalization, but also provides a scalable and automated first step towards reasoning about the correctness of normalization algorithms on semistructured data.
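Stated in a relational-style notation (ours for illustration, not the paper's PVS encoding), the three checks on a decomposition rho = {R_1, ..., R_n} of a schema R with dependency set Sigma amount to:

    \[
      \Big(\bigcup_{i=1}^{n} \pi_{R_i}(\Sigma)\Big)^{+} \;=\; \Sigma^{+}
      \qquad\text{and}\qquad
      r \;=\; \pi_{R_1}(r) \bowtie \cdots \bowtie \pi_{R_n}(r)
      \quad\text{for every instance } r \models \Sigma .
    \]

The first equation is dependency preservation; the equality of r with the join of its projections says both that every original tuple can be recovered from the fragments (no data lost) and that the join introduces nothing that was not there (no spurious data).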
19th Australian Conference on Software Engineering (aswec 2008), 2008
The rapid increase in semistructured data usage has led to the development of various database systems for semistructured data. Web services and applications that utilize large amounts of semistructured data require data to remain consistent and be stored efficiently. Several normalization algorithms for semistructured database systems have been developed to satisfy these needs. However, these algorithms lack the verification that would ensure that data and constraints among the data are not lost or corrupted during normalization. In this paper, we propose a set of correctness criteria for normalization of semistructured data, which require that functional dependencies are preserved, data is not lost, and spurious data is not created during normalization. We use the Z specification language to provide a precise and declarative definition of our criteria.
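As a concrete, relational-style analogue of the first criterion, the sketch below checks whether a set of functional dependencies is preserved by a decomposition, by projecting the dependencies onto each fragment via attribute closures. It is a minimal illustration over flat attribute sets, under our own naming, and is not the paper's Z specification.

    from itertools import combinations

    def closure(attrs, fds):
        """Attribute closure of `attrs` under `fds`, a list of (lhs, rhs)
        pairs of frozensets."""
        result, changed = set(attrs), True
        while changed:
            changed = False
            for lhs, rhs in fds:
                if lhs <= result and not rhs <= result:
                    result |= rhs
                    changed = True
        return frozenset(result)

    def project_fds(fds, fragment):
        """Project `fds` onto `fragment`: every subset X of the fragment
        yields X -> (closure(X) intersected with the fragment)."""
        frag, projected = frozenset(fragment), []
        for k in range(1, len(frag) + 1):
            for xs in combinations(sorted(frag), k):
                x = frozenset(xs)
                y = closure(x, fds) & frag
                if y - x:
                    projected.append((x, y - x))
        return projected

    def preserves_fds(fds, fragments):
        """True iff the union of projected dependencies implies every
        original dependency (checked with closures)."""
        union = [fd for frag in fragments for fd in project_fds(fds, frag)]
        return all(rhs <= closure(lhs, union) for lhs, rhs in fds)

    # Toy example: R(A, B, C) with A -> B and B -> C, decomposed into
    # {A, B} and {B, C}; both dependencies are preserved.
    fds = [(frozenset("A"), frozenset("B")), (frozenset("B"), frozenset("C"))]
    print(preserves_fds(fds, [{"A", "B"}, {"B", "C"}]))  # True

The same closure machinery extends to the other two criteria once a lossless-join test is added, which is how the three conditions are usually checked together in the relational setting.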
Lecture Notes in Computer Science, 2006
Semistructured data is now widely used in both web applications and database systems. Much of the research into this area defines algorithms that transform the data and schema, such as data integration, change management, view definition, and data normalization. While some researchers have defined a formalism for the work they have undertaken, there is no widely accepted formalism that can be used for the comparison of algorithms within these areas. The requirements of a formalism that would be helpful in these situations are that it must capture all the necessary semantics required to model the algorithms, it should not be too complex and it should be easy to use. This paper describes a first step in defining such a formalism. We have modelled the semantics expressed in the ORA-SS (Object Relationship Attribute data model for SemiStructured data) data modelling notation in two formal languages that have automatic verification tools. We compare the two models and present the findings.
Semistructured data is now widely used in both web applications and database systems. There are many research challenges in this area, such as data integration, change management, view definition, and data normalization. Traditionally in these areas a formalism is defined for the database model, and properties of the algorithms can be reasoned about, such as the dependency-preserving property of the normalization algorithm in the relational data model. Because research into semistructured data is still in its infancy, many algorithms have been defined in this area and a number of formalisms have been proposed, but there is no formalism that is generally accepted for reasoning about the properties of the algorithms. Such a formalism must capture all the necessary semantics required to model the algorithms, should not be too complex, and should be easy to use. Another area that has been developing steadily is automatic verification. This involves formally speci...
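To make the modeling target of the two entries above concrete, the toy structure below sketches the kind of semantics an ORA-SS schema carries: object classes, relationship types with a degree and participation constraints, and attributes owned either by an object class or by a relationship. The class and field names are our illustrative assumptions, not the notation's formal definition.

    from dataclasses import dataclass, field
    from typing import List, Optional, Tuple

    @dataclass
    class ObjectClass:
        name: str
        attributes: List[str] = field(default_factory=list)

    @dataclass
    class Relationship:
        name: str
        degree: int                             # 2 = binary, 3 = ternary, ...
        participants: List[str]                 # object class names, parent first
        parent_card: Tuple[int, Optional[int]]  # (min, max) children per parent; None = unbounded
        child_card: Tuple[int, Optional[int]]   # (min, max) parents per child
        attributes: List[str] = field(default_factory=list)  # attributes of the relationship itself

    # Supplier-part example: 'price' belongs to the sp relationship rather than
    # to 'part' alone -- exactly the distinction an ORA-SS schema records.
    supplier = ObjectClass("supplier", ["sno", "sname"])
    part = ObjectClass("part", ["pno", "pname"])
    sp = Relationship("sp", degree=2, participants=["supplier", "part"],
                      parent_card=(1, None), child_card=(1, None),
                      attributes=["price"])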
J. Univers. Comput. Sci., 2009
The rapid growth of the World Wide Web has resulted in a dramatic increase in semistructured data usage, creating a growing need for effective and efficient utilization of semistructured data. In order to verify the correctness of semistructured data design, precise descriptions of the schemas and transformations on the schemas must be established. One effective way to achieve this goal is through formal modeling and automated verification. This paper presents the first step towards this goal. In our approach, we have formally specified the semantics of the ORA-SS (Object-Relationship-Attribute data model for Semistructured data) data modeling language in PVS (Prototype Verification System) and provided automated verification support for both ORA-SS schemas and XML (Extensible Markup Language) data instances using the PVS theorem prover. This approach provides a solid basis for verifying algorithms that transform schemas for semistructured data.
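The instance-level checks automated here in PVS are of the kind sketched below in plain Python: given a toy schema that records, per element tag, a key attribute and a set of required attributes, confirm that every element of an XML document respects them. This is only an executable illustration of what verifying an instance against a schema means; the schema shape and names are assumptions, not the paper's PVS encoding.

    import xml.etree.ElementTree as ET

    # Toy schema: key attribute and required attributes per element tag
    # (illustrative names only).
    SCHEMA = {
        "supplier": {"key": "sno", "required": {"sno", "sname"}},
        "part":     {"key": "pno", "required": {"pno", "price"}},
    }

    def check_instance(xml_text):
        """Return violations: missing required attributes, or duplicated
        key values among elements with the same tag."""
        violations, seen_keys = [], {}
        for elem in ET.fromstring(xml_text).iter():
            rules = SCHEMA.get(elem.tag)
            if rules is None:
                continue
            missing = rules["required"] - elem.attrib.keys()
            if missing:
                violations.append(f"{elem.tag}: missing {sorted(missing)}")
            key_val = elem.attrib.get(rules["key"])
            if key_val is not None:
                if key_val in seen_keys.setdefault(elem.tag, set()):
                    violations.append(f"{elem.tag}: duplicate key {key_val}")
                seen_keys[elem.tag].add(key_val)
        return violations

    doc = """<suppliers>
      <supplier sno="s1" sname="Acme">
        <part pno="p1" price="10"/>
        <part pno="p1" price="12"/>
      </supplier>
    </suppliers>"""
    print(check_instance(doc))  # ['part: duplicate key p1']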
Lecture Notes in Computer Science, 2002
Semistructured data is becoming increasingly important for web applications with the development of XML and related technologies. Designing a "good" semistructured database is crucial to prevent data redundancy, inconsistency and undesirable updating anomalies. However, unlike relational databases, there is no normalization theory to facilitate the design of good semistructured databases. In this paper, we introduce the notion of a semistructured schema and identify the various anomalies that may occur in such a schema. A Normal Form for Semistructured Schemata, NF-SS, is proposed. A semistructured schema in NF-SS guarantees minimal redundancy and hence no undesirable updating anomalies for the associated semistructured databases. Furthermore, a semistructured schema in NF-SS gives a more reasonable representation of real world semantics. We develop an iterative algorithm based on a set of heuristic rules to restructure a semistructured schema into a normal form. These design methods also provide insights into the normalization task for semistructured databases.
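A small example of the kind of redundancy NF-SS is designed to rule out: if a supplier's name is nested under every project the supplier serves, the name is stored once per project, and renaming the supplier means touching every copy. The element and attribute names are illustrative only.

    <!-- Unnormalized: sname repeated under every project the supplier supplies -->
    <project jno="j1">
      <supplier sno="s1" sname="Smith" price="10"/>
    </project>
    <project jno="j2">
      <supplier sno="s1" sname="Smith" price="12"/>
    </project>

    <!-- Restructured: sname stored once with the supplier; only the
         relationship attribute price stays under each pairing -->
    <supplier sno="s1" sname="Smith">
      <project jno="j1" price="10"/>
      <project jno="j2" price="12"/>
    </supplier>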
Formal Methods in System Design, 2010
The wide adoption of semistructured data has created a growing need for effective ways to ensure the correctness of its organization. One effective way to achieve this goal is through formal specification and automated verification. This paper presents a theorem proving approach towards verifying that a particular design or organization of semistructured data is correct. We formally specify the semantics of the Object Relationship Attribute data model for Semistructured Data (ORA-SS) modeling notation and its correctness criteria for semistructured data normalization using the Prototype Verification System (PVS). The result is that effective verification on semistructured data models and their normalization can be carried out using the PVS theorem prover.
Proceedings of the Second International Conference on Web Information Systems Engineering
Semistructured data has become prevalent with the growth of the Internet. The development of new web applications that require efficient design and maintenance of large amounts of data makes it increasingly important to design "good" semistructured databases to prevent data redundancy and updating anomalies. However, it is difficult, if not impossible, for current semistructured data models to capture the semantics traditionally needed for designing databases. In this paper, we show how an Object-Relationship-Attribute model for Semistructured data (ORA-SS) can facilitate the design of "good" semistructured databases. This is accomplished via the normalization of ORA-SS. An XML DTD or Schema generated from a normal form ORA-SS schema diagram has no undesirable redundancy, and thus no updating anomalies for the complying semistructured databases. The general design methodology and detailed steps for converting an ORA-SS schema diagram into a normal form ORA-SS schema diagram are presented. These steps can also be used as guidelines for designing semistructured databases using the ORA-SS model.
Lecture Notes in Computer Science, 2006
In this paper, basic relational database (DB) normalization algorithms are implemented efficiently as Mathematica modules. Mathematica was observed to provide a more straightforward platform than earlier, mainly Prolog-based tools, which required complex data structures such as pointer-based linked list representations. A Java user interface called JMath-Norm was designed to execute the Mathematica modules in a systematic way. For this purpose, Mathematica's Java link facility (JLink) is utilized to drive the Mathematica kernel. JMath-Norm provides an effective interactive tool in an educational setting for teaching DB normalization theory.
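The kernel computation behind such normalization modules is candidate-key discovery via attribute closure; a minimal Python rendering is shown below purely for orientation (it is not the Mathematica code of JMath-Norm).

    from itertools import combinations

    def closure(attrs, fds):
        """Attribute closure of `attrs` under `fds` (pairs of attribute sets)."""
        result, changed = set(attrs), True
        while changed:
            changed = False
            for lhs, rhs in fds:
                if lhs <= result and not rhs <= result:
                    result |= rhs
                    changed = True
        return result

    def candidate_keys(schema, fds):
        """All minimal attribute sets whose closure covers the whole schema."""
        keys = []
        for k in range(1, len(schema) + 1):
            for combo in combinations(sorted(schema), k):
                cand = set(combo)
                if closure(cand, fds) == schema and \
                   not any(key <= cand for key in keys):
                    keys.append(cand)
        return keys

    # R(A, B, C, D) with A -> B and B -> C: the only candidate key is {A, D}.
    schema = {"A", "B", "C", "D"}
    fds = [({"A"}, {"B"}), ({"B"}, {"C"})]
    print(candidate_keys(schema, fds))  # [{'A', 'D'}] (up to set ordering)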
www-vs.informatik.uni-ulm.de
With the advent of XML and its use as a database, schema design and normal form theory have attracted renewed research interest. In this paper we address the problem of schema design and normalization in the XML database model. We show that, like relational databases, XML documents may contain redundant information, and this redundancy may cause update anomalies. Furthermore, such problems are caused by certain functional dependencies among paths in the document. Building on our previous work, in which we defined functional dependencies and normal forms for XML Schema, we present a decomposition algorithm that converts any XML Schema into a normalized one satisfying X-BCNF.
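The decomposition follows the same pattern as classical BCNF decomposition: pick a dependency whose left-hand side is not a superkey, split on it, and recurse. The sketch below shows that relational pattern over flat attribute sets purely to convey the flavor; it does not operate on paths or XML Schema the way the paper's X-BCNF algorithm does.

    def closure(attrs, fds):
        """Attribute closure of `attrs` under `fds` (pairs of attribute sets)."""
        result, changed = set(attrs), True
        while changed:
            changed = False
            for lhs, rhs in fds:
                if lhs <= result and not rhs <= result:
                    result |= rhs
                    changed = True
        return result

    def bcnf_decompose(schema, fds):
        """Split `schema` on any X -> Y whose left side is not a superkey and
        recurse; the resulting fragments are in BCNF (this classic scheme does
        not guarantee dependency preservation)."""
        for lhs, rhs in fds:
            if lhs <= schema and (rhs & schema) - lhs:
                if not schema <= closure(lhs, fds):          # X is not a superkey
                    left = lhs | (closure(lhs, fds) & schema)
                    right = (schema - left) | lhs
                    return bcnf_decompose(left, fds) + bcnf_decompose(right, fds)
        return [schema]

    # R(title, year, studio, city) with studio -> city: the studio's city is
    # split out into its own fragment.
    schema = {"title", "year", "studio", "city"}
    fds = [({"studio"}, {"city"})]
    print(bcnf_decompose(schema, fds))  # [{'studio', 'city'}, {'title', 'year', 'studio'}]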
Electronic Notes in Theoretical Computer Science, 2006
The rapid growth of the World Wide Web has resulted in more data being accessed over the Internet. In turn there is an increase in the use of semistructured data, which plays a crucial role in many web applications particularly with the introduction of XML and its related technologies. This increase in use makes the design of good semistructured data structures essential. The Object Relationship Attribute model for Semistructured data (ORA-SS) is a graphical notation for designing and representing semistructured data. In this paper, we demonstrate an approach to formally validate the ORA-SS data models in order to enhance the correctness of semistructured data design. A mathematical semantics for the ORA-SS notation is defined using the Z formal language, and further validation processes are carried out to check the correctness of the semistructured data models at both the schema and instance levels.
PhD Thesis, The University of Hong Kong, 2009
The nature of software applications has evolved rapidly over the past decade as the World Wide Web has become popularized. Some web applications are required to process large datasets which do not have well-defined structures, which has been challenging conventional data engineering methods. A conventional data engineering method typically requires that a system architect have prior knowledge of what data are processed in an application and how, so as to design a good database schema that optimizes data computation and storage. However, for a web application processing large-scale semi-structured and unstructured data, schema design tasks cannot always be handled entirely by humans and need to be automated by software tools. In this thesis, I study the problems of schema computations for semi-structured XML data and unstructured RDF data. The thesis consists of two parts. In the first part, I investigate the XML data interoperability problem of web services. To address this problem, I develop a formal model for XML schemas called Schema Automaton, and derive computational techniques for schema compatibility testing and subschema extraction. In the second part, I investigate different types of databases for RDF data. For one particular database type, called property tables, I propose a new data mining technique, Attribute Clustering by Table Load, to automate the schema design for the database based on the underlying data patterns.
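The property-table idea in the second part can be pictured with the toy grouping below: collect the predicates each RDF subject uses and give each distinct predicate set its own table. This is only our illustration of the problem setting; it is not the thesis's Attribute Clustering by Table Load algorithm, and the data is invented.

    from collections import defaultdict

    # Toy RDF-like triples: (subject, predicate, object).
    triples = [
        ("s1", "name", "Ann"),  ("s1", "email", "a@x.org"),
        ("s2", "name", "Bob"),  ("s2", "email", "b@x.org"),
        ("s3", "title", "DBs"), ("s3", "year", "2009"),
    ]

    def property_tables(triples):
        """Group subjects by the exact set of predicates they use; each group
        becomes one candidate property table (column set -> rows)."""
        by_subject = defaultdict(dict)
        for s, p, o in triples:
            by_subject[s][p] = o
        tables = defaultdict(list)
        for s, props in by_subject.items():
            tables[frozenset(props)].append({"subject": s, **props})
        return dict(tables)

    for columns, rows in property_tables(triples).items():
        print(sorted(columns), rows)
    # ['email', 'name'] gets a two-row table; ['title', 'year'] gets another.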
Lecture Notes in Computer Science, 2006
There has been a rapid growth in the use of semistructured data in both web applications and database systems. Consequently, the design of a good semistructured data model is essential. In the relational database community, algorithms have been defined to transform a relational schema from one normal form to a more suitable normal form. These algorithms have been shown to preserve certain semantics during the transformation. The work presented in this paper is the first step towards representing such algorithms for semistructured data, namely formally defining the semantics necessary for achieving this goal. Formal semantics and automated reasoning tools enable us to reveal the inconsistencies in a semistructured data model and its instances. The Object Relationship Attribute model for Semistructured data (ORA-SS) is a graphical notation for designing and representing semistructured data. This paper presents a methodology of encoding the semantics of the ORA-SS notation into the Web Ontology Language (OWL) and automatically verifying the semistructured data design using the OWL reasoning tools. Our methodology provides automated consistency checking of an ORA-SS data model at both the schema and instance levels.
Lecture Notes in Computer Science, 2001
Semi-structured data has become prevalent with the growth of the Internet. The data is usually stored in a traditional database system or in a specialized repository. While many information providers have presented their databases on the web as semi-structured data, other information providers are developing repositories for new applications. One such application is e-commerce, which is emerging as a major web-supported application assisting business transactions between multiple parties via the network and involving large amounts of data. Designing a "good" semi-structured database is increasingly crucial to prevent data redundancy, inconsistency and updating anomalies. In this paper, we propose a conceptual approach to design semi-structured databases. A conceptual layer based on the popular Entity-Relationship (ER) model is employed to remove anomalies and redundancies at the semantic level. An algorithm is given to map an ER diagram involving composite attributes, weak entity types, recursive, n-ary and ISA relationship sets, and aggregations to a semi-structured schema graph (S3-Graph) used to represent semi-structured data. Our study reveals similarities between the S3-Graph and the hierarchical model and nested relations, in that all have limitations in modeling situations with non-hierarchical relationships given their tree-like structures.
International Journal of Database Management Systems, 2011
This paper proposes a tool called RDBNorma, which uses a novel approach to represent a relational database schema and its functional dependencies in memory using a single linked list, and which semi-automates the normalization of a relational database schema up to third normal form. The paper addresses the issues involved in representing a relational schema and its functional dependencies in one linked list, together with the algorithms that use this representation to convert a relation into second and third normal form. We compared the performance of RDBNorma with an existing tool called Micro, using standard relational schemas collected from various sources. The proposed tool is at least 2.89 times faster than Micro and requires around half the space to represent a relation. The comparison was performed by entering the attributes and functional dependencies that hold on a relation in the same order, with both tools implemented in the same language and run on the same machine.
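The central representational choice, keeping the schema and its functional dependencies in one linked list, can be pictured with the toy structure below; the node layout and field names are our own guesses for illustration, not RDBNorma's actual implementation.

    class Node:
        """One cell of a single linked list holding either an attribute of the
        relation or a functional dependency, distinguished by `kind`."""
        def __init__(self, kind, payload, nxt=None):
            self.kind = kind        # "attr" or "fd"
            self.payload = payload  # attribute name, or an (lhs, rhs) pair
            self.next = nxt

    def build_schema_list(attributes, fds):
        """Chain attribute nodes followed by FD nodes into one linked list."""
        head = None
        for lhs, rhs in reversed(fds):
            head = Node("fd", (tuple(lhs), tuple(rhs)), head)
        for attr in reversed(attributes):
            head = Node("attr", attr, head)
        return head

    def dump(head):
        node = head
        while node is not None:
            print(node.kind, node.payload)
            node = node.next

    # R(A, B, C) with A -> B and B -> C held in a single list.
    dump(build_schema_list(["A", "B", "C"], [(["A"], ["B"]), (["B"], ["C"])]))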
2013
It has been estimated that more than 80 percent of all computer programming is database related. Studies have shown that the vast majority of content on the web resides in deep-web sources, which store their content in back-end databases that have been growing by leaps and bounds. Due to its importance for database applications, database schema design has attracted substantial research. Database normalization is a well-developed theoretical approach to structuring a database schema, but unfortunately the theory is still not well understood by practitioners. It has been difficult to motivate students to learn database normalization because they consider the subject dry and purely theoretical. In this paper, a tool called Web Based Relational Database Design and Normalization Tool is proposed, which handles normalization for relational databases. The tool is suitable for relational data modeling in systems analysis and design and data management pr...
Citeseer
Semi-structured data is becoming increasingly important with the introduction of XML and related languages and technologies. The recent shift from DTDs (document type definitions) to XML-Schema for XML data highlights the importance of a schema definition for semi-structured data ...
2000
Recently, there have been several proposals of formalisms for modeling semistructured data, which is data that is neither raw, nor strictly typed as in conventional database systems. Semistructured data models are graph-based models, where graphs are used to represent both databases and schemas.
… Information Processing and Management, Vol 29 …, 2004
Normalization is a process of analyzing the given relation schemas based on their functional dependencies and primary keys to achieve the desirable property of minimizing redundancy. It aims at creating a set of relational tables with minimum data redundancy that preserve consistency and facilitate correct insertion, deletion, and modification. A normalized database does not exhibit insertion, deletion, or modification anomalies under future updates. This paper presents a comparison study of manual and automatic normalization techniques using sequential as well as parallel algorithms. Performing this data analysis manually is very time consuming compared with employing an automated technique; at the same time, the automated process is tested to be reliable and correct. It first produces the dependency matrix and the directed graph matrix, and then generates the 2NF, 3NF, and BCNF normal forms. All tables are also generated as the procedure proceeds.
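The dependency matrix mentioned here can be built in the obvious way: a square boolean matrix over the attributes with a true entry wherever some functional dependency takes the row attribute to the column attribute; it doubles as the adjacency matrix of the directed dependency graph. The sketch below is our rendering of that construction on a toy schema, not the paper's code.

    def dependency_matrix(attributes, fds):
        """M[i][j] is True iff some FD whose left side contains attributes[i]
        has attributes[j] on its right side."""
        index = {a: i for i, a in enumerate(attributes)}
        m = [[False] * len(attributes) for _ in attributes]
        for lhs, rhs in fds:
            for a in lhs:
                for b in rhs:
                    m[index[a]][index[b]] = True
        return m

    attributes = ["A", "B", "C"]
    fds = [({"A"}, {"B"}), ({"B"}, {"C"})]
    for attr, row in zip(attributes, dependency_matrix(attributes, fds)):
        print(attr, [int(x) for x in row])
    # A [0, 1, 0]
    # B [0, 0, 1]
    # C [0, 0, 0]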