2006, Lecture Notes in Computer Science
Semistructured data is now widely used in both web applications and database systems. Much of the research into this area defines algorithms that transform the data and schema, such as data integration, change management, view definition, and data normalization. While some researchers have defined a formalism for the work they have undertaken, there is no widely accepted formalism that can be used for the comparison of algorithms within these areas. The requirements of a formalism that would be helpful in these situations are that it must capture all the necessary semantics required to model the algorithms, it should not be too complex and it should be easy to use. This paper describes a first step in defining such a formalism. We have modelled the semantics expressed in the ORA-SS (Object Relationship Attribute data model for SemiStructured data) data modelling notation in two formal languages that have automatic verification tools. We compare the two models and present the findings.
Semistructured data is now widely used in both web applications and database systems. There are many research challenges in this area, such as data integration, change management, view definition, and data normalization. Traditionally in these areas a formalism is defined for the database model, and properties of the algorithms can be reasoned about, such as the dependency preserving property of the normalization algorithm in the relational data model. Because research into semistructured data is still in its infancy, many algorithms have been defined in this area and a number of formalisms have been proposed, but there is no widely accepted formalism for reasoning about the properties of the algorithms. Such a formalism must capture all the necessary semantics required to model the algorithms, should not be too complex, and should be easy to use. Another area that has been developing steadily is automatic verification. This involves formally speci-...
Lecture Notes in Computer Science, 2006
There has been a rapid growth in the use of semistructured data in both web applications and database systems. Consequently, the design of a good semistructured data model is essential. In the relational database community, algorithms have been defined to transform a relational schema from one normal form to a more suitable normal form. These algorithms have been shown to preserve certain semantics during the transformation. The work presented in this paper is the first step towards representing such algorithms for semistructured data, namely formally defining the semantics necessary for achieving this goal. Formal semantics and automated reasoning tools enable us to reveal the inconsistencies in a semistructured data model and its instances. The Object Relationship Attribute model for Semistructured data (ORA-SS) is a graphical notation for designing and representing semistructured data. This paper presents a methodology of encoding the semantics of the ORA-SS notation into the Web Ontology Language (OWL) and automatically verifying the semistructured data design using the OWL reasoning tools. Our methodology provides automated consistency checking of an ORA-SS data model at both the schema and instance levels.
Advances in Web-Age Information Management, 2006
Semistructured data has become prevalent in both web applications and database systems. This rapid growth in use makes the design of good semistructured data essential. Formal semantics and automated reasoning tools enable us to reveal the inconsistencies in a semistructured data model and its instances. The Object Relationship Attribute model for Semistructured data (ORA-SS) is a graphical notation for designing and representing semistructured data. This paper presents a methodology of encoding the semantics of ORA-SS in the Web Ontology Language (OWL) and automatically validating the semistructured data design using the OWL reasoning tool RACER. Our methodology provides automated consistency checking of an ORA-SS data model at both the schema and instance levels.
J. Univers. Comput. Sci., 2009
The rapid growth of the World Wide Web has resulted in a dramatic increase in semistructured data usage, creating a growing need for effective and efficient utilization of semistructured data. In order to verify the correctness of semistructured data design, precise descriptions of the schemas and transformations on the schemas must be established. One effective way to achieve this goal is through formal modeling and automated verification. This paper presents the first step towards this goal. In our approach, we have formally specified the semantics of the ORA-SS (Object-Relationship-Attribute data model for Semistructured data) data modeling language in PVS (Prototype Verification System) and provided automated verification support for both ORA-SS schemas and XML (Extensible Markup Language) data instances using the PVS theorem prover. This approach provides a solid basis for verifying algorithms that transform schemas for semistructured data.
Formal Methods in System Design, 2010
The wide adoption of semistructured data has created a growing need for effective ways to ensure the correctness of its organization. One effective way to achieve this goal is through formal specification and automated verification. This paper presents a theorem proving approach towards verifying that a particular design or organization of semistructured data is correct. We formally specify the semantics of the Object Relationship Attribute data model for Semistructured Data (ORA-SS) modeling notation and its correctness criteria for semistructured data normalization using the Prototype Verification System (PVS). The result is that effective verification on semistructured data models and their normalization can be carried out using the PVS theorem prover.
Electronic Notes in Theoretical Computer Science, 2006
The rapid growth of the World Wide Web has resulted in more data being accessed over the Internet. In turn there is an increase in the use of semistructured data, which plays a crucial role in many web applications particularly with the introduction of XML and its related technologies. This increase in use makes the design of good semistructured data structures essential. The Object Relationship Attribute model for Semistructured data (ORA-SS) is a graphical notation for designing and representing semistructured data. In this paper, we demonstrate an approach to formally validate the ORA-SS data models in order to enhance the correctness of semistructured data design. A mathematical semantics for the ORA-SS notation is defined using the Z formal language, and further validation processes are carried out to check the correctness of the semistructured data models at both the schema and instance levels.
PhD Thesis, The University of Hong Kong, 2009
The nature of software applications has evolved very quickly in the past decade, since the World Wide Web was popularized. Some web applications are required to process large datasets which do not have well-defined structures. This has challenged conventional data engineering methods. A conventional data engineering method typically requires that a system architect have prior knowledge of what data are processed in an application, and how, so as to design a good database schema that optimizes data computations and storage. However, for a web application processing large-scale semi-structured and unstructured data, schema design tasks cannot always be handled entirely by humans, and need to be automated by software tools. In this thesis, I study the problems of schema computations for semi-structured XML data and unstructured RDF data. This thesis consists of two parts. In the first part, I investigate the XML data interoperability problem of web services. To address this problem, I develop a formal model for XML schemas called Schema Automaton, and derive the computational techniques for schema compatibility testing and subschema extraction. In the second part, I investigate different types of databases for RDF data. For one particular database type called property tables, I propose a new data mining technique, namely Attribute Clustering by Table Load, to automate the schema design for the database based on the underlying data patterns.
Proceedings of the Second International Conference on Web Information Systems Engineering
Semistructured data has become prevalent with the growth of the Internet. The development of new web applications that require efficient design and maintenance of large amounts of data makes it increasingly important to design "good" semistructured databases to prevent data redundancy and updating anomalies. However, it is difficult, if not impossible, for current semistructured data models to capture the semantics traditionally needed for designing databases. In this paper, we show how an Object-Relationship-Attribute model for SemiStructured data (ORA-SS) can facilitate the design of "good" semistructured databases. This is accomplished via the normalization of ORA-SS. An XML DTD or Schema generated from a normal form ORA-SS schema diagram has no undesirable redundancy, and thus no updating anomalies for the complying semistructured databases. The general design methodology and detailed steps for converting an ORA-SS schema diagram into a normal form ORA-SS schema diagram are presented. These steps can also be used as guidelines for designing semistructured databases using the ORA-SS model.
Reproduction for academic, not-for profit purposes permitted provided this text is included.
13th IEEE International Conference on Engineering of Complex Computer Systems (iceccs 2008), 2008
The dramatic expansion of semistructured data has led to the development of database systems for manipulating the data. Despite its huge potential, there is still a lack of formality and verification support in the design of good semistructured databases. Like traditional database systems, developed semistructured database systems should contain minimal redundancies and update anomalies, in order to store and manage the data effectively. Several normalization algorithms have been proposed to satisfy these needs, by transforming the schema of the semistructured data into a better form. It is essential to ensure that the normalized schema remains semantically equivalent to its original form. In this paper, we present tool support for reasoning about the correctness of semistructured data normalization. The proposed approach uses the ORA-SS data modeling notation and defines its correctness criteria and rules in the PVS formal language. It further utilizes the PVS theorem prover to perform automated checking on the normalized schema, checking that functional dependencies are preserved, no data is lost and no spurious data is created. In summary, our approach not only investigates the characteristics of semistructured data normalization, but also provides a scalable and automated first step towards reasoning about the correctness of normalization algorithms on semistructured data.