2003, International Conference on Enterprise Information Systems
9 pages
The paper presents "Any Input XML Output" (AIXO), a general and flexible software architecture for wrappers. The architecture has been designed to present data sources as collections of XML documents. The use of XSLT as extraction language permits extensive reuse of standards, tools and knowledge. A prototype developed in Java has been effectively proven in several case studies. The tool has also been successfully integrated as a wrapper service into BioAgent, a mobile agent middleware specialized for use in the molecular biology domain.
Bioinformatics, 2001
Motivation: The eXtensible Markup Language (XML) is an emerging standard for structuring documents, notably
Future Directions and Advanced Technologies, 2009
A growing number of research and industrial organizations produce huge amounts of biological data from experimentation with biological systems. To make these heterogeneous data sources easy to use, several data integration efforts are currently being undertaken, based mainly on XML. Starting from a discussion of the main biological data types and system interactions that need to be represented, the authors discuss the main approaches proposed for modelling them through XML. They then show the current efforts in biological data integration and how an increasing amount of semantic information is required in terms of vocabulary control and ontologies. Finally, future research directions in biological data integration are discussed.
IBM Systems Journal, 2000
Although the Extensible Markup Language (XML) has gained in popularity and has resulted in the creation of powerful software for authoring, transforming, and querying XML-based business data, much information remains in non-XML form. In this paper we introduce an approach to virtualize data resources and thus enable applications to access both XML and non-XML sources. We describe the architectural components that enable virtual XML: a toolbox that includes a cursor model, an XML-view mechanism such as the view created with the Data Format Description Language (DFDL), and XML processing languages. We illustrate the applicability of virtual XML through a number of use cases in various environments. We discuss the products that we expect from vendors and the open-source community and the way enterprises can plan to take advantage of virtual XML developments. Finally, we outline future research directions that include a vision of virtual XML that covers large-scale structures such as entire file systems, databases, or even the World Wide Web.
Journal of the …, 2005
A variety of biological data is transferred and exchanged in overwhelming volumes on the World Wide Web. How to rapidly capture, utilize and integrate the information on the Internet to discover valuable biological knowledge is one of the most critical issues in bioinformatics. Many information integration systems have been proposed for integrating biological data. These systems usually rely on an intermediate software layer, called wrappers, to access connected information sources. Wrappers for Web data sources are often hand coded to accommodate the differences between Web sites. However, programming a Web wrapper requires substantial programming skill, is time-consuming, and yields code that is hard to maintain. This paper provides a solution for rapidly building software agents that can serve as Web wrappers for biological information integration. We define an XML-based language called WNDL, which provides a representation of a Web browsing session. A WNDL script describes how to locate the data, extract the data and combine the data. By executing different WNDL scripts, a user can automate virtually all types of Web browsing sessions. We also describe IEPAD, a data extractor based on pattern discovery techniques. IEPAD allows our software agents to automatically discover the extraction rules needed to extract the contents of a structurally formatted Web page. With a programming-by-example authoring tool, a user can generate a complete Web wrapper agent by browsing the target Web sites. We built a variety of biological applications to demonstrate the feasibility of our approach. The software is available at http://chunnan.iis.sinica.edu.tw/software.html or by contacting the authors.
SPIE Proceedings, 2008
The increased complexity of scientific research poses new challenges to scientific data management. Meanwhile, scientific collaboration is becoming increasingly important, and it relies on integrating and sharing data from distributed institutions. We develop SciPort, a Web-based platform for supporting scientific data management and integration, built on a central-server-based distributed architecture, where researchers can easily collect, publish, and share their complex scientific data across multiple institutions. SciPort provides a general XML-based approach to modeling complex scientific data by representing them as XML documents. The documents capture not only hierarchically structured data, but also images and raw data through references. In addition, SciPort provides an XML-based hierarchical organization of the overall data space to make quick browsing convenient. For generality, schemas and hierarchies are customizable with XML-based definitions, so the system can be adapted quickly to different applications. While each institution can manage documents on a Local SciPort Server independently, selected documents can be published to a Central Server to form a global view of shared data across all sites. By storing documents in a native XML database, SciPort provides high schema extensibility and supports comprehensive queries through XQuery. By providing a unified and effective means for data modeling, data access and customization with XML, SciPort provides a flexible and powerful platform for sharing scientific data among scientific research communities, and has been successfully used in both biomedical research and clinical trials.
World Wide Web Conference Series, 2005
The increased importance of XML as a data representation format has led to several proposals for facilitating the development of applications that operate on XML data. These proposals range from runtime API-based interfaces to XML-based programming languages. The subject of this paper is XJ, a research language that proposes novel mechanisms for the integration of XML as a first-class construct
IEEE Internet Computing, 2002
This XML-based distributed system manages scientific metadata in various formats and supports sophisticated search and interactive data-access capabilities. Satellites and other Earth-observing systems produce huge amounts of data at ever-expanding rates. The EOS satellite Terra alone adds more than half a terabyte of data each day [1], and other Earth-observing platforms and computer weather and climate models produce even more. To use this data in their research effectively, scientists need distributed user-centric information systems with effective search, analysis, and ordering capabilities. A data access and analysis system must let scientific data users find, evaluate, access, and use data online regardless of its location or format. A data-delivery mechanism as simple as FTP is useful for data exchange, but its limitations are obvious. The Distributed Oceanographic Data System (DODS, www.unidata.ucar.edu/packages/dods), which enables distributed access to online digital data, is a more sophisticated data-delivery system. When one integrates DODS with the Grid Analysis and Display System (Grads) [2], the … The Extensible Markup Language is ideal for describing ASCII-based data because both human users and computers can understand XML-encoded data. Most Earth science metadata are in ASCII format, and can therefore easily be migrated to XML. Currently, most work on XML-based metadata focuses on defining XML structure (tags and relations) for specific scientific disciplines (see the sidebar, "Related Research in Data Integration"). Our XML-based software solution, on the other hand, supports a wide variety of metadata. Dimes, with its metadata model, XML query engine, and Web-based prototype interface, exemplifies this approach.
Proceedings of the twelfth international conference on World Wide Web - WWW '03, 2003
Although originally designed for large-scale electronic publishing, XML plays an increasingly important role in the exchange of data on the Web. In fact, it is expected that XML will become the lingua franca of the Web, eventually replacing HTML. Not surprisingly, there has been a great deal of interest in XML both in industry and in academia. Nevertheless, to date no comprehensive study of the XML Web (i.e., the subset of the Web made of XML documents only) or of its contents has been made. This paper is the first attempt at describing the XML Web and the documents contained in it. Our results are drawn from a sample of a repository of the publicly available XML documents on the Web, consisting of about 200,000 documents. Our results show that, despite its short history, XML already permeates the Web, both in terms of generic domains and geographically. Also, our results about the contents of the XML Web provide valuable input for the design of algorithms, tools and systems that use XML in one form or another.
Data & Knowledge Engineering, 2005
Data integration of geographically dispersed, heterogeneous, complex biological databases is a key research area. One of the key features of a successful data integration system is a simple, self-describing data exchange format. However, many biological databases provide data in flat files, which are poor data exchange formats. Fortunately, XML can be viewed as a powerful data model and a better data exchange format. In this paper, we present the Bio2X system, which transforms flat file data into highly hierarchical XML data using a rule-based machine learning technique. Bio2X has been fully implemented in Java. Our experiments on transforming real-world biological data demonstrate the effectiveness of the Bio2X approach.