1998, Fodo
…
30 pages
Sets and bags are closely related structures. A bag is different from a set in that it is sensitive to the number of times an element occurs, while a set is not. In this paper, we introduce the concept of a web bag in a web warehouse as a part of our Web Information Coupling System (WICS). Informally, a web bag is a web table which allows multiple occurrences of identical web tuples. A web bag helps to discover useful knowledge from a web table, such as visible documents (or web sites), luminous documents and luminous paths. We formally discuss the semantics and properties of web bags, and illustrate, with examples, applications of web bags to knowledge discovery in a web warehouse.
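The set/bag distinction described above can be illustrated with a minimal sketch. It assumes, purely for illustration, that a web tuple can be modeled as a Python tuple of URLs; the URLs and the `multiplicity` helper are hypothetical and are not taken from the paper, and this is not the authors' formal definition of a web bag.

```python
from collections import Counter

# Hypothetical web tuples: each one is modeled as a tuple of URLs.
# (Assumption for illustration only; the paper defines web tuples formally.)
web_tuples = [
    ("a.example/index", "b.example/page"),
    ("a.example/index", "b.example/page"),  # identical tuple, occurs again
    ("a.example/index", "c.example/page"),
]

# A set ignores how many times an element occurs.
as_set = set(web_tuples)

# A bag (multiset) records the number of occurrences of each element.
as_bag = Counter(web_tuples)


def multiplicity(bag, web_tuple):
    """Return how many times web_tuple occurs in the bag (0 if absent)."""
    return bag[web_tuple]
```

Here the set holds two distinct tuples, while the bag still accounts for all three occurrences, which is the information a set-based web table would discard.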
The Computer Journal, 2003
We believe that, to manage Web data effectively, there is a need to build a data warehouse of Web data, i.e. a Web warehouse. In this paper, we focus on how to represent and store relevant hyperlinked Web documents effectively in a Web warehouse called WHOWEDA (WareHouse Of WEb DAta) for further querying and manipulation. We present a simple and general model for representing metadata, structure and content of Web documents and hyperlinks in WHOWEDA. We discuss node and link objects, which are used to represent Web documents and hyperlinks, respectively, in WHOWEDA. These objects are first-class objects in our data model called WHOM (WareHouse Object Model), which is designed to represent and manipulate Web data in the warehouse. An important feature of our model is that it represents metadata, content and structure as trees called node and link metadata trees, and node and link data trees.
Information and Software Technology, 2002
To populate a data warehouse specifically designed for Web data, i.e. a web warehouse, it is imperative to harness relevant documents from the Web. In this paper, we describe a query mechanism called coupling query to glean relevant Web data in the context of our web warehousing system called Warehouse Of Web Data (WHOWEDA). A coupling query may be used for querying both HTML and XML documents. Some of the important features of our query mechanism are the ability to query metadata, content, and internal and external (hyperlink) structure of Web documents based on partial knowledge; the ability to express constraints on tag attributes and tagless segments of data; the ability to express conjunctive as well as disjunctive query conditions compactly; the ability to control execution of a web query; and preservation of the topological structure of hyperlinked documents in the query results. We also discuss how to formulate a query graphically and in textual form using a coupling graph and coupling text, respectively.
Information Fusion, 2008
In this study, we introduce a web information fusion tool, the web warehouse, which is suitable for web mining and knowledge discovery. To formulate a web warehouse, a four-layer web warehouse architecture for decision support is first proposed. Based on this layered architecture, an extraction-fusion-mapping-loading (EFML) process model for web warehouse construction is then constructed. In the web warehouse process model, a series of web services, including a wrapper service, mediation service, ontology service and mapping service, are used. In particular, two kinds of mediators are introduced to fuse the heterogeneous web information. Finally, a simple case study is presented to illustrate the construction process of the web warehouse.
1999
Sets and bags are closely related structures and have been studied in relational databases. A bag is different from a set in that it is sensitive to the number of times an element occurs, while a set is not. In this paper, we introduce the concept of a Web bag in the context of a World Wide Web warehouse called WHOWEDA (WareHouse Of WEb DAta) which we are currently building. Informally, a Web bag is a Web table which allows multiple occurrences of identical Web types.
Proceedings of the 13th Portuguese Conference on Artificial Intelligence (EPIA 2007), 2007
The analysis, design and maintenance of Web sites involve two significant challenges: first, managing the services and content available, and second, making the site dynamically adequate to users' needs. The Site-O-Matic project (SOM) aims to develop a comprehensive framework for automating several of the management activities of a Web site. Such a framework must include a suitable database infrastructure, where all the information about the activity of the site is stored. In this paper we propose a data ...
Data & Knowledge Engineering, 2003
We describe how to formulate a coupling query to glean relevant Web data in the context of our web warehousing system called Whoweda (Warehouse Of Web Data). A coupling query may be used for querying both HTML and XML documents. One of the important features of our query mechanism is the ability to express conjunctive as well as disjunctive query conditions compactly. We describe how to formulate a coupling query in text form as well as pictorially, using the coupling text and the coupling graph respectively. We explore the limitations of the coupling graph with respect to the coupling text. We found that AND, OR and AND/OR coupling graphs are less expressive than their textual counterparts. To address this shortcoming, we introduce the notion of a hybrid graph, which is a special type of p-connected coupling graph. Finally, we discuss the implementation of a GUI-based system called VISCOUS (VISual COupling QUery System) for formulating such queries.
1999
With the enormous amount of data stored in the World Wide Web, it is increasingly important to design and develop powerful web warehousing tools. The key objective of our web warehousing project, called WHOWEDA (Warehouse of Web Data), is to design and implement a web warehouse that materializes and manages useful information from the web.
Business Intelligence Applications and the Web: Models, Systems and Technologies, 2012
Research in data warehousing and OLAP has produced important technologies for the design, management, and use of Information Systems for decision support. With the development of the Internet, the availability of various types of data has increased. Thus, users require applications to help them obtain knowledge from the Web. One possible solution to facilitate this task is to extract information from the Web, then transform and load it into a Web Warehouse, which provides uniform access methods for automatic processing of the data. ...
1999
In this paper, we discuss various algebraic operations on web bags in the context of our web warehousing project called Whoweda (WareHouse Of WEb DAta). Informally, a web bag is a web table which allows multiple occurrences of identical web tuples. We examine how some of the web operations, such as web union, web intersection, web join and local web coupling, behave in the context of web bags. Our study reveals that the presence of identical web tuples accentuates the computational efficiency of some of the web algebraic operators.
Computing Research Repository, 2007
In a data warehousing process, the data preparation phase is crucial. Mastering this phase allows substantial gains in terms of time and performance when performing a multidimensional analysis or using data mining algorithms. Furthermore, a data warehouse can require external data. The web is a prevalent data source in this context, but the data broadcast on this medium are very heterogeneous. We propose in this paper a UML conceptual model for a complex object representing a superclass of any useful data source (databases, plain texts, HTML and XML documents, images, sounds, video clips...). The translation into a logical model is achieved with XML, which helps integrate all these diverse, heterogeneous data into a unified format, and whose schema definition provides first-rate metadata in our data warehousing context. Moreover, we benefit from XML's flexibility and extensibility and from the richness of the semistructured data model, while remaining able to map XML documents into a database later if more structuring is needed.
IEEE Transactions on Knowledge and Data Engineering, 2008
2000
16th International Workshop on Database and Expert Systems Applications (DEXA'05), 2005
International Conference on Management of Data, 2009