Papers by David Konopnicki
In this work we present the details of a large-scale user profiling framework that we developed at IBM on top of Apache Hadoop. We address the problem of extracting and maintaining a very large number of user profiles from large-scale data. We first describe an efficient user profiling framework with guarantees of high profiling quality. We then describe a scalable implementation of the proposed framework in Apache Hadoop and discuss its challenges.
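The abstract does not describe the framework's internals. As a rough illustration only, the sketch below simulates in plain Python the map/shuffle/reduce pattern that Hadoop jobs of this kind typically follow. Events are keyed by user in the map step and grouped by key, and then each user's events are reduced into a normalized interest profile. The event format `(user_id, category)` and the functions `map_phase`, `shuffle`, `reduce_phase` and `build_profiles` are hypothetical. They are not taken from the IBM framework.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical input record: (user_id, category) for one user interaction.
Event = Tuple[str, str]
Profile = Dict[str, float]


def map_phase(events: Iterable[Event]) -> Iterator[Tuple[str, str]]:
    """Mapper: emit one (user_id, category) pair per event, keyed by user."""
    for user_id, category in events:
        yield user_id, category


def shuffle(pairs: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Group mapper output by key (Hadoop does this between map and reduce)."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(user_id: str, categories: List[str]) -> Tuple[str, Profile]:
    """Reducer: turn one user's categories into normalized interest weights."""
    counts: Dict[str, int] = defaultdict(int)
    for category in categories:
        counts[category] += 1
    total = len(categories)
    return user_id, {c: n / total for c, n in counts.items()}


def build_profiles(events: Iterable[Event]) -> Dict[str, Profile]:
    """Run the simulated map -> shuffle -> reduce pipeline over all events."""
    grouped = shuffle(map_phase(events))
    return dict(reduce_phase(user, cats) for user, cats in grouped.items())
```

For example, `build_profiles([("u1", "sports"), ("u1", "news"), ("u1", "sports"), ("u2", "music")])` gives `u1` a weight of about 0.67 for sports and 0.33 for news. In a real Hadoop job the framework performs the shuffle, so only the mapper and reducer logic would be user code.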

Information gathering in the World-Wide Web: the W3QL query language and the W3QS system
ACM Transactions on Database Systems, 1998
The World Wide Web (WWW) is a fast-growing global information resource. It contains an enormous amount of information and provides access to a variety of services. Since there is no central control and there are very few standards for organizing information or offering services, searching for information and services is a widely recognized problem. To some degree this problem is solved by “search services,” also known as “indexers,” such as Lycos, AltaVista, Yahoo, and others. These sites employ search engines known as “robots” or “knowbots” that scan the network periodically and form text-based indices. These services are limited in several important respects. First, structural information, namely the organization of a document into parts that point to each other, is usually lost. Second, one is limited to the kind of textual analysis provided by the search service. Third, search services cannot navigate “through” forms. Finally, one cannot specify a complex database-like search. We view the WWW as a huge database. We have designed a high-level SQL-like language called W3QL to support effective and flexible query processing, which addresses the structure and content of WWW nodes and their varied types of data. We have implemented a system called W3QS to execute W3QL queries. In W3QS, query results are declaratively specified and, when desired, continuously maintained as views. The current architecture of W3QS provides a server that enables users to pose queries as well as integrate their own data analysis tools. The system and its query language set a framework for the development of database-like tools over the WWW. A significant contribution of this article is the formalization of the WWW and of query processing over it.
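To make structure-aware querying concrete, here is a minimal sketch. It assumes a small in-memory snapshot of pages rather than live HTTP access, and it is neither W3QL syntax nor the W3QS evaluator. It only illustrates a query that combines a content condition (a keyword) with a structural one (a bounded link path from a start page). The query returns the link paths themselves, which a flat text index would discard. The `Page` type, `EXAMPLE_WEB` and `path_query` are hypothetical.

```python
from collections import deque
from typing import Dict, List, NamedTuple, Tuple


class Page(NamedTuple):
    text: str
    links: Tuple[str, ...]  # URLs this page links to


# Hypothetical snapshot of a tiny web, keyed by URL.
EXAMPLE_WEB: Dict[str, Page] = {
    "a": Page("home page", ("b", "c")),
    "b": Page("database research", ("d",)),
    "c": Page("contact", ("d",)),
    "d": Page("query languages for databases", ()),
}


def path_query(web: Dict[str, Page], start: str, keyword: str, max_hops: int) -> List[List[str]]:
    """Return every cycle-free link path from `start`, at most `max_hops` links
    long, that ends at a page whose text contains `keyword` (case-insensitive)."""
    results: List[List[str]] = []
    queue = deque([[start]])  # breadth-first over partial paths
    while queue:
        path = queue.popleft()
        page = web.get(path[-1])
        if page is None:
            continue  # dangling link: target not in the snapshot
        if keyword.lower() in page.text.lower():
            results.append(path)  # keep the path, not just the page
        if len(path) - 1 < max_hops:
            for nxt in page.links:
                if nxt not in path:  # avoid revisiting pages on this path
                    queue.append(path + [nxt])
    return results
```

With `path_query(EXAMPLE_WEB, "a", "database", 2)` the result is `[["a", "b"], ["a", "b", "d"], ["a", "c", "d"]]`. Page `d` appears twice because two distinct paths reach it. The number of simple paths can grow exponentially with the hop bound, so the bound matters.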
Toward Automated Electronic Commerce
David Konopnicki, Lior Leiba, ...
WebSuite: A Tool Suite for Harnessing Web Data
We present a system for searching, collecting, and integrating Web-resident data. The system consists of five tools, where each tool provides a specific functionality aimed at solving one aspect of the complex task of using and managing Web data. Each tool can be used in a stand-alone mode, in combination with the other tools, or even in conjunction with other systems. Together, the tools offer a wide range of capabilities that overcome many of the limitations in existing systems for harnessing Web data. The paper describes each tool, possible ways of combining the tools, and the architecture of the combined system.
Draft of W3QS: a query system for the World-Wide Web
W3QS - A System for WWW Querying

Bringing Database Functionality to the WWW
Database Management Systems excel at managing large quantities of data, primarily enterprise data. The WWW is a huge, heterogeneous, distributed database. To support advanced, robust and reliable applications, such as efficient and powerful querying, groupware and electronic commerce, database functionalities need to be added to the WWW. A major difficulty is that database techniques were traditionally targeted at a single enterprise environment, which provides centralized control over data and metadata, statistics for query processing, and the ability to use monolithic mechanisms for concurrency control, replication and recovery. Previously, we defined and implemented a query language (W3QL) and a query system for the WWW (W3QS). We dealt with some of the typical problems posed by data management on the WWW: the diversity of data types, the active components (online forms) and the difficulty of defining an adequate data model. In this work we introduce new mechanisms and concepts for adding database functionalities to the WWW: a useful abstract model and a blueprint of a query language for the WWW, new research directions concerning WWW query processing, and the concept of “data stability”.
A Comprehensive Framework for Querying and Integrating WWW Data and Services
WWW information gathering: The W3QL query language and the W3QS system
ACM Transactions on Database Systems, 1998
WWW Data and Services: Querying, Integration and Automation
W3QS: A Query System for the World-Wide Web
The World-Wide Web (WWW) is an ever-growing, distributed, non-administered, global information resource. It resides on the world-wide computer network and allows access to heterogeneous information: text, image, video, sound and graphic data. Currently, this wealth of ...
A Formal Yet Practical Approach to Electronic Commerce
International Journal of Cooperative Information Systems, 2002
David Konopnicki, Lior Leiba, Oded Shmueli. Computer Science Dept., Technion Israel Institute of Technology, Haifa, Israel.

WWW Exploration Queries
The World-Wide Web presents new challenges to database researchers, especially in the area of query processing. Currently, querying the World-Wide Web is done by using online indices. These sites employ search engines, known as “robots”, that scan the network periodically and form text-based indices. A severe limitation of these search services is that structural information, namely the organization of documents into parts pointing to each other, is lost. Several tasks, ranging from data mining to Intranet management, require the analysis of the hypertext's structural organization. In this paper, we propose a simple graph-based query language in which both the query and its target are graphs. We present and evaluate the efficiency of a general class of algorithms for answering graph queries. The definition of these algorithms takes into account two important facts about the WWW: (1) efficient algorithms must minimize the communication needed to answer a query, and (2) query evaluation involves a process of data graph exploration.
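The abstract states the two constraints but not the algorithms. The following simplified sketch is not the paper's method. It matches a tree-shaped graph query by exploring outward from a start page. A caching fetcher ensures that each page is requested at most once, so the request count approximates the communication cost. The simulated web (`SIM_WEB`), `sim_fetch`, `CachingFetcher` and `match_tree_query` are hypothetical. The matching is homomorphic, so two query nodes may bind to the same page. Query nodes must be listed so that each node's parent comes before it.

```python
from typing import Callable, Dict, List, Tuple

PageData = Tuple[str, List[str]]  # (page text, outgoing links)

# Hypothetical simulated web standing in for real network access.
SIM_WEB: Dict[str, PageData] = {
    "root": ("home page", ["pubs", "teaching"]),
    "pubs": ("publications list", ["w3qs", "w3ql"]),
    "teaching": ("courses", ["w3qs"]),
    "w3qs": ("W3QS system paper", []),
    "w3ql": ("W3QL language paper", []),
}


def sim_fetch(url: str) -> PageData:
    """Simulated fetch; unknown URLs behave like empty pages."""
    return SIM_WEB.get(url, ("", []))


class CachingFetcher:
    """Wraps a fetch function so that each URL is requested at most once."""

    def __init__(self, fetch: Callable[[str], PageData]) -> None:
        self._fetch = fetch
        self._cache: Dict[str, PageData] = {}
        self.requests = 0  # number of simulated network round trips

    def get(self, url: str) -> PageData:
        if url not in self._cache:
            self.requests += 1
            self._cache[url] = self._fetch(url)
        return self._cache[url]


def match_tree_query(
    fetcher: CachingFetcher,
    start: str,
    preds: List[Callable[[str], bool]],
    parent: List[int],
) -> List[Dict[int, str]]:
    """Find all bindings of query nodes to pages, exploring outward from `start`.

    Query node 0 is bound to `start`. Each query node i > 0 must be bound to a
    page linked from the page bound to parent[i] (with parent[i] < i), and that
    page's text must satisfy preds[i].
    """
    results: List[Dict[int, str]] = []

    def extend(binding: Dict[int, str], i: int) -> None:
        if i == len(preds):
            results.append(dict(binding))  # every query node is bound
            return
        _, links = fetcher.get(binding[parent[i]])
        for url in links:
            text, _ = fetcher.get(url)  # fetched only when exploration reaches it
            if preds[i](text):
                binding[i] = url
                extend(binding, i + 1)
                del binding[i]

    root_text, _ = fetcher.get(start)
    if preds[0](root_text):
        extend({0: start}, 1)
    return results
```

The example query is any start page, then a linked page mentioning "publications", then a page linked from it mentioning "paper". Run from `"root"` with `parent=[-1, 0, 1]`, it yields two matches after 5 simulated requests. Caching is only the simplest way to reduce communication. Choosing which links to explore first is a separate problem that this sketch does not address.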
"W3QL: A Query Language for the WWW", published in 1995, presented a language with several distinctive features. Employing existing indexes as access paths, it allowed the selection of documents using conditions on semi-structured documents, as well as the maintenance of dynamic views of navigational queries. W3QL was capable of automatically filling out forms and navigating through them. Finally, in the SQL tradition, it was a declarative query language that could be the subject of optimization.
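This page does not describe how W3QS actually handles forms. As a hedged illustration of what filling out a GET form can involve, the sketch below enumerates every combination of candidate input values and builds the corresponding request URLs with the Python standard library. It does not parse HTML or send requests. The function name `fill_get_form` and the example URLs are hypothetical.

```python
from itertools import product
from typing import Dict, List
from urllib.parse import urlencode


def fill_get_form(action_url: str, field_values: Dict[str, List[str]]) -> List[str]:
    """Build one GET request URL per combination of candidate field values.

    `action_url` is the form's action attribute; `field_values` maps each
    input name to the values to try. No HTML is parsed and nothing is sent.
    """
    names = sorted(field_values)  # fixed field order for reproducible URLs
    urls: List[str] = []
    for combo in product(*(field_values[name] for name in names)):
        query = urlencode(dict(zip(names, combo)))  # form-encodes values (spaces become '+')
        urls.append(f"{action_url}?{query}")
    return urls
```

For instance, `fill_get_form("http://example.com/search", {"q": ["w3ql", "w3qs"], "year": ["1998"]})` yields `http://example.com/search?q=w3ql&year=1998` and `http://example.com/search?q=w3qs&year=1998`. The number of URLs is the product of the candidate-list sizes, so candidate lists should be kept small.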