Papers by Michael E Flaster
Hidden text detection for search result scoring
Methods and apparatus for contextual schema mapping of source documents to target documents
Method for maintaining consistency and performing recovery in a replicated data storage system
Query generation using structural similarity between documents
Methods and apparatus for mapping source schemas to a target schema using schema embedding
Methods and Apparatus for User-Guided Inference of Regular Expressions for Information Extraction
Patch panel cover mounted antenna grid for use in the automatic determination of network cable connections using RFID tags
Evaluating website properties by partitioning user feedback
Method and apparatus for enabling authorized and billable message transmission between multiple communications environments
Efficient document clustering
Identifying transient portions of web pages
Identifying transient paths within websites
Proceedings of the …, 2003
Equivalence class-based method and apparatus for cost-based repair of database constraint violations

In this paper we present StarFish, a highly-available geographically-dispersed block storage system built from commodity servers running FreeBSD, which are connected by standard high-speed IP networking gear. StarFish achieves high availability by transparently replicating data over multiple storage sites. StarFish is accessed via a host-site appliance that masquerades as a host-attached storage device, hence it requires no special hardware or software in the host computer. We show that a StarFish system with 3 replicas and a write quorum size of 2 is a good choice, based on a formal analysis of data availability and reliability: 3 replicas with individual availability of 99%, a write quorum of 2, and read-only consistency gives better than 99.9999% data availability. Although StarFish increases the per-request latency relative to a direct-attached RAID, we show how to design a highly-available StarFish configuration that provides most of the performance of a direct-attached RAID on...
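The availability figure in the abstract follows from simple probability: with read-only consistency, data stays readable as long as any one of the three 99%-available replicas is up. A short sketch checking that arithmetic (the write-quorum figure is our own calculation under the same independence assumption, not a number quoted in the abstract):

```python
# Check the availability arithmetic for a 3-replica, write-quorum-2
# configuration, assuming replicas fail independently.
p_up = 0.99                     # availability of a single replica
p_down = 1 - p_up

# Reads need at least one replica up (read-only consistency).
read_avail = 1 - p_down ** 3

# Writes need at least 2 of the 3 replicas up.
write_avail = p_up ** 3 + 3 * p_up ** 2 * p_down

print(f"read availability:  {read_avail:.6f}")   # 0.999999, i.e. > 99.9999%
print(f"write availability: {write_avail:.6f}")
```

The read-side result matches the abstract's "better than 99.9999% data availability" claim exactly: 1 − (0.01)³ = 0.999999.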

A fundamental concern of information integration in an XML context is the ability to embed one or more source documents in a target document so that (a) the target document conforms to a target schema and (b) the information in the source document(s) is preserved. In this paper, information preservation for XML is formally studied, and the results of this study guide the definition of a novel notion of schema embedding between two XML DTD schemas represented as graphs. Schema embedding generalizes the conventional notion of graph similarity by allowing an edge in a source DTD schema to be mapped to a path in the target DTD. Instance-level embeddings can be defined from the schema embedding in a straightforward manner, such that conformance to a target schema and information preservation are guaranteed. We show that it is NP-complete to find an embedding between two DTD schemas. We also provide efficient heuristic algorithms to find candidate embeddings, along with experimental results to evaluate and compare the algorithms. These yield the first systematic and effective approach to finding information preserving XML mappings.
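The key relaxation in the abstract, mapping a source edge to a target *path* rather than a single target edge, can be illustrated with a toy reachability check (the schemas and node names below are hypothetical, and this is only the verification step, not the paper's NP-complete search for an embedding):

```python
from collections import deque

def is_embedding(src_edges, tgt_edges, node_map):
    """Check that every source edge (u, v) maps to some path from
    node_map[u] to node_map[v] in the target graph -- the relaxation
    of ordinary graph similarity described in the abstract."""
    adj = {}
    for a, b in tgt_edges:
        adj.setdefault(a, []).append(b)

    def reachable(start, goal):
        seen, queue = {start}, deque([start])
        while queue:
            node = queue.popleft()
            if node == goal:
                return True
            for nxt in adj.get(node, []):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        return False

    return all(reachable(node_map[u], node_map[v]) for u, v in src_edges)

# Hypothetical DTDs: the source edge book -> title is embedded via the
# longer target path book -> info -> title.
src = [("book", "title")]
tgt = [("book", "info"), ("info", "title")]
print(is_embedding(src, tgt, {"book": "book", "title": "title"}))  # True
```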

Coded Replication: A Space-Efficient Technique for Increasing File Availability
Distributed file systems offer a potential increase in file availability through replication of data. Many previous solutions have had large space requirements to achieve high availability. In this paper, we propose a method of replication that is extremely space efficient and yet provides significantly better availability than Dynamic Voting (the best of the previous methods) for reasonably reliable systems. The method employs Reed-Solomon encoding techniques, permitting each node to hold a small amount of the file, and yet allow reconstruction of the entire file given only a subset of the nodes. This increases availability at the cost of increased processing time, instead of increased disk space. The technique is shown to be flexible both in system resource demands and in the availability provided.
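The space/availability tradeoff the abstract describes can be quantified: an (n, k) erasure code stores n fragments, each 1/k of the file, and any k fragments reconstruct it. A small sketch comparing full replication against a coded scheme (the node-availability figure and the (6, 4) parameters are illustrative choices, not numbers from the paper):

```python
from math import comb

def k_of_n_availability(n, k, p):
    """Probability that at least k of n independent nodes are up,
    each up with probability p (binomial tail sum)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

p = 0.95  # illustrative per-node availability

# 3 full copies: any 1 node suffices, but storage overhead is 3x.
full_rep = k_of_n_availability(3, 1, p)

# 6 coded fragments, any 4 reconstruct: storage overhead only 1.5x.
coded = k_of_n_availability(6, 4, p)

print(f"3x replication:    {full_rep:.6f} available")
print(f"(6,4) erasure code: {coded:.6f} available")
```

With these numbers the coded scheme gives comparable availability at half the storage cost, which is the essence of the tradeoff; Reed-Solomon codes are one concrete way to realize an (n, k) scheme.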

Proceedings of the 2005 ACM SIGMOD international conference on Management of data - SIGMOD '05, 2005
Data integrated from multiple sources may contain inconsistencies that violate integrity constraints. The constraint repair problem attempts to find "low cost" changes that, when applied, will cause the constraints to be satisfied. While in most previous work repair cost is stated in terms of tuple insertions and deletions, we follow recent work to define a database repair as a set of value modifications. In this context, we introduce a novel cost framework that allows for the application of techniques from record-linkage to the search for good repairs. We prove that finding minimal-cost repairs in this model is NP-complete in the size of the database, and introduce an approach to heuristic repair-construction based on equivalence classes of attribute values. Following this approach, we define two greedy algorithms. While these simple algorithms take time cubic in the size of the database, we develop optimizations inspired by algorithms for duplicate-record detection that greatly improve scalability. We evaluate our framework and algorithms on synthetic and real data, and show that our proposed optimizations greatly improve performance at little or no cost in repair quality.
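The idea of repairing by value modification over equivalence classes can be illustrated with a toy functional dependency zip → city: values sharing a zip form one class, and the repair picks a single representative that minimizes the number of changes. This is only a conceptual sketch with made-up data, not the paper's cost framework or greedy algorithms:

```python
from collections import Counter, defaultdict

# Toy data violating the functional dependency zip -> city.
rows = [
    {"zip": "07974", "city": "Murray Hill"},
    {"zip": "07974", "city": "Murray Hill"},
    {"zip": "07974", "city": "Muray Hill"},   # typo: violates the FD
    {"zip": "10001", "city": "New York"},
]

# Group city values into equivalence classes keyed by zip.
classes = defaultdict(list)
for r in rows:
    classes[r["zip"]].append(r["city"])

# Repair each class to its most frequent value (fewest modifications).
for zip_code, cities in classes.items():
    target, _ = Counter(cities).most_common(1)[0]
    for r in rows:
        if r["zip"] == zip_code:
            r["city"] = target

print(rows)  # all "07974" rows now agree on "Murray Hill"
```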
Attribute-level schema matching has proven to be an important first step in developing mappings for data exchange, integration, restructuring and schema evolution. In this paper we investigate contextual schema matching, in which selection conditions are associated with matches by the schema matching process in order to improve overall match quality. We define a general space of matching techniques, and
Exploratory Analysis System for Semi-structured Engineering Logs
Lecture Notes in Computer Science, 2006
Engineering diagnosis often involves analyzing complex records of system states printed to large, textual log files. Typically the logs are designed to accommodate the widest debugging needs without rigorous plans on formatting. As a result, critical quantities and flags are mixed with less important messages in a loose structure. Once the system is sealed, the log format is not changeable,