Academia.edu no longer supports Internet Explorer.
To browse Academia.edu and the wider internet faster and more securely, please take a few seconds to upgrade your browser.
1998
…
12 pages
1 file
In a recent paper, we proposed adding a STOP AFTER clause to SQL to permit the cardinality of a query result to be explicitly limited by query writers and query tools. We demonstrated the usefulness of having this clause, showed how to extend a traditional cost-based query optimizer to accommodate it, and demonstrated via DB2-based simulations that large performance gains are possible when STOP AFTER queries are explicitly supported by the database engine. In this paper, we present several new strategies for efficiently processing STOP AFTER queries. These strategies, based largely on the use of range partitioning techniques, offer significant additional savings for handling STOP AFTER queries that yield sizeable result sets. We describe classes of queries where such savings would indeed arise and present experimental measurements that show the benefits and tradeoffs associated with the new processing strategies.
Sigmod Record, 1997
In this paper, we study a simple SQL extension that enables query writers to explicitly limit the cardinality of a query result. We examine its impact on the query optimization and run-time execution components of a relational DBMS, presenting two approaches-a Conservative approach and an Aggressive approach-to exploiting cardinality limits in relational query plans. Results obtained from an empirical study conducted using DB2 demonstrate the benefits of the SQL extension and illustrate the tradeoffs between our two approaches to implementing it. © tuples in descending (ascending) order. If the sort directive is none, the Stop operator simply returns the first © tuples from its input stream; the none option is chosen by the optimizer when the Stop operator's input stream is known to already be appropriately sorted. The third parameter to Stop is a Sort Expression. If the sort directive is desc or asc, the Stop operator sorts its input according to this sort expression, which is usually identical to the ordering expression from the ORDER BY clause of the query.
Proceedings of the 2004 ACM SIGMOD international conference on Management of data - SIGMOD '04, 2004
Virtually every commercial query optimizer chooses the best plan for a query using a cost model that relies heavily on accurate cardinality estimation. Cardinality estimation errors can occur due to the use of inaccurate statistics, invalid assumptions about attribute independence, parameter markers, and so on. Cardinality estimation errors may cause the optimizer to choose a sub-optimal plan. We present an approach to query processing that is extremely robust because it is able to detect and recover from cardinality estimation errors. We call this approach "progressive query optimization" (POP). POP validates cardinality estimates against actual values as measured during query execution. If there is significant disagreement between estimated and actual values, execution might be stopped and re-optimization might occur. Oscillation between optimization and execution steps can occur any number of times. A re-optimization step can exploit both the actual cardinality and partial results, computed during a previous execution step. Checkpoint operators (CHECK) validate the optimizer's cardinality estimates against actual cardinalities. Each CHECK has a condition that indicates the cardinality bounds within which a plan is valid. We compute this validity range through a novel sensitivity analysis of query plan operators. If the CHECK condition is violated, CHECK triggers re-optimization. POP has been prototyped in a leading commercial DBMS. An experimental evaluation of POP using TPC-H queries illustrates the robustness POP adds to query processing, while incurring only negligible overhead. A case-study applying POP to a real-world database and workload shows the potential of POP, accelerating complex OLAP queries by almost two orders of magnitude.
International Journal of Interactive Multimedia and Artificial Intelligence, 2019
Nowadays, current information systems are so large and maintain huge amount of data. At every time, they process millions of documents and millions of queries. In order to choose the most important responses from this amount of data, it is well to apply what is so called early termination algorithms. These ones attempt to extract the Top-K documents according to a specified increasing monotone function. The principal idea behind is to reach and score the most significant less number of documents. So, they avoid fully processing the whole documents. WAND algorithm is at the state of the art in this area. Despite it is efficient, it is missing effectiveness and precision. In this paper, we propose two contributions, the principal proposal is a new early termination algorithm based on WAND approach, we call it MWAND (Modified WAND). This one is faster and more precise than the first. It has the ability to avoid unnecessary WAND steps. In this work, we integrate a tree structure as an index into WAND and we add new levels in query processing. In the second contribution, we define new fine metrics to ameliorate the evaluation of the retrieved information. The experimental results on real datasets show that MWAND is more efficient than the WAND approach.
Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval - SIGIR '06, 2006
Exhaustive evaluation of ranked queries can be expensive, particularly when only a small subset of the overall ranking is required, or when queries contain common terms. This concern gives rise to techniques for dynamic query pruning, that is, methods for eliminating redundant parts of the usual exhaustive evaluation, yet still generating a demonstrably "good enough" set of answers to the query. In this work we propose new pruning methods that make use of impact-sorted indexes. Compared to exhaustive evaluation, the new methods reduce the amount of computation performed, reduce the amount of memory required for accumulators, reduce the amount of data transferred from disk, and at the same time allow performance guarantees in terms of precision and mean average precision. These strong claims are backed by experiments using the TREC Terabyte collection and queries.
2008 IEEE 24th International Conference on Data Engineering, 2008
Query performance in current systems depends significantly on tuning: how well the query matches the available indexes, materialized views etc. Even in a well tuned system, there are always some queries that take much longer than others. This frustrates users who increasingly want consistent response times to ad hoc queries. We argue that query processors should instead aim for constant response times for all queries, with no assumption about tuning. We present Blink, our first attempt at this goal, that runs every query as a table scan over a fully denormalized database, with hash group-by done along the way. To make this scan efficient, Blink uses a novel compression scheme that horizontally partitions tuples by frequency, thereby compressing skewed data almost down to entropy, even while producing long runs of fixed-length, easily-parseable values. We also present a scheme for evaluating a conjunction of range and equality predicates in SIMD fashion over compressed tuples, and different schemes for efficient hash-based aggregation within the L2 cache. A experimental study with a suite of arbitrary single block SQL queries over a TPCH-like schema suggests that constant-time queries can be efficient.
Proceedings of the 2005 ACM SIGMOD international conference on Management of data - SIGMOD '05, 2005
Many analysis and monitoring applications require the repeated execution of expensive modeling functions over streams of rapidly changing data. These applications can often be expressed declaratively, but the continuous query processors developed to date are not designed to optimize queries with expensive functions. To speed up such queries, we present CASPER: the CAching System for PrEdicate Result ranges. CASPER computes and caches predicate result ranges, which are ranges of stream input values where the system knows the results of expensive predicate evaluations. Over time, CASPER expands ranges so that they are more likely to contain future stream values. This paper presents the CASPER architecture, as well as algorithms for computing and expanding ranges for a large class of predicates. We demonstrate the effectiveness of CASPER using a prototype implementation and a financial application using real bond market data.
… of the Second Biennial Conference on …, 2005
A great deal of work on adaptive query processing has been done over the last few years: Adaptive query processing has been used to detect and correct optimizer errors due to incorrect statistics or simplified cost metrics; it has been applied to long-running continuous queries over data streams whose characteristics vary over time; and routing-based adaptive query processing does away with the optimizer altogether. Despite this large body of interrelated work, no unifying comparison of adaptive query processing techniques or systems has been attempted; we tackle this problem. We identify three families of systems (plan-based, CQ-based, and routingbased), and compare them in detail with respect to the most important aspects of adaptive query processing: plan quality, statistics monitoring and re-optimization, plan migration, and scalability. We also suggest two new approaches to adaptive query processing that address some of the shortcomings revealed by our in-depth analysis:
2017
In this work, we present two experimental metrics named dief@t and dief@k which are able to capture and quantify the behavior of any system that produces results incrementally. We demonstrate the effectiveness of dief@t and dief@k on a generic SPARQL query engine able to produce results incrementally. Attendees will observe how both metrics are able to capture complementary information about the continuous behavior of the studied query engine. Moreover, valuable insights about the engine configurations that allow for continuously producing more answers over time will be observed. The demo is available at http://km.aifb.kit.edu/services/dief-app/.
Foundations and Trends in Databases, 2007
As the data management field has diversified to consider settings in which queries are increasingly complex, statistics are less available, or data is stored remotely, there has been an acknowledgment that the traditional optimize-then-execute paradigm is insufficient. This has led to a plethora of new techniques, generally placed under the common banner of adaptive query processing, that focus on using runtime feedback to modify query processing in a way that provides better response time or more efficient CPU utilization.
2008
Abstract���In most database systems, traditional and stream systems alike, the optimizer picks a single query plan for all data based on the overall statistics of the data. It has however been repeatedly observed that real-life datasets are non-uniform. Selecting a single execution plan may result in a query execution that is ineffective for possibly large portions of the actual data.
Loading Preview
Sorry, preview is currently unavailable. You can download the paper by clicking the button above.
2007 Ieee Acs International Conference on Computer Systems and Applications, 2007
Proceedings 19th International Conference on Data Engineering (Cat. No.03CH37405)
Lecture Notes in Business Information Processing, 2013