The paper discusses query acceleration techniques in distributed database systems, focusing on the challenges posed by data decentralization and the need for efficient data retrieval methods. It introduces a performance algorithm designed to minimize communication costs and enhance data access time, presenting empirical results that demonstrate improvements over traditional unoptimized approaches.
2009
Query processing is an important concern in the field of distributed databases. The main problem is: if a query can be decomposed into subqueries that require operations at geographically separated databases, determine the sequence and the sites for performing this set of operations such that the operating cost (communication cost and processing cost) for processing this query is minimized. The problem is complicated by the fact that query processing not only depends on the operations of the query, but also on the parameter values associated with the query. Distributed query processing is an important factor in the overall performance of a distributed database system.
I would like to thank my supervisor Dr Dan Olteanu for his incredible level of enthusiasm and encouragement throughout the project. I am also very grateful for the continuous level of feedback and organisation as well as the amount of time he has devoted to answering my queries. I feel that I now approach complex and unknown problems with enthusiasm instead of apprehension as I used to. I couldn't have had a better supervisor.
Computing, Information Systems, Development Informatics & Allied Research Journal, 2016
Optimizing query processing in distributed database systems is an important research area, given the volume of data and information being processed today. Many techniques have been proposed for optimizing query processing in distributed databases. In this paper, we propose a combination of two of the most commonly used query optimization techniques: data-shipping and query-shipping. This hybrid technique provides a solution for storing and processing data for quick retrieval in a distributed database when the data to be retrieved are not located within a single computer system. Using this technique, each employee in an organization that is geographically located in different regions can decide to hide information about any employee whose data is in the organization's database by preserving their individual query intent. On the client machine, the data-shipping technique is used to help with local processing and communication cost, while on the server machine the query-shipping technique is used to optimize the given data selection. In some cases data management (update) can also be carried out by the database administrator; the technique thus combines the features of both data-shipping and query-shipping. We compared our results with previous results obtained when the techniques were used separately and found that our technique performs better in terms of the time and complexity of the algorithm used.
IEEE Transactions on Computers, 2000
A model is developed for determining the optimal policy for processing a given relational model query. The model is based on operating cost (processing cost and communication cost), which is a function of selection of sites for processing query operations, sequence of operations, file size, and data reduction functions. The optimal policy specifies the site selection and sequence of operations that yield minimum operating cost. The query is first decomposed into a set of relational algebra operations whose precedence relationships are expressed as a query tree. Additional query trees may be generated by permuting these operations. A set of query processing graphs is then generated for a given query tree. Each node of a query processing graph represents the execution of a set of operations at a single site. Since the neighboring nodes represent distinct processing sites, the arcs between nodes represent the communication cost among sites. Theorems based on the cost model and the query processing graphs are developed for determining the optimal sites for processing the operations and for selecting the local optimal graphs from the set of query processing graphs. Use of these theorems greatly reduces the computation requirements in determining the optimal query processing policy. An example is given to illustrate the model.
Index Terms: distributed database, local operation group, optimal query processing, query operating cost, query processing graph, query tree, relational algebra, relational database.
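The operating-cost idea in the abstract above (processing cost plus communication cost, minimized over site selections) can be illustrated with a small sketch. This is not the paper's model or its theorems: it assumes a fixed linear sequence of operations, a hypothetical per-site processing cost table `proc_cost`, hypothetical intermediate result sizes `result_size`, and a single `unit_comm_cost` per unit of data shipped, and it simply enumerates every site assignment.

```python
from itertools import product

def operating_cost(assignment, proc_cost, result_size, unit_comm_cost):
    """Processing cost of each operation at its assigned site, plus the cost
    of shipping an intermediate result whenever the next operation runs at a
    different site. All inputs are illustrative assumptions."""
    total = 0.0
    for i, site in enumerate(assignment):
        total += proc_cost[i][site]
        if i + 1 < len(assignment) and assignment[i + 1] != site:
            total += result_size[i] * unit_comm_cost
    return total

def best_assignment(sites, proc_cost, result_size, unit_comm_cost):
    """Exhaustively try every site assignment and keep the cheapest one."""
    candidates = product(sites, repeat=len(proc_cost))
    best = min(
        candidates,
        key=lambda a: operating_cost(a, proc_cost, result_size, unit_comm_cost),
    )
    return best, operating_cost(best, proc_cost, result_size, unit_comm_cost)

# Hypothetical example: three operations, two sites.
proc_cost = [{"A": 1, "B": 5}, {"A": 4, "B": 1}, {"A": 2, "B": 2}]
result_size = [10, 1, 0]
plan, cost = best_assignment(("A", "B"), proc_cost, result_size, 0.5)
```

In this made-up instance the second operation is cheaper at site B, but shipping the first intermediate result there costs more than it saves, so keeping everything at site A wins. Exhaustive enumeration grows exponentially with the number of operations, which is why results that prune the set of candidate graphs matter.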
The query optimization problem in large-scale distributed databases is NP-hard in nature and difficult to solve. The complexity of the optimizer increases as the number of relations and the number of joins in a query increase. Research is being carried out to find an appropriate algorithm for seeking an optimal solution, especially as the size of the database increases. Various optimization strategies are reviewed in this paper, and the studies show that the performance of distributed query optimization improves when the Ant Colony Optimization algorithm is integrated with other optimization algorithms.
A query is a statement or group of statements that executes basic database operations, viz. "Read", "Write", "Delete", and "Update". It plays a consequential role in managing and retrieving data. In general, distributed queries are more complex than centralized queries. Queries can be categorized as data creation and data destruction queries, data management queries, data control queries, and OLTP and DSS queries. In data creation and data destruction queries, create, insert and drop statements are used. In data management queries, data is managed and manipulated: data can be inserted, deleted and updated. In data control queries, one can save data using the commit command, and permission can be granted using the grant command [1][2][3]. In online transaction processing (OLTP), the work analysis and query optimization is done. Decision support system (DSS) queries are used to retrieve data from large databases, and a distributed DSS query retrieves data from multiple sites. DSS queries are more complex than OLTP queries: their running time is unpredictable, and optimizing them is more complex. In OLTP systems, real-time updates are performed, whereas DSS queries execute in batches. OLTP database applications are optimal for managing changing data; these applications typically have many users performing transactions at the same time that change real-time data. In other words, an OLTP database is a live database. By contrast, the tables in a decision support database are heavily indexed, and the raw data is frequently preprocessed and organized to support the various types of queries to be used.
The OLTP and DSS queries can be differentiated on the basis of different parameters as mentioned below [1][4][5][6]: A number of heuristics have been applied in recent times, which proposed new algorithms for substantially improving the performance of a query[1][2][3]. As stated by Manik Sharma et al. (2015) there are two major types of database queries called DSS and OLTP queries. To optimize a DSS query on the basis of usage of system resources, one has to find an optimal query execution plan which minimizes the Total Costs of a query. For finding the optimal query execution plan, the costs of
IJRASET, 2021
The fundamental goal of this thesis is to introduce various models for single as well as multiple query processing in distributed database systems that result in lower query processing cost. One of the significant issues in the design and implementation of Distributed Database Management Systems (DDBMS) is efficient query processing. The objective of distributed query optimization reduces to minimizing the amount of data to be transmitted among sites for processing a given query. The problem of query processing in DDBS (1 1) has been studied extensively in the literature. In most algorithms, the qualification of the query contains a sequence of operations. In such cases, when operations are executed from right to left according to their order in the sequence, the result of one operation may be an operand of the next. Since the operations depend on one another, at any instant only one operation at one site is executed even though the environment is distributed, and the systems at all other sites are idle for this query. A new model, the Totally Reducible Relation Model (CRK Model), which allows parallelism and processes multiple operations simultaneously at all relevant sites, is presented. It is assumed that the operations are in the form of conjunctions, so every operation can be processed independently. In this model, at any instant, the relations at every relevant site are completely reduced by the corresponding sets of all applicable operations (selections, semijoins and joins) simultaneously. Thus every relation is scanned only once to process all applicable operations, reducing I/O cost.
2007
Distributed database system technology is one of the major developments in the information technology area. It will continue to have a very significant impact on data processing in the upcoming years because distributed database systems have many potential advantages over centralized systems for geographically distributed organizations. The continuing interest in distributed database systems in the research community and the marketplace and the introduction of many commercial products indicate that distributed database systems will play a more important role in data processing and eventually will replace centralized systems as the major database technology in the future. The availability of high-speed communication networks and, especially, the phenomenal popularity of the Internet and intranets will undoubtedly speed up the transition process. Some challenging problems must be solved before the full potential benefits of distributed database technology can be realized. Among them is query processing (including query optimization), one of the most important issues in distributed database system design. The query optimization problem in large-scale distributed databases is NP-hard in nature and difficult to solve. In this study, the query optimization problem is reduced to a join ordering problem similar to a variant of the traveling salesman problem. We explored several heuristics and a genetic algorithm for solving the join ordering problem. Computational experiments on these algorithms were conducted and their solution qualities compared. The experiments show that heuristics and genetic algorithms are viable methods for solving the query optimization problem in large-scale distributed database systems.
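To make the join ordering problem mentioned above concrete, here is a minimal sketch of one simple greedy heuristic, in the spirit of nearest-neighbour heuristics for the traveling salesman problem. It is not taken from the study: the cardinalities in `card`, the pairwise selectivities in `sel`, and the cost measure (sum of estimated intermediate result sizes) are all assumptions chosen for illustration.

```python
def greedy_join_order(card, sel):
    """Build a left-deep join order greedily.

    card: {relation: cardinality}
    sel:  {frozenset({r, s}): join selectivity}; pairs missing from sel are
          treated as cross products (selectivity 1.0).
    Returns (order, cost), where cost is the sum of the estimated sizes of
    all intermediate results. Purely illustrative.
    """
    remaining = set(card)
    first = min(remaining, key=lambda r: (card[r], r))  # start with smallest
    order, size, cost = [first], float(card[first]), 0.0
    remaining.remove(first)

    def size_after_joining(r):
        estimate = size * card[r]
        for joined in order:
            estimate *= sel.get(frozenset((joined, r)), 1.0)
        return estimate

    while remaining:
        # Pick the relation that keeps the next intermediate result smallest.
        nxt = min(remaining, key=lambda r: (size_after_joining(r), r))
        size = size_after_joining(nxt)
        cost += size
        order.append(nxt)
        remaining.remove(nxt)
    return order, cost

# Hypothetical statistics for three relations.
card = {"R": 1000, "S": 100, "T": 10}
sel = {frozenset(("R", "S")): 0.01, frozenset(("S", "T")): 0.1}
order, cost = greedy_join_order(card, sel)
```

A greedy pass like this runs quickly but can miss the optimal order, which is one reason metaheuristics such as genetic algorithms are explored for larger queries.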
issues related to the problem, to model the problem, taking into consideration the most important factors, to propose some solution methods for these models, and, finally, to conduct computational experiments and compare the results to determine the effectiveness and efficiency of the solution techniques (algorithms). We believe that the development of the comprehensive models for the query optimization in large-scale systems, as well as finding effective and/or efficient solution techniques to solve the problems that have been identified are important and will contribute to the use of and research on distributed database technology.
Query processing in distributed databases involves transferring a query from one site to another. As a result of this complexity, additional storage space and time may be needed, which can add cost overhead, since cost is often associated with query execution, especially in distributed databases. These costs can arise from input/output (I/O), the CPU used at each site, the transfer of data between sites, and so on. With the increasing volume and complexity of data and information being processed in large databases today, it is paramount to have a query execution strategy and optimization technique for efficient retrieval of information. To minimize response time and resource utilization, it is necessary to optimize query processing in distributed databases. Various techniques have been used in the past, but they all require the user to have good knowledge of the whole data, making them unsuitable for autonomous distributed databases where nodes are unaware of each other. This paper proposes an object-oriented technique that can be used for processing information with minimum response time and resource usage in distributed databases. The results show that data can be retrieved with minimal delay.
This paper addresses the processing of a query in distributed database systems using a sequence of semijoins. The objective is to minimize the intersite data traffic incurred by a distributed query. A method is developed which accurately and efficiently estimates the size of an intermediate result of a query. This method provides the basis of the query optimization algorithm. Since the distributed query optimization problem is known to be intractable, a heuristic algorithm is developed to determine a low-cost sequence of semijoins. The cost comparison with an existing algorithm is provided. The complexity of the main features of the algorithm is analytically derived. The scheduling time for sequences of semijoins is measured for example queries using the PASCAL program which implements the algorithm.
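The way a semijoin cuts intersite data traffic, as described above, can be shown with a small sketch. It is not the paper's algorithm or its size-estimation method: it assumes relation R lives at site 1 and S at site 2, models rows as Python dictionaries, and measures traffic as the number of attribute values shipped. The function names are hypothetical.

```python
def semijoin(r_rows, s_rows, key):
    """R semijoin S: the rows of R whose join-key value also appears in S."""
    s_keys = {row[key] for row in s_rows}
    return [row for row in r_rows if row[key] in s_keys]

def shipped_values(r_rows, s_rows, key):
    """Attribute values sent over the network to join R (site 1) with S
    (site 2) at site 2, under two strategies:
      1. ship all of R to site 2;
      2. ship the distinct join keys of S to site 1, reduce R there with a
         semijoin, then ship only the reduced R to site 2.
    """
    ship_whole = sum(len(row) for row in r_rows)
    s_keys = {row[key] for row in s_rows}      # projection sent to site 1
    reduced = semijoin(r_rows, s_rows, key)    # computed at site 1
    ship_with_semijoin = len(s_keys) + sum(len(row) for row in reduced)
    return ship_whole, ship_with_semijoin

# Hypothetical data: 100 employees across 10 departments, one department in S.
R = [{"id": i, "name": f"n{i}", "dept": i % 10} for i in range(100)]
S = [{"dept": 3, "city": "x"}]
whole, reduced = shipped_values(R, S, "dept")
```

With this data, shipping R whole moves 300 values, while the semijoin strategy moves 31. The benefit depends on how selective the join is; when nearly every row of R matches, the extra key transfer is pure overhead, which is why accurate intermediate size estimates matter.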
2007
Selecting the best plan for executing a given query is the problem of query optimization. The focus of query optimization for sequential machines has been on finding query plans which involve the least amount of work, since response time is equivalent to work done in a uniprocessor environment. With the advent of parallel computers and their application to data management, this is no longer true. It may not be the case that the best sequential plan will result in the best parallel plan, since sequential dependencies in certain plans make them inherently less parallelizable. It is thus possible to reduce the response time of a query by selecting a plan which may do more work but is also more parallelizable. I/O has traditionally been a bottleneck for query processing, and the difference in I/O and CPU speeds is increasing with modern technology. Since join processing is very CPU intensive, maximizing overlap between I/O and CPU resource utilization is desirable. Researchers have looked at asynchronous I/O as a tool to achieve this effect. In this paper we study the problem of parallel join execution on a shared-nothing architecture with support for asynchronous I/O and asynchronous message passing. We examine the tradeoffs among different query execution models and propose a viable alternative. An analytical cost model for the proposed query execution model is developed. Tradeoffs among different query plan structures are explored, and query domains where one structure does better than the other are categorized. Based on this we propose a heuristic approach which tries to achieve the best of all worlds, and gives a stable and good average performance across most domains.
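The distinction drawn above between total work and response time can be sketched numerically. The following is an illustration only, not the paper's cost model: it assumes each plan is a set of tasks with made-up durations and dependencies, and that enough processors exist to run all independent tasks at once, so response time equals the critical-path length.

```python
from functools import lru_cache

def work_and_response_time(tasks):
    """tasks: {name: (duration, [names of tasks it depends on])}.

    Work is the sum of all durations. Response time is the length of the
    longest dependency chain, assuming unlimited parallelism (an idealised
    assumption made for this sketch).
    """
    @lru_cache(maxsize=None)
    def finish(name):
        duration, deps = tasks[name]
        return duration + max((finish(d) for d in deps), default=0)

    work = sum(duration for duration, _ in tasks.values())
    return work, max(finish(name) for name in tasks)

# Hypothetical left-deep plan: each join waits for the previous one.
left_deep = {"j1": (4, []), "j2": (4, ["j1"]), "j3": (4, ["j2"])}
# Hypothetical bushy plan: j1 and j2 are independent and can overlap.
bushy = {"j1": (5, []), "j2": (5, []), "j3": (4, ["j1", "j2"])}

seq = work_and_response_time(left_deep)
par = work_and_response_time(bushy)
```

Here the bushy plan does more work (14 units against 12) yet finishes sooner (9 units against 12), which is the effect the abstract describes.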
Journal of Heuristics, 1997
The query optimizer is the DBMS (data base management system) component whose task is to find an optimal execution plan for a given input query. Typically, optimization is performed using dynamic programming. However, in distributed execution environments, this approach becomes intractable, due to the increase in the search space incurred by distribution. We propose the use of the tabu search metaheuristic for distributed query optimization. A hashing-based data structure is used to keep track of the search memory, simplifying significantly the implementation of tabu search. To validate this proposal, we implemented the tabu search strategy in the scope of an existing optimizer, which runs several search strategies. We focus our attention on the more difficult problems in terms of the query execution space, in which the solution space includes bushy execution plans and Cartesian products, which are not dealt with very often in the literature. Using a real-life application, we show the effectiveness of tabu search when compared to other strategies.
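A minimal sketch of tabu search with a hash-based search memory, as described above, might look like the following. It is not the authors' optimizer: the search space here is limited to linear join orders (the paper also covers bushy plans and Cartesian products), the neighbourhood is adjacent swaps, the memory is a plain Python set (a hash table) of visited plans, and the `cost` function is a stand-in supplied by the caller.

```python
def tabu_search(start, cost, max_iters=50):
    """Minimise cost(plan) over orderings of the items in start.

    At each step, move to the best neighbour not yet visited, even if it is
    worse than the current plan; the visited set prevents cycling back.
    """
    current = tuple(start)
    best, best_cost = current, cost(current)
    visited = {current}                      # hash-based tabu memory
    for _ in range(max_iters):
        neighbours = []
        for i in range(len(current) - 1):    # swap each adjacent pair
            candidate = list(current)
            candidate[i], candidate[i + 1] = candidate[i + 1], candidate[i]
            candidate = tuple(candidate)
            if candidate not in visited:
                neighbours.append(candidate)
        if not neighbours:                   # every neighbour is tabu
            break
        current = min(neighbours, key=cost)
        visited.add(current)
        current_cost = cost(current)
        if current_cost < best_cost:
            best, best_cost = current, current_cost
    return best, best_cost

# Toy cost function (an assumption): position-weighted relation weights.
weights = {"R": 3, "S": 1, "T": 2}

def toy_cost(plan):
    return sum((pos + 1) * weights[rel] for pos, rel in enumerate(plan))

best_plan, best_plan_cost = tabu_search(("S", "T", "R"), toy_cost)
```

Keeping whole plans in a hash set makes the membership test cheap and avoids managing explicit tabu lists of moves, which appears to be the simplification the abstract has in mind.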
Journal of Institute of Science and Technology, 2019
Query optimization is the most significant factor in reducing the total execution time of a query in any centralized relational database management system (RDBMS). Query optimization is the process of determining the most efficient way to execute a given SQL (Structured Query Language) query in a relational database by considering the possible query plans. Cost-based query optimization compares different strategies based on their relative costs (the amount of time the query needs to run) and selects and executes the one that minimizes the cost. The cost of a strategy is only an estimate, based on the CPU and I/O resources the query is expected to use. In this paper, cost is measured by counting the number of disk accesses for each query plan, because disk access tends to be the dominant cost in query processing for centralized relational databases.
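Counting disk accesses per plan, as described above, can be sketched as follows. The formula used is the common textbook worst-case estimate for a block nested-loop join with one buffer block per relation (outer blocks plus outer blocks times inner blocks); whether the paper uses this exact formula is an assumption, and the block counts are made up.

```python
def block_nested_loop_cost(outer_blocks, inner_blocks):
    """Worst-case block transfers for a block nested-loop join: read every
    outer block once, and scan the whole inner relation once per outer block."""
    return outer_blocks + outer_blocks * inner_blocks

def cheapest_plan(plans):
    """plans: {plan name: estimated disk accesses}. Returns (name, cost)."""
    return min(plans.items(), key=lambda item: item[1])

# Hypothetical relations: R occupies 100 blocks, S occupies 400 blocks.
plans = {
    "R outer, S inner": block_nested_loop_cost(100, 400),
    "S outer, R inner": block_nested_loop_cost(400, 100),
}
choice = cheapest_plan(plans)
```

Under this estimate the smaller relation is the better outer relation, since both plans pay the same product term and differ only in the cost of reading the outer relation once.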
Query optimization is one of the essential problems in centralized and distributed databases. In a distributed DMS (Database Management System), data allocation to different sites is proposed before a query in order to decrease the subsequent communication costs, namely an optimized bed production, which is among the 'NP' problems. In this article, we examine both the methods for allocating data and producing an optimized design in a distributed system and the search space for query optimization in the distributed environment, and we show the need for an optimization method in view of the different aspects of the optimization process. We introduce a new method for optimization in a distributed database environment, which indicates that our simple optimization design is executed relatively well until the database design is physical
As the database management field has diversified to consider settings in which queries are increasingly complex, statistics are less available, or data is stored remotely, there has been an acknowledgment that the available optimization techniques are insufficient. This has led to a plethora of new techniques, generally placed under the common banner of optimizing complex queries, that focus on rewriting complex queries in a simpler manner. Query optimization is the bottleneck of database application performance, especially for applications that store history, i.e. data warehouses. SQL is used as the query language because most data warehouses are based on relational or extended relational database systems. In this survey paper, we identify many of the common issues, themes, and approaches that pervade this work, and the settings in which each piece of work is most appropriate. Our goal with this paper is to be a "value-add" over the existing papers on the material, providing not only a brief overview of each technique, but also a basic framework for understanding the field of query processing in general, and to reduce the complexity of queries in order to enhance query processing and optimization engines.
Information Processing Letters, 1980
Encyclopedia of Database Systems, 2009
Proceedings of the second international conference on Information and knowledge management - CIKM '93, 1993