In a rapidly growing digital world it is possible to query and discover data, but the most important issue is what resources are needed and how quickly the data can be accessed. For several years, grid systems, cloud systems and distributed database systems have been replacing independent databases, because their computing power is much higher. In distributed databases, which are stored on different nodes of a network, several communication channels may be available between nodes, each with a different time cost. This paper presents a method for selecting optimal routes between the nodes of the system, depending on the system parameters, network characteristics, available resources and the volume of data to be transferred. It also presents a method that uses caching to reduce the time cost of multiple queries in distributed databases. To test and validate the method, the database of a web application for managing a chain of stores was used. Several query scenarios were created, and the execution time of each scenario was measured through an interface designed specifically for testing.
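The route-selection idea can be illustrated with a minimal sketch. The abstract does not give the paper's cost formula, so the link cost used here (latency plus volume divided by bandwidth), the function names and the three-node network are all assumptions made for illustration, with Dijkstra's algorithm standing in for whatever selection procedure the authors use.

```python
import heapq

def transfer_time(latency_s, bandwidth_mb_s, volume_mb):
    """Assumed link cost: fixed latency plus volume divided by bandwidth."""
    return latency_s + volume_mb / bandwidth_mb_s

def cheapest_route(links, source, target, volume_mb):
    """Dijkstra over links of the form {node: [(neighbor, latency_s, bandwidth_mb_s), ...]}."""
    best = {source: 0.0}
    previous = {}
    queue = [(0.0, source)]
    while queue:
        cost, node = heapq.heappop(queue)
        if node == target:
            break
        if cost > best.get(node, float("inf")):
            continue  # stale queue entry
        for neighbor, latency_s, bandwidth_mb_s in links.get(node, []):
            candidate = cost + transfer_time(latency_s, bandwidth_mb_s, volume_mb)
            if candidate < best.get(neighbor, float("inf")):
                best[neighbor] = candidate
                previous[neighbor] = node
                heapq.heappush(queue, (candidate, neighbor))
    if target not in best:
        return None, float("inf")
    # Walk the predecessor links back from the target to rebuild the path.
    path = [target]
    while path[-1] != source:
        path.append(previous[path[-1]])
    return path[::-1], best[target]

# Hypothetical network: a low-latency, low-bandwidth path via B
# and a high-latency, high-bandwidth direct link from A to C.
links = {
    "A": [("B", 0.01, 100.0), ("C", 0.20, 1000.0)],
    "B": [("C", 0.01, 100.0)],
    "C": [],
}
```

Under these assumed numbers a 1 MB transfer is cheapest via B, while a 1000 MB transfer is cheapest over the direct link, which is the sense in which the best route depends on the volume of data to be transferred.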
Indian Journal of Science and Technology, 2018
Objectives: This paper examines the components of query optimization and the functions they perform to improve query response time and the efficiency of a distributed database. A cache-based optimization is also analyzed to highlight the query optimization process. Methods: Because data is the most valuable asset of any organization, organizations want to access and use it efficiently and in a timely manner. To evaluate the efficiency of query optimization, its components (search space, search strategy and cost model) are evaluated with the help of examples, tables and diagrams. By comparing the different results, a cache-based optimization technique is also evaluated. Findings: The plans generated in the search space are equivalent in the sense that they provide the same results, but their operations, implementation and performance differ. Different search strategy algorithms are also examined for quicker and more accurate results, and the movement of the search strategy is found to depend greatly on join ordering and the cost model. The cost model helps to select the best query execution plan, but it depends on parameters such as queue length, server distance, server capacity and load. The latest cache-based query optimization technique is also examined and found to be key to improving query response time, as its computational cost is very low. It will be more helpful if a cache is placed at each site. Applications and Future Improvements: Currently, cache-based query optimization is applicable only to homogeneous distributed databases. In the future this technique could also be implemented for heterogeneous databases.
2014
The query optimizer is a significant element of today’s relational database management systems. This element is responsible for translating a user-submitted query, commonly written in a non-procedural language, into an efficient query evaluation program that can be executed against the database. This research paper describes the architectural steps of query processing and the time and memory usage of optimization. The key goal of this paper is to understand the basic query optimization process and its architecture.
Distributed databases are emerging as a boon for large organizations because they provide better flexibility and ease of use than centralized databases. As the data in distributed environments grows day by day, a better distributed management system is required to manage it. Query optimization is the process of finding a better query execution plan among multiple available options. Because a distributed database has multiple sites that each hold part of the data, query optimization is one of its challenging tasks. This review paper studies the challenges of query optimization in distributed databases and its basic steps, and reviews some proposed systems.
Proceedings of the 1983 ACM SIGSMALL symposium on Personal and small computers - SIGSMALL '83, 1983
The current research on optimizing algorithms for queries in distributed data base networks is presented.
Query processing is an important concern in the field of distributed databases. The main problem is: if a query can be decomposed into subqueries that require operations at geographically separated databases, determine the sequence and the sites for performing this set of operations such that the operating cost (communication cost and processing cost) for processing this query is minimized. The problem is complicated by the fact that query processing not only depends on the operations of the query, but also on the parameter values associated with the query. Distributed query processing is an important factor in the overall performance of a distributed database system.
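A minimal sketch of the site-selection part of this problem follows. It assumes a single join of two relations, a communication cost charged per unit of data shipped and a processing cost charged per unit of data handled at the chosen site; the function names, parameters and numbers are hypothetical and are not taken from the paper.

```python
def operating_cost(site, relation_sites, relation_sizes, comm_cost_per_unit, proc_cost_per_unit):
    """Operating cost = communication cost + processing cost (assumed linear model)."""
    # Only relations stored elsewhere have to be shipped to the processing site.
    shipped = sum(
        size for name, size in relation_sizes.items() if relation_sites[name] != site
    )
    processed = sum(relation_sizes.values())
    return shipped * comm_cost_per_unit + processed * proc_cost_per_unit[site]

def best_site(sites, relation_sites, relation_sizes, comm_cost_per_unit, proc_cost_per_unit):
    """Pick the processing site with the lowest operating cost."""
    return min(
        sites,
        key=lambda site: operating_cost(
            site, relation_sites, relation_sizes, comm_cost_per_unit, proc_cost_per_unit
        ),
    )

# Hypothetical query: join R (stored at s1) with S (stored at s2).
relation_sites = {"R": "s1", "S": "s2"}
relation_sizes = {"R": 1000, "S": 100}
proc_cost_per_unit = {"s1": 1, "s2": 1}
chosen = best_site(["s1", "s2"], relation_sites, relation_sizes, 10, proc_cost_per_unit)
```

With these numbers the join is cheapest at s1, because only the small relation S has to cross the network. A full optimizer would also have to choose the sequence of operations, which this sketch leaves out.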
Information Processing Letters, 1980
Executing Structured Query Language (SQL) queries in an optimized way in a distributed database is a hitch that most database programmers have faced since the inception of database technology. Query optimization in a network is one of the hardest problems in the database area. The commercialization and success of database systems is primarily due to the development of complex query optimization techniques. Database users pose their queries declaratively by means of SQL or Object Query Language (OQL), and the query optimizer of the database system finds the best plan to execute them. The optimizer determines the best indices to use to execute a query and the order in which the operations of a query should be executed. To achieve this, the optimizer evaluates alternative plans, estimates the cost of each query plan by means of a cost model, and then selects the plan with the lowest cost. There has been much research in this field. In this paper, we review the difficulty of distributed query optimization and emphasize the components of the query optimizer required in a distributed environment, i.e. cost model, search space and search strategy. A review of the existing work in this field is given, and future work is highlighted based on recent work that utilizes mobile agent technologies.
Lecture Notes in Computer Science, 2005
Caching can greatly improve the performance of query processing in distributed databases. In this paper we show how this technique can be used in a grid architecture where data integration is implemented by means of updatable views. Views integrate data from heterogeneous sources and provide users with their integrated form. The whole process of integration is transparent, i.e. users need not be aware that the data are not located in one place. In data grids, caching can be used at different levels of the architecture. We focus on caching at the middleware layer, where the cache is stored in the database of the integrating unit. The cached results can then be used when answering queries from grid users, so there is no need to reevaluate whole queries. In this way caching can greatly increase the performance of applications operating on the grid. In the paper we also present an example of how a query can be optimized by rewriting it to make use of cached results.
Performance Evaluation, 1984
In this paper we briefly present the design of a distributed relational data base system. Then, we discuss experimental observations of the performance of that system executing both short and long commands. Conclusions are also drawn concerning metrics that distributed query processing heuristics should attempt to minimize. Lastly, we comment on architectures which appear viable for distributed data base applications.
For globally expanding organizations, applications generate dynamic workflows with frequent changes in database access patterns (write, read) at different sites. In those situations, a dynamic process that solves requests at the site where they were generated is recommended. The statistical data proposed in the model help to determine dynamic histograms of data access. In an unbalanced system of distributed databases, query processing is influenced by the following parameters: the type of query (read, write), the locations, and the rights on the data fragments involved in the request. After this process completes, the result is sent to the user. This paper proposes a heuristic algorithm for solving queries that involve data fragments from nodes other than the node on which the query was initiated. To solve a query, the algorithm determines the node with the best response time. While solving a query, the algorithm ensures the accuracy of the data.
IEEE Transactions on Computers, 2000
A model is developed for determining the optimal policy for processing a given relational model query. The model is based on operating cost (processing cost and communication cost), which is a function of selection of sites for processing query operations, sequence of operations, file size, and data reduction functions. The optimal policy specifies the site selection and sequence of operations that yield minimum operating cost. The query is first decomposed into a set of relational algebra operations whose precedence relationships are expressed as a query tree. Additional query trees may be generated by permuting these operations. A set of query processing graphs is then generated for a given query tree. Each node of a query processing graph represents the execution of a set of operations at a single site. Since the neighboring nodes represent distinct processing sites, the arcs between nodes represent the communication cost among sites. Theorems based on the cost model and the query processing graphs are developed for determining the optimal sites for processing the operations and for selecting the local optimal graphs from the set of query processing graphs. Use of these theorems greatly reduces the computation requirements in determining the optimal query processing policy. An example is given to illustrate the model.
Index Terms: distributed database, local operation group, optimal query processing, query operating cost, query processing graph, query tree, relational algebra, relational database.
Computing, Information Systems, Development Informatics & Allied Research Journal, 2016
Optimizing query processing in distributed database systems is an important research area considering the volume of data and information being processed these days. Many techniques have been proposed for optimizing query processing in distributed databases. In this paper, we propose a combination of two of the most commonly used query optimization techniques: data-shipping and query-shipping. This hybrid technique provides a solution for storing and processing data and information for quick retrieval in a distributed database when the data to be retrieved are not located within a single computer system. Using this technique, each employee in an organization that is geographically spread across different regions can decide to hide information about any employee whose data is in the organization's database by preserving their individual query intent. On the client machine, the data-shipping technique is used to help with local processing and communication cost, while the query-shipping technique is used on the server machine to optimize the given data selection. In some cases data management (update) can also be carried out by the database administrator, so the technique combines the features of both data-shipping and query-shipping. We compared our results with previous results obtained when the techniques were used separately and found that our technique performs better in terms of time and the complexity of the algorithm used.
This paper addresses the processing of a query in distributed database systems using a sequence of semijoins. The objective is to minimize the intersite data traffic incurred by a distributed query. A method is developed which accurately and efficiently estimates the size of an intermediate result of a query. This method provides the basis of the query optimization algorithm. Since the distributed query optimization problem is known to be intractable, a heuristic algorithm is developed to determine a low-cost sequence of semijoins. The cost comparison with an existing algorithm is provided. The complexity of the main features of the algorithm is analytically derived. The scheduling time for sequences of semijoins is measured for example queries using the PASCAL program which implements the algorithm.
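The traffic saving that a semijoin offers can be shown with a small sketch. The relations, the two sites and the traffic measure (number of attribute values sent between sites) are assumptions made for illustration; the paper's size-estimation method and heuristic are not reproduced here.

```python
def semijoin(rows, key, key_values):
    """Keep only the rows whose join attribute appears in key_values."""
    return [row for row in rows if row[key] in key_values]

# Hypothetical data: orders are stored at site 1, customers at site 2.
orders = [
    {"order_id": 1, "customer_id": 1, "total": 20},
    {"order_id": 2, "customer_id": 2, "total": 35},
    {"order_id": 3, "customer_id": 3, "total": 12},
    {"order_id": 4, "customer_id": 4, "total": 50},
    {"order_id": 5, "customer_id": 5, "total": 8},
    {"order_id": 6, "customer_id": 1, "total": 17},
]
customers = [
    {"customer_id": 1, "name": "Ada"},
    {"customer_id": 2, "name": "Lin"},
]

# Strategy A: ship the whole orders relation to site 2.
naive_traffic = sum(len(row) for row in orders)

# Strategy B: ship only the join keys to site 1, reduce orders there,
# then ship the reduced relation to site 2.
join_keys = {customer["customer_id"] for customer in customers}
reduced_orders = semijoin(orders, "customer_id", join_keys)
semijoin_traffic = len(join_keys) + sum(len(row) for row in reduced_orders)

# Final join at site 2 on the reduced relation.
customer_by_id = {customer["customer_id"]: customer for customer in customers}
result = [
    {**order, "name": customer_by_id[order["customer_id"]]["name"]}
    for order in reduced_orders
]
```

In this toy case the semijoin sends 11 values between sites instead of 18. The saving depends on how selective the join is, which is why estimating the size of intermediate results matters when choosing a sequence of semijoins.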
Query optimization is one of the essential problems in centralized and distributed databases. In a distributed DBMS (Database Management System), data is allocated to different sites before a query is issued in order to decrease subsequent communication costs; producing an optimized plan is an NP-class problem. This article examines both the methods for allocating data and producing an optimized design in a distributed system and the search space for query optimization in the distributed environment, and shows the need for an optimization method in view of different aspects of the optimization process. We introduce a new optimization method for the distributed database environment, which indicates that our simple optimization design performs relatively well until the database design is physical
This paper emphasizes an approach to query optimization that serves as a framework model for distributed computing environments. There are two popular methods for query optimization. The first is the traditional one, which follows the stages of query planning, deployment and adaptation. The second, which is our main experimental approach, performs query planning and deployment together as a single stage, followed by adaptation [1]. The approach integrates planning and deployment for distributed queries that involve many sub-queries in distributed data stream systems and applications. This method makes use of hierarchical network partitions, which provide operator-level reuse, and utilizes network characteristics to maintain an appropriate search space during query planning and deployment. The approach has been tested experimentally and has proved more efficient than the traditional methods.
The query optimization problem in large-scale distributed databases is NP in nature and difficult to solve. The complexity of the optimizer increases as the number of relations and the number of joins in a query increase. Research is being carried out to find an appropriate algorithm that seeks an optimal solution, especially as the size of the database increases. Various optimization strategies are reviewed in this paper, and the studies show that the performance of distributed query optimization improves when the Ant Colony Optimization algorithm is integrated with other optimization algorithms.
2012
Query optimization in distributed databases is explicitly needed in many aspects of the optimization process, often making it imperative for the optimizer to consult underlying data sources while doing cost-based optimization. This not only increases the cost of optimization, but also significantly affects the trade-offs involved in the optimization process. The leading cost in this optimization process is the cost of costing, which has traditionally been considered insignificant. The optimizer can only afford a few rounds of messages to the underlying data sources, and hence the optimization techniques in this environment must be geared toward gathering all the required cost information with minimal communication. In this paper, a cache-based query optimization model is proposed which shows a better hit ratio even for the initial queries, since a local cache is used instead of a global cache. A cache is placed between the local optimizer and the local database. Whenever a qu...
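A local cache sitting in front of the local database can be sketched as follows. The abstract is truncated before it describes the lookup procedure, so the class name, the least-recently-used eviction policy, the whitespace-only key normalization and the absence of invalidation on updates are all assumptions of this sketch rather than details of the proposed model.

```python
from collections import OrderedDict

class LocalQueryCache:
    """Hypothetical per-site cache placed between an optimizer and its local database."""

    def __init__(self, execute, capacity=128):
        self._execute = execute          # callable that runs a query on the local database
        self._capacity = capacity
        self._entries = OrderedDict()    # normalized query text -> cached result
        self.hits = 0
        self.misses = 0

    @staticmethod
    def _normalize(query):
        # Simplistic assumption: queries differing only in whitespace are the same query.
        # This would also collapse whitespace inside string literals.
        return " ".join(query.split())

    def run(self, query):
        key = self._normalize(query)
        if key in self._entries:
            self.hits += 1
            self._entries.move_to_end(key)      # mark as most recently used
            return self._entries[key]
        self.misses += 1
        result = self._execute(query)
        self._entries[key] = result
        if len(self._entries) > self._capacity:
            self._entries.popitem(last=False)   # evict the least recently used entry
        return result

    def hit_ratio(self):
        total = self.hits + self.misses
        return self.hits / total if total else 0.0
```

A caller would wrap whatever function executes queries locally, for example `cache = LocalQueryCache(run_on_local_db)`, where `run_on_local_db` is a placeholder name. A real deployment would also need to invalidate entries when the underlying tables change.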
Journal of Heuristics, 1997
The query optimizer is the DBMS (data base management system) component whose task is to find an optimal execution plan for a given input query. Typically, optimization is performed using dynamic programming. However, in distributed execution environments, this approach becomes intractable, due to the increase in the search space incurred by distribution. We propose the use of the tabu search metaheuristic for distributed query optimization. A hashing-based data structure is used to keep track of the search memory, simplifying significantly the implementation of tabu search. To validate this proposal, we implemented the tabu search strategy in the scope of an existing optimizer, which runs several search strategies. We focus our attention on the more difficult problems in terms of the query execution space, in which the solution space includes bushy execution plans and Cartesian products, which are not dealt with very often in the literature. Using a real-life application, we show the effectiveness of tabu search when compared to other strategies.
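The general shape of tabu search over join orders can be sketched as below. This is not the authors' implementation: the cost model (sum of intermediate result sizes for a left-deep plan), the swap neighborhood, the iteration limit and the example statistics are assumptions, and a plain Python set stands in for the hashing-based search memory mentioned in the abstract.

```python
import itertools

def plan_cost(order, cardinality, selectivity):
    """Assumed cost: sum of intermediate result sizes for a left-deep join order."""
    joined = [order[0]]
    size = float(cardinality[order[0]])
    total = 0.0
    for relation in order[1:]:
        size *= cardinality[relation]
        for other in joined:
            # Pairs without a join predicate default to selectivity 1.0 (Cartesian product).
            size *= selectivity.get(frozenset((other, relation)), 1.0)
        joined.append(relation)
        total += size
    return total

def tabu_search(start, cost, iterations=50):
    """Move to the best unvisited neighbor each step, even if it is worse than the current plan."""
    current = tuple(start)
    best, best_cost = current, cost(current)
    visited = {current}  # hash-based tabu memory of plans already explored
    for _ in range(iterations):
        neighbors = []
        for i, j in itertools.combinations(range(len(current)), 2):
            candidate = list(current)
            candidate[i], candidate[j] = candidate[j], candidate[i]
            candidate = tuple(candidate)
            if candidate not in visited:
                neighbors.append(candidate)
        if not neighbors:
            break
        current = min(neighbors, key=cost)
        visited.add(current)
        current_cost = cost(current)
        if current_cost < best_cost:
            best, best_cost = current, current_cost
    return best, best_cost

# Hypothetical statistics for three relations.
cardinality = {"A": 1000, "B": 10, "C": 100}
selectivity = {frozenset(("A", "B")): 0.01, frozenset(("B", "C")): 0.1}
cost = lambda order: plan_cost(order, cardinality, selectivity)
best_order, best_cost = tabu_search(("A", "C", "B"), cost)
```

Accepting a worse neighbor when no better one is available is what lets the search leave a local minimum, and the visited set prevents it from cycling back. This sketch covers only left-deep orders, whereas the paper also considers bushy plans.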
I would like to thank my supervisor Dr Dan Olteanu for his incredible level of enthusiasm and encouragement throughout the project. I am also very grateful for the continuous level of feedback and organisation as well as the amount of time he has devoted to answering my queries. I feel that I now approach complex and unknown problems with enthusiasm instead of apprehension as I used to. I couldn't have had a better supervisor.
2009