Scaling XML query processing: distribution, localization and pruning

2011

Abstract

Distributing data collections by fragmenting them is an effective way of improving the scalability of a database system. While the distribution of relational data is well understood, the unique characteristics of the XML data and query model present challenges that require different distribution techniques. In this paper, we show how XML data can be fragmented horizontally and vertically.

Key takeaways

  • With horizontal fragmentation, it is possible to evaluate a query by computing the union of all fragments and then executing a centralized query plan over the result.
  • Since the local plans can be evaluated independently of each other in parallel, we can model the cost of a query q as cost(q) = max{cost(p_j) | p_j ∈ P}, where P is the set of local plans (after pruning) corresponding to q for a given vertical fragmentation schema.
  • So far, for simplicity, we have focused on identifying a fragmentation schema for a single query.
  • Distributed query execution over the hybrid fragmentation yields even better results.
  • Figure 23 shows the throughput rates achieved by centralized query execution (which is vanishingly low in some of the cases shown), as well as distributed query execution (with and without pruning) on a balanced fragmentation consisting of 2, 4 and 8 fragments and on the skewed fragmentation.
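
The first two takeaways can be illustrated with a minimal Python sketch. It is not the paper's implementation: the function names `evaluate_over_union` and `query_cost`, the representation of a fragment as a list of items, and the sample cost values are all assumptions made for illustration.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def evaluate_over_union(
    fragments: Sequence[Sequence[T]],
    centralized_query: Callable[[List[T]], R],
) -> R:
    """Baseline strategy for horizontal fragmentation (hypothetical helper).

    Collect every item from every fragment into one collection, then run
    the unmodified centralized query over that collection.
    """
    union = [item for fragment in fragments for item in fragment]
    return centralized_query(union)


def query_cost(local_plan_costs: Sequence[float]) -> float:
    """cost(q) = max{cost(p_j) | p_j in P} (hypothetical helper).

    Local plans run in parallel, so the query is only as fast as its
    slowest local plan. The input holds one cost per plan left after pruning.
    """
    if not local_plan_costs:
        raise ValueError("P must contain at least one local plan")
    return max(local_plan_costs)


# Illustrative values only; these are not measurements from the paper.
fragments = [[1, 2], [3]]
result = evaluate_over_union(fragments, lambda items: [i for i in items if i > 1])
slowest = query_cost([4.0, 9.5, 2.0])
```

Under this cost model, pruning a local plan lowers the query cost only when the pruned plan was the most expensive one; removing a cheaper plan saves resources but leaves the maximum unchanged.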