Row Key Designs of NoSQL Database Tables and Their Impact on Write Performance
2016 24th Euromicro International Conference on Parallel, Distributed, and Network-Based Processing (PDP), 2016
In several NoSQL database systems, HBase among them, each table has only one index, which serves as both the row key and the clustered index. Secondary indexes are not available out of the box. As a result, row key design is the most important decision when designing tables, because an inappropriate design can severely degrade performance and increase cost. Different row key designs suit different problems, and in this paper we analyze the performance, characteristics, and applicability of each. In particular, we investigate the effect of various techniques for modeling row keys: sequences, salting, padding, hashing, and modulo operations. We propose four designs based on these techniques and analyze their performance on different HBase clusters when loading HDFS files of various sizes. The experiments show that particular designs consistently outperform others on differently sized clusters, both in execution time and in the evenness of load distribution across nodes.
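To make the techniques named in the abstract concrete, the following is a minimal Python sketch of how padding, modulo-based salting, and hashing can each shape a row key built from a sequence number. It is an illustration only and does not reproduce the four designs evaluated in the paper. The function names, the bucket count `NUM_BUCKETS`, the pad width `PAD_WIDTH`, and the `-` separator are all assumptions chosen for the example.

```python
import hashlib

# Hypothetical parameters, chosen only for illustration.
NUM_BUCKETS = 8   # number of salt buckets (e.g. roughly the number of regions)
PAD_WIDTH = 10    # fixed width for zero-padded sequence numbers


def padded_key(seq: int) -> str:
    """Zero-pad a sequence number so lexicographic order matches numeric order."""
    return str(seq).zfill(PAD_WIDTH)


def salted_key(seq: int) -> str:
    """Prefix the padded key with a salt derived from a modulo operation.

    Consecutive sequence numbers land in different buckets, which spreads
    writes across key ranges instead of concentrating them at the end.
    """
    salt = seq % NUM_BUCKETS
    return f"{salt:02d}-{padded_key(seq)}"


def hashed_key(seq: int) -> str:
    """Prefix the padded key with the first hex characters of an MD5 digest.

    The prefix is deterministic, so the key can be recomputed for point reads,
    but adjacent sequence numbers are scattered across the key space.
    """
    digest = hashlib.md5(str(seq).encode("utf-8")).hexdigest()
    return f"{digest[:4]}-{padded_key(seq)}"


if __name__ == "__main__":
    for seq in (1, 2, 3, 42):
        print(padded_key(seq), salted_key(seq), hashed_key(seq))
```

A purely sequential (padded) key keeps rows in insertion order, which is convenient for range scans but tends to direct all new writes to one key range. The salted and hashed variants trade that scan locality for a more even write distribution, which is the kind of trade-off the abstract describes.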
Papers by Andrea Kulakov