Monday, May 18, 2020

Avoiding reads in write-optimized index structures

This is a summary of features that avoid storage read IO in OSS write-optimized index structures. My scope is limited to MySQL and Tarantool -- I know there are similar features in other DBMS and am happy for comments on that:
  • RocksDB merge operator logically does read-modify-write but physically does a Put into the memtable and the read is deferred until user queries or compaction. I hope that MyRocks eventually takes advantage of the merge operator.
  • Read free replication in TokuDB and MyRocks.
  • Fast Updates in TokuDB do something similar to the merge operator for update and insert
  • Non-unique secondary index maintenance is read free in MyRocks
  • Deferred secondary index update in Tarantool is a new approach to avoiding reads for write-heavy workloads. It is worth reading. I wish it had been published by VLDB.
  • MyRocks also avoids some storage reads when validating unique constraints (for PK and unique secondary indexes) on inserts and some updates because the Get() that it does can use a bloom filter. This benefit is larger when there is a bloom filter on the max level of the LSM tree, but that has other costs is is frequently not done.
  • MyRocks uses the SingleDelete feature in RocksDB to allow tombstones to be removed earlier which decreases the CPU overhead for reads after a batch of writes.
  • A b-tree can also reduce index maintenance related storage read IO. See the InnoDB change buffer.
Updates

An incomplete list of relevant research:
  • DiffIndex - this is more about distributed systems, but still interesting
  • Techniques for reducing index maintenance overhead in an LSM by Luo & Carey (VLDB 2019)
  • Deferred Lightweight Indexing (DELI)

5 comments:

  1. The algorithm described by Tarantool is the nearly identical as the previously published EDBT paper at https://tristartom.github.io/docs/edbt14.pdf. We also had a paper at VLDB 2019 that uses a similar idea but exploits a primary key index to cleanup secondary indexes more efficiently (http://www.vldb.org/pvldb/vol12/p531-luo.pdf).

    ReplyDelete
    Replies
    1. Thanks for interesting links. I only glanced quickly, but isn't the EDBT14 paper a tradeoff: You can choose synchronously updated secondary indexes (requires a read) or you can avoid the read and use eventually consistent secondary indexes?

      Delete
    2. Do we need a RUM Conjecture for dist sys?

      Delete
  2. Sorry that I posted the wrong link for the first paper. The correct one should be Deferred Lightweight Indexing for Log-Structured Key-Value Stores (https://ieeexplore.ieee.org/document/7152467)

    ReplyDelete

Sysbench for MySQL 5.6 through 9.5 on a 2-socket, 24-core server

This has results for the sysbench benchmark on a 2-socket, 24-core server. A post with results from 8-core and 32-core servers is here . tl;...