Right now, btree_depth_first_traversal processes the values of a leaf node one at a time in a for loop. They could instead be processed in parallel, using pmap or some other mechanism, which should make a lot of rget queries considerably faster.
(Open question: do we have enough memory to process all the values of a leaf node simultaneously? If not, how can we deal with that?)
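
One possible way to handle the memory concern is to process the values in bounded batches rather than all at once. The following is only a rough sketch in Python, not the project's actual code: `process_leaf_values`, `process_value`, `batched`, `max_workers`, and `batch_size` are all hypothetical names, and it assumes the per-value work is independent and mostly I/O-bound, so that threads actually help.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import islice


def batched(values, batch_size):
    """Yield successive lists of at most batch_size items from values."""
    it = iter(values)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch


def process_leaf_values(values, process_value, max_workers=4, batch_size=8):
    """Apply process_value to each leaf value in parallel, one batch at a time.

    Hypothetical stand-in for the for loop over a leaf node's values.
    At most batch_size values are in flight at once, which caps memory use.
    Results are yielded in the same order as the input values.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        for batch in batched(values, batch_size):
            # pool.map preserves input order; the next batch is not
            # submitted until every result of this one has been yielded.
            yield from pool.map(process_value, batch)


if __name__ == "__main__":
    # Toy usage: doubling stands in for the real per-value work.
    for result in process_leaf_values(range(10), lambda v: v * 2, batch_size=4):
        print(result)
```

The batch size is the knob here: a larger batch means more parallelism but more values held in memory at the same time, so it would need to be tuned (or derived from the size of the values) for real leaf nodes.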