Make leaf node iteration more parallel.

Right now, `btree_depth_first_traversal` processes the values of a leaf node one at a time in a `for` loop. They could instead be processed concurrently, using `pmap` or some other mechanism, which should make many `rget` queries considerably faster.
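A minimal, hypothetical C++ sketch of the proposed change follows. It is not RethinkDB code: `value_t`, `process_leaf_sequential`, and `process_leaf_parallel` are invented names, and `std::async` threads stand in for `pmap`. The sketch assumes `pmap` runs a callback for each element concurrently and returns only after every call has finished; the real `pmap` schedules coroutines rather than OS threads. Because callbacks may run at the same time, the per-value callback must be safe to call concurrently.

```cpp
#include <cassert>
#include <functional>
#include <future>
#include <vector>

// Hypothetical stand-in for a leaf node's values; the real leaf layout
// is not described in the issue.
using value_t = int;

// Current behavior described in the issue: values are visited one at a time.
void process_leaf_sequential(const std::vector<value_t> &values,
                             const std::function<void(const value_t &)> &cb) {
    for (const value_t &v : values) {
        cb(v);
    }
}

// Proposed behavior, with std::async standing in for pmap (an assumption):
// start one task per value, then wait for all of them before returning.
void process_leaf_parallel(const std::vector<value_t> &values,
                           const std::function<void(const value_t &)> &cb) {
    std::vector<std::future<void>> tasks;
    tasks.reserve(values.size());
    for (const value_t &v : values) {
        // Capturing v by reference is safe: values outlives every task
        // because we wait for all of them below.
        tasks.push_back(std::async(std::launch::async, [&cb, &v] { cb(v); }));
    }
    for (auto &t : tasks) {
        t.get();  // blocks until the task finishes; rethrows any exception from cb
    }
}
```

Note that this version starts one task per value in the leaf, so the amount of simultaneous work grows with the leaf's size.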

(Of course, do we have enough memory to process _all_ the values of a leaf node simultaneously?  How can we deal with that?)
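One possible answer to the memory question is to cap how many values are in flight at once. The sketch below is hypothetical: `process_leaf_bounded` and `max_in_flight` are invented names, `std::async` again stands in for `pmap`, and it only bounds the extra memory used by in-progress callbacks, not the memory of the leaf node itself.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <future>
#include <vector>

// Hypothetical stand-in for a leaf node's values.
using value_t = int;

// Hypothetical bounded variant: at most max_in_flight callbacks run at once,
// so memory held by in-progress work is capped by max_in_flight rather than
// by the number of values in the leaf.
void process_leaf_bounded(const std::vector<value_t> &values,
                          std::size_t max_in_flight,
                          const std::function<void(const value_t &)> &cb) {
    assert(max_in_flight > 0);
    for (std::size_t start = 0; start < values.size(); start += max_in_flight) {
        const std::size_t end = std::min(values.size(), start + max_in_flight);
        std::vector<std::future<void>> batch;
        batch.reserve(end - start);
        for (std::size_t i = start; i < end; ++i) {
            batch.push_back(std::async(std::launch::async,
                                       [&cb, &values, i] { cb(values[i]); }));
        }
        // Wait for the whole batch before starting the next one.
        for (auto &f : batch) {
            f.get();
        }
    }
}
```

Fixed batches are the simplest way to bound concurrency, but one slow value stalls its whole batch; a sliding window (for example, a counting semaphore that admits a new value whenever one finishes) keeps more work in flight under the same cap.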