OOM while traversing large database.
sled 0.31.0, rustc 1.42.0 (b8cedc004 2020-03-09), Ubuntu 19.10. Code:
```rust
fn main() {
    dotenv::dotenv().ok();
    pretty_env_logger::init();
    let db = sled::Config::new()
        .path("/mount/large/server/sled")
        .use_compression(true)
        .cache_capacity(256)
        .open()
        .unwrap();
    let wikidata = db.open_tree("wikidata").unwrap();
    // Tree::len() traverses every key, paging the whole tree through the cache.
    wikidata.len();
    wikidata.len();
}
```
Expected outcome: uses a small amount of memory.
Actual outcome: uses several gigabytes of memory and gets OOM-killed.
The database is 4.7 GB on disk.
Some debug logs: https://pastebin.com/e49teW5m
This also causes an OOM kill (before disk is exhausted) if left running long enough.
```rust
extern crate bincode;

fn main() -> sled::Result<()> {
    let db = sled::open("db")?;
    let t1 = db.open_tree("t1")?;
    let t2 = db.open_tree("t2")?;

    // Big-endian keys, so byte-wise key order matches numeric order.
    let mut bincode = bincode::config();
    bincode.big_endian();

    // This approximates my workload somewhat...
    let infinite_data = (0u64..).flat_map(|i| (0u64..100).map(move |j| (i, j)));
    for (i, j) in infinite_data {
        let key1 = bincode.serialize(&(i, j)).expect("can serialize");
        let key2 = bincode.serialize(&(j, i)).expect("can serialize");
        t1.insert(key1, vec![])?;
        t2.insert(key2, vec![])?;
    }
    Ok(())
}
```
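As an aside on the repro above: the `big_endian()` setting matters because sled orders keys lexicographically by their bytes, and a big-endian integer encoding makes that byte order agree with numeric order, while little-endian generally does not. A minimal standard-library sketch (using `to_be_bytes`/`to_le_bytes` rather than bincode) illustrating the difference:

```rust
fn main() {
    let a: u64 = 255; // big-endian bytes end in 0xFF
    let b: u64 = 256; // big-endian bytes end in 0x01, 0x00

    // Big-endian: byte-wise comparison agrees with numeric comparison.
    assert!(a.to_be_bytes() < b.to_be_bytes());

    // Little-endian: byte-wise comparison disagrees here, because
    // 255 encodes as [0xFF, 0, ...] while 256 encodes as [0x00, 0x01, ...].
    assert!(a.to_le_bytes() > b.to_le_bytes());

    println!("big-endian keys preserve numeric ordering");
}
```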
Using sled 0.31 (and also latest master) on Ubuntu 18 and Rust 1.45 (nightly 2020-05-12). C'mon, man! I can try to live with sled using tons of disk space, but this is a deal-breaker in my case (and basically any big-data use case). Hope it's easy to fix, though.
Yo, found a mitigation which might also help solve the problem. I found this comment:

> cache_capacity is currently a bit messed up as it uses the on-disk size of things instead of the larger in-memory representation. So, 1gb is not actually the default memory usage, it's the amount of disk space that items loaded in memory will take, which will result in a lot more space in memory being used, at least for smaller keys and values. So, play around with setting it to a much smaller value.

https://github.com/spacejam/sled/issues/986#issuecomment-592950100
That led me to fiddle with the `cache_capacity` knob, setting it to 100_000 instead of the default 1_000_000_000. What I found (qualitatively):
- Memory still seems to balloon out of control, but at a much slower rate.
- Disk write goes down; db gets slower.
- Still, it seems I was able to write roughly the same amount of data to disk, about 10 GB (ok, I know... it's not that simple), using less overall memory.
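For reference, the mitigation above is just the builder knob from the first repro turned way down. A sketch, assuming sled 0.31's builder API (the path and the exact value are illustrative, not a recommendation):

```rust
fn main() -> sled::Result<()> {
    let db = sled::Config::new()
        .path("my_db") // illustrative path
        // Default is 1_000_000_000. Per the quoted comment, this budget is
        // accounted in on-disk size, so the real in-memory footprint is
        // larger than whatever number you set here.
        .cache_capacity(100_000)
        .open()?;
    // ... workload ...
    db.flush()?;
    Ok(())
}
```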
As you'll see in my code, I set the cache capacity as low as I could, and it did not resolve my problem. So I wonder if we are hitting different issues.
Good question... I was reluctant to open a new issue, though. For context, my problem is loading a big dataset, so it's a write problem, not a read problem. @spacejam has been in contact and told me it's a known issue, it seems.
I'm currently looking into this approach for handling this issue: https://github.com/spacejam/sled/issues/1093
@spacejam That ticket references many inserts, which wasn't my issue; mine is simply traversing a large database. They could be related for all I know, but I wanted to make sure.