Multi-core processing allows us to utilize as much as possible of the CPU time, essentially making the disk access the bottleneck. This is implemented in the `concurrent_processing` branch, which should be merge into master. The implementation supports choosing the number of threads, and we can even add a rate-limiter to throttle the throughput as well.