Currently, we create a single runtime instance, call into it once via Benchmark_dispatch_benchmark, and all the benchmarks of a pallet are executed in one executor instance, with a single heap.
This makes verifying memory usage and safety pretty hard, since a single poor heap is being used (and constantly growing) for all benchmarks.
First, if you have a benchmark that is designed to just check that a certain operation can succeed within a given number of heap pages, you can't really test this, because the benchmark does not start from a fresh heap.
Furthermore, if a pallet has a very large number of benchmarks, each with a relatively high number of components, you might get unexpected failures where one of the benchmarks runs out of heap, even though that same benchmark succeeds when run in isolation.
Potentially, we can move all the for loops outside of the runtime and make one new runtime call per benchmark component, clearing the heap in between, although this will probably make everything very slow.
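To make the difference concrete, here is a minimal, self-contained sketch of the two strategies. It is only a toy model and not the actual executor or benchmarking API: `Instance`, `run_shared`, and `run_isolated` are hypothetical names, and the "heap" is just a counter that never shrinks during the lifetime of an instance.

```rust
/// Hypothetical stand-in for one executor instance with a bounded heap.
/// This is NOT the real executor type; it only models heap growth.
struct Instance {
    heap_limit: usize,
    heap_used: usize,
}

impl Instance {
    fn new(heap_limit: usize) -> Self {
        Self { heap_limit, heap_used: 0 }
    }

    /// Simulates one benchmark run that allocates `alloc` units.
    /// Nothing is ever freed, so usage only grows within an instance.
    fn run(&mut self, alloc: usize) -> Result<(), String> {
        let new_used = self.heap_used + alloc;
        if new_used > self.heap_limit {
            return Err(format!(
                "out of heap: needed {} but limit is {}",
                new_used, self.heap_limit
            ));
        }
        self.heap_used = new_used;
        Ok(())
    }
}

/// Current behaviour in this model: one instance, loops run "inside" it,
/// so every run inherits the heap usage of the previous ones.
fn run_shared(heap_limit: usize, allocs: &[usize]) -> Result<(), String> {
    let mut instance = Instance::new(heap_limit);
    for &alloc in allocs {
        instance.run(alloc)?;
    }
    Ok(())
}

/// Proposed behaviour in this model: the loop lives on the host side and
/// every run gets a fresh instance, so each one starts from an empty heap.
fn run_isolated(heap_limit: usize, allocs: &[usize]) -> Result<(), String> {
    for &alloc in allocs {
        let mut instance = Instance::new(heap_limit);
        instance.run(alloc)?;
    }
    Ok(())
}
```

In this model, `run_shared(100, &[60, 60])` fails on the second run even though each run fits within the limit on its own, while `run_isolated(100, &[60, 60])` succeeds. A run that genuinely exceeds the limit, such as `run_isolated(100, &[120])`, still fails, which is the property a heap-page check would want to rely on.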
Alternatively, we could leave the current benchmarks as they are now, and build a secondary type of benchmarks that does what I just explained.