A possible solution I could think of currently:
- Always choose to use HashJoin when there is no statistical information indicating that both tables are large.
- Memory tracking while building hashtable for building side.
- When the hash-builder fails to grow its memory
3.1. sort and spill the in-memory hashtable into spill0, free memory.
3.2. buffer and sort the incoming records for the buffer table until it's exhausted, do a sort.
3.3. buffer and sort the records for the streaming side until it's finished, do a sort.
3.4 MergeJoin the two sides.