Showing posts with label Bitmap. Show all posts
Showing posts with label Bitmap. Show all posts

Sunday, 13 December 2020

Sorting under constraint algorithms

In the compiler world code inlining is the most important optimization that enables other optimization like dead code elimination, pre fetching , out of order execution etc.

Similarly sorting also enables many optimizations like binary search , streaming aggregation, range search operations, prefix compression, run length encoding , delta encoding, understanding trends, posting list, data partition  etc.



Sorting is memory and compute intensive work and many times we don't have enough compute/memory to do it.

In this post I will share 2 sorting algorithms that can be used when a system has memory or CPU constraint.

Disk based sorting

We have file containing 1 TB data and we want to sort it. Data is huge to it is not possible to use standard in-memory sorting algorithm for this. 

One way to handle sorting of such data is split it in chunks, sort the chunk in memory, persist chunk to disk and finally merge the sorted chunk using k way merge algorithm.


At high level sort pipeline will look something like below 



Nice thing about this algorithm is that it is Embarrassingly_parallel.

This algorithm is also good example of Divide-and-conquer_algorithm and this technique can be applied both the stages.

This algorithm has 2 stages in the pipeline that can be executed in parallel to take advance of multiple cores.

Lets assume that input file contains 10 Million then it can be decomposed in Split stage


In merge stage we have to do reverse operations of taking multiple files and creating single one.




Split & Merge has different types of compute/disk requirement and it is possible to make both the stage parallel or just one based on constraint.

Overall sort pipeline will look like below.  



This algorithm is used by many databases to manage result sets that can't be fit into memory. 

Important logic in this algorithm is K-Way merge of sorted results. If K is 2 then it is straight forward.

2 Way merge

Merge process is pick head from both iterators and add the value that is less, move pointer of iterator whose value was added.

Need to handle some edge conditions to avoid buffer overflow while reading and handling iterators of different size.


v1.next();
v2.next();

while (v1.hasNext() && v2.hasNext()) {
value1 = v1.value();
value2 = v2.value();
if (isLessThan(value1, value2)) {
buffer.add(value1);
v1.next();
} else {
buffer.add(value2);
v2.next();
}
}

if (v1.hasNext()) {
append(buffer, v1);
} else if (v2.hasNext()) {
append(buffer, v2);
}

K Way merge

Assume K is 4, then one way to merge is to split the whole list in pairs of 2, keep merging in pairs and finally start merge out of 2 way merges. This is a good algorithm but can't take advantage of batching multiple iterators.

Recommended way is to use Heap of K values. This is more efficient as we can process multiple inputs in a single pass and can reduce IO overhead also. 

PriorityQueue<LineIterator> heap=...
LineIterator itr;

while ((itr = heap.poll()) != null) {
write(writer, itr.value());
itr.next();
if (itr.hasNext()) {
heap.add(itr);
}
}

BitMap Sorting

Bitmap is a powerful data structure for searching and has some interesting properties for sort also.

Consider a scenario where the file contains n positive integer and each value is less than K.

K can be really huge depending on max value, to put this in context just by using 256 MB memory billions of int values can be sorted.

Idea is based around an allocated array with every element of K word (i.e 32 or 64). If we used 32 bit words then 32 values can be stored in every slot. Total capacity of this data structure is 32 * len(array).

Setting bit needs 2 information, slot in array and position in that slot.




Bit fiddling enables to pack multiple values in a single word, you want to read more on bit fiddling then refer to bit-fiddling.

In this example bytes is 4 and word size is 32.

Checking for value is straightforward and it involves doing bit wise & on Slot value.

Complete working code 

public static final int NO_OF_BYTE = 4;
private final int WORD_SIZE = 8 * NO_OF_BYTE;
private final int SLOT_SHIFT = NO_OF_BYTE + 1;
private final int[] values;
private final int maxValue;

public BitMapSort(int maxValue) {
this.maxValue = maxValue;
this.values = new int[1 + maxValue / WORD_SIZE];
logUsageInfo(maxValue);
}


public void set(int v) {
this.values[slot(v)] |= position(v);
}

public boolean check(int v) {
int value = this.values[slot(v)] & position(v);
return value != 0;
}

public int position(int v) {
return 1 << (v & WORD_SIZE - 1);
}

public int slot(int v) {
return v >> SLOT_SHIFT;
}


Whole pipeline will look something like below


Trade Off


Nothing is perfect and this also has some constraints and it is good to be aware of it.

- This is a dense value data structure, so if we have to store a value that is of 100 Million then we have to allocate at least 100 million bits ( 95 MB). If values are sparse then find alternate data structure. 

- Thread safety has to be handled at slot level because 32 values are packed in a single slot.

- Values should be distinct but if duplicate values are present and it is going to be less duplicates then additional data structures like maps can be used to keep frequency count. This needs to be handled in a little intelligent way like having some threshold on duplicate value and if it crosses that threshold then it is better to stop accepting value to avoid having everything going to map.

- Iteration. Since this is a compressed representation of dense value, iteration on available value has to be handled in a streaming approach to avoid allocation of huge in memory collection. One of the approach could be having API for consuming single value at a time and let client to decide on what to do with those values, example of such iteration could look something like below

public void consume(IntConsumer consumer) {
IntStream
.range(1, maxValue)
.filter(this::check)
.forEach(consumer::accept);
}

- Range iteration. This data structure is very good for range query.

- Compact set. This is also good DS for set related operations.

Conclusion

These are simple and yet very powerful algorithms and if this fits the bill then it can be the difference between solving the problem or not solving at all. 

Thursday, 16 July 2015

BitMap sorting

Data structure is most important building block of any algorithm and it makes algorithm efficient.

Bitmap is very powerful data structure and is good fit for many problems, it is used for indexing in many database.

BitMap can be also used for sorting and for some specific case bitmap sorting can out perform may famous sorting algorithm with very low memory footprint.

Take a case that you have to sort input file that has lots of number between 1 to 10 million by and using low memory footprint and for simplicity sake assume that their are no duplicates.

What are the ways to solve this problem ?

- Classic MergeSort
This is the first solution that comes to mind whenever sorting is discussed, this will produced many intermediate files and then those will be merged.

- Muti Read & Sort
In this approach input file is read multiple times for range of data, for eg first pass will read values starting from 0 to 250K and then 250K to 500K and so on.

This solution will take less memory but too much of I/O will make it very slow .

- Amazing BitSort
Bit Sorting is really good fit for this problem, allocate bit array and turn on the bits at value index and that's it, values will get sorted as it is added to this data structure .

For eg  
{1,2,5,6,7,11} value can be represented as below

Implementation of such type of data structure is simple in C/C++ but for java bitwise manipulation is required, so to keep it simple i have implemented it using boolean array.

Code snippet 


  
It is very simple to implement this powerful data structure and little trick is required for iteration over value.

Code snippet for iteration


Conclusion
Very useful data structure with low memory foot print & very fast sorting, but it has some trade off  and you should be aware of it

- If numbers of items are less then this might not give any benefit.
- If input values are sparse then lot of space is left unused because of fragmentation.

Code is available @ github