Showing posts with label In-Memory. Show all posts
Showing posts with label In-Memory. Show all posts

Thursday, 27 August 2015

Indexes in Embedded database

Embedded database are very important part of application that has micro/mill second latency requirement,.
Most of the embedded database has very good read performance and has specialized types indexes to support fast read.

Reads is all about having index on field and since we are in the era where "ONE SIZE FITS ALL" theory does not work, so different types of indexes are required for different types of read.

Lets gets cracking on few types of index that is required for fast read.

Hash Index
This is very basic type of index that can be implemented using HashMap, so to build such type index just choose the field and use that as key of HashMap and you are done, very good when lookup is going to based on Key.

Unique Index
It is just another variant of hash index but Set is used to maintain distinct values and this can be used to answer queries that needs distinct values or want to check if value exists or not.

Range Index
Now it starts to become little interesting because criteria is based on range of values for e.g
 - greaterthan XX and lessthan YY
 - greaterthan XX
- lessthan XX

Tree based index comes to rescue , all such type of queries can be answered by B-Tree data structure.

- Starts With or Contains
 This is similar to LIKE type of query in database for e.g name like 'A%' or name like '%A%'
Trie data structure can be used to answer such query.

I have tried to build simple library to support above types of query but i have left concurrency aspect to keep solution simple.

Code is available @ github

Sample application has 3 types of index

Hash - For exact value based lookup on some attribute, code available @ HashDataIndex

Tree - This is useful for range queries for eg query on date field, code available @ TreeDataIndex

Contains - This is useful for doing LIKE based search, suffix Tree data structure is used for this, code available @ ContainsDataIndex

Conclusion
- Memory usage is very important factor for such type of API and java collections are not best at it , so for production type of usage memory efficient collection should be used.

- Storing raw data as java object will not give good performance for big JVM, so having compact/memory efficient representation is required as java heap grows.

-Last one that RAM has limit so it is impossible to keep raw data or index data in memory and that makes it interesting problem to solve, so some kind of partition strategy is required for data & index and only required partition of data is kept online in memory, there are many open source library that allows to store data using memory mapped files.

Saturday, 20 July 2013

ArrayList Using Memory Mapped File

Introduction
In-Memory computing is picking up due to affordable hardware, most of the data is kept in RAM to meet latency and throughput goal, but keeping data in RAM create Garbage Collector overhead especially if you don't pre allocate.
So effectively we need garbage less/free approach to avoid GC hiccups

Garbage free/less data structure
There are couple of option to achieve it
 - Object Pool 
Object pool pattern is very good solution, i wrote about that in Lock Less Object Pool blog

- Off Heap Objects
JVM has very good support for creating off-heap objects. You can get rid of GC pause if you take this highway and highway has its own risk!

-MemoryMapped File
This is mix of Heap & Off Heap, like best of world.

Memory mapped file will allow to map part of the data in memory and that memory will be managed by OS, so it will create very less memory overhead in JVM process that is mapping file.
This can help in managing data in garbage free way and you can have JVM managing large data.
Memory Mapped file can be used to develop IPC, i wrote about that in power-of-java-memorymapped-file blog

In this blog i will create ArrayList that is backed up by MemoryMapped File, this array list can store millions of object and with almost no GC overhead. It sounds crazy but it is possible.

Lets gets in action
In this test i use Instrument object that has below attribute
 - int id
 - double price

So each object is of 12 byte.
This new Array List holds 10 Million Object and i will try to measure writer/read performance

Writer Performance



X Axis - No Of Reading
Y Axis - Time taken to add 10 Million in Ms









Adding 10 Million element is taking around 70 Ms, it is pretty fast.

Writer Throughput
Lets look at another aspect of performance which is throughput


X Axis - No Of Reading
Y Axis - Throughput /Second , in Millions









Writer throughput is very impressive, i ranges between 138 Million to 142 Million

Reader Performance

X Axis - No Of Reading
Y Axis - Time taken to read 10 Million in Ms









It is taking around 44 Ms to read 10 Million entry, very fast. With such type of performance you definitely challenge database.

 Reader Throughput

X Axis - No Of Reading
Y Axis - Throughput /Second , in Millions









Wow Throughput is great it is 220+ million per second

It looks very promising with 138 Million/Sec writer throughput & 220 Million/Sec reader throughput.

Comparison With Array List
Lets compare performance of BigArrayList with ArrayList,

Writer Throughput - BigArrayList Vs ArrayList




 Throughput of BigArrayList is almost constant at around 138 Million/Sec, ArrayList starts with 50 Million and drops under 5 million.

ArrayList has lot of hiccups and it is due to 
 - Array Allocation
 - Array Copy
 - Garbage Collection overhead

BigArrayList is winner in this case, it is 7X times faster than arraylist.

Reader Throughput - BigArrayList Vs ArrayList

ArrayList performs better than BigArrayList, it is around 1X time faster.

BigArrayList is slower in this case because
 - It has to keep mapping file in memory as more data is requested
 - There is cost of un-marshaling

Reader Throughput for BigArrayList is 220+ Million/Sec, it is still very fast and only few application want to process message faster than that.
So for most of the use-case this should work.

Reader performance can be improved by using below techniques 
 - Read message in batch from mapped stream
 - Pre-fetch message by using Index, like what CPU does

By doing above changes we can improve performance by few million, but i think for most of the case current performance is pretty good

Conclusion
Memory mapped file is interesting area to do research, it can solve many performance problem.
Java is now being used for developing trading application and GC is one question that you have to answer from day one, you need to find a way to keep GC happy and MemoryMapped is one thing that GC will love it.

Code used for this blog is available @ GitHub , i ran test with 2gb memory.
Code does't handle some edge case , but good enough to prove the point that that MemoryMapped file can be winner in many case.

Monday, 18 February 2013

Experiment with ConcurrentHashmap

I am investigating memory issue in one of my recent project where data is kept in memory for fast access, but memory footprint of application is very high.
This application was heavily using CHM(i.e Concurrenthashmap), so no brainier guess was required that CHM was the issue. I did memory profiling session to find how much really CHM was taking.
I was surprised with result that CHM was taking around 50% of the memory. So it was confirmed that CHM was the issue, it is fast but not memory efficient.

Why CHM is so fat ?

  • Key/Values are wrapped by Map.Entry object , this create extra object per object. 
  • Each segment is Re-entrant lock, so if you have lot of  small CHM and concurrency level is default then there will be lot of lock object and that will come in top list. 
  • and many more states for house keeping activities.


All of above objects makes very good contribution to memory.

How can we reduce memory footprint
If is difficult to reduce memory footprint of CHM and possible reason that i can think of are

  • It has to support old interface of Map 
  • hash code collision technique used by java map are closed, closed hashing is based on creating Linklist on collision, closed hashing is very fast for resolution, but it is not CPU cache friendly, especially if nodes turns in big link list. There is interesting article about LinkList problem 


So we need alternate CHM implementation which is memory efficient.

Version 2 of CHM

I started to create version of CHM which has low memory foot print, target is come a close as array. I also used alternate hash code collision techniques to check the performance, their many options for Open_addressing.
I tried below options.

  • Linear probing - performance was not that great, although this is most cpu cache friendly. Need to spend some more time to get it right.
  • Double_hashing - Performance was in acceptable range.

Lets measure CHM V2 

  • Memory footprint
















There is big gain in terms of memory, CHM takes around 45%+ more than raw data , new implementation LCHM is very much close to Array type.

  • Single thread PUT performance

CHM out performs in PUT test, new implementation is around 50 to 80 Millisecond slower for 1 Million items. 50 to 80 ms is not noticeable delay and i think it is good for application where latency requirement is  in seconds and in case if latency requirement is in millisecond/nanosecond then any way CHM will not be good choice.
Reason for slow performance of LCHM is hash collision technique, double hashing is used for resolving has code collision. 

  • Concurrent Add performance 
















New implementation perform slightly better when multiple threads are used to write to map.
  • Get Performance

 Performance of get is slightly slower as compared to CHM.

Conclusion

New implementation out performs in memory test and it is bit slow in in get/put test, there are couple of things that can be done to improve performance of get/put and all the performance difference that we see is due to probing technique used.
  •  Probing technique can be improved, linear probing technique can used to to get cache friendly access. 
  • It is very easy to make probing technique parallel.