{"id":2024,"date":"2019-12-14T01:00:00","date_gmt":"2019-12-14T01:00:00","guid":{"rendered":"https:\/\/www.javaadvent.com\/?p=2024"},"modified":"2019-12-15T15:31:18","modified_gmt":"2019-12-15T15:31:18","slug":"popular-frameworks-for-big-data-processing-in-java","status":"publish","type":"post","link":"https:\/\/www.javaadvent.com\/2019\/12\/popular-frameworks-for-big-data-processing-in-java.html","title":{"rendered":"Popular frameworks for big data processing in Java"},"content":{"rendered":"\n<h1 class=\"wp-block-heading\">The big data challenge<\/h1>\n\n\n\n<p>The concept of big data is understood differently in the\nvariety of domains where companies face the need to deal with increasing\nvolumes of data. In most of these scenarios the system under consideration needs\nto be designed in such a way so that it is capable of processing that data without\nsacrificing throughput as data grows in size. This essentially leads to the necessity\nof building systems that are highly scalable so that more resources can be\nallocated based on the volume of data that needs to be processed at a given\npoint in time. <\/p>\n\n\n\n<p>Building such a system is a time-consuming and complex activity, and for that reason third-party frameworks and libraries can be used to provide the scalability requirements out of the box. 
There are already a number of good choices that can be used in Java applications, and in this article we will briefly discuss some of the most popular ones:<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img data-recalc-dims=\"1\" loading=\"lazy\" decoding=\"async\" width=\"547\" height=\"301\" data-attachment-id=\"2025\" data-permalink=\"https:\/\/www.javaadvent.com\/2019\/12\/popular-frameworks-for-big-data-processing-in-java.html\/big_data_processing_frameworks\" data-orig-file=\"https:\/\/i0.wp.com\/www.javaadvent.com\/content\/uploads\/2019\/12\/big_data_processing_frameworks.jpg?fit=547%2C301&amp;ssl=1\" data-orig-size=\"547,301\" data-comments-opened=\"1\" data-image-meta=\"{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}\" data-image-title=\"big_data_processing_frameworks\" data-image-description=\"\" data-image-caption=\"\" data-large-file=\"https:\/\/i0.wp.com\/www.javaadvent.com\/content\/uploads\/2019\/12\/big_data_processing_frameworks.jpg?fit=547%2C301&amp;ssl=1\" src=\"https:\/\/i0.wp.com\/www.javaadvent.com\/content\/uploads\/2019\/12\/big_data_processing_frameworks.jpg?resize=547%2C301&#038;ssl=1\" alt=\"\" class=\"wp-image-2025\" srcset=\"https:\/\/i0.wp.com\/www.javaadvent.com\/content\/uploads\/2019\/12\/big_data_processing_frameworks.jpg?w=547&amp;ssl=1 547w, https:\/\/i0.wp.com\/www.javaadvent.com\/content\/uploads\/2019\/12\/big_data_processing_frameworks.jpg?resize=300%2C165&amp;ssl=1 300w, https:\/\/i0.wp.com\/www.javaadvent.com\/content\/uploads\/2019\/12\/big_data_processing_frameworks.jpg?resize=508%2C280&amp;ssl=1 508w\" sizes=\"auto, (max-width: 547px) 100vw, 547px\" \/><\/figure>\n\n\n\n<h1 
class=\"wp-block-heading\">The frameworks in action<\/h1>\n\n\n\n<p>We are going to demonstrate each of the frameworks by implementing\na simple pipeline for processing data from devices that measure the air\nquality index for a given area. For simplicity we will assume that numeric data\nfrom the devices is received either in batches or in a streaming fashion. Throughout\nthe examples we are going to use the THRESHOLD constant to denote the value\nabove which we consider an area to be polluted.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Apache Spark<\/h2>\n\n\n\n<p>In Spark we need to first convert the data into a suitable\nformat. We are going to use Datasets but we can also choose DataFrames or RDDs (Resilient\nDistributed Datasets) as an alternative for the data representation. We can\nthen apply a number of Spark transformations and actions in order to process\nthe data in a distributed fashion. <\/p>\n\n\n\n<div class=\"wp-block-group\"><div class=\"wp-block-group__inner-container is-layout-flow wp-block-group-is-layout-flow\">\n<pre class=\"wp-block-preformatted\">public long countPollutedRegions(String[] numbers) {\n\t\t\/\/ runs Spark locally with 4 worker threads\n\t\tSparkSession session = SparkSession.builder().\n\t\t\t\tappName(\"AirQuality\").\n\t\t\t\tmaster(\"local[4]\").\n\t\t\t\tgetOrCreate();\n\t\t\/\/ converts the array of numbers to a Spark dataset\n\t\tDataset numbersSet = session.createDataset(Arrays.asList(numbers), \n\t\t\t\tEncoders.STRING());\n\t\t\n\t\t\/\/ runs the data pipeline on the local Spark instance\n\t\tlong pollutedRegions = numbersSet.map(number -&gt; Integer.valueOf(number), \n\t\t\t\tEncoders.INT())\n\t\t\t\t.filter(number -&gt; number &gt; THRESHOLD).count();\n\t\t\n\t\t\n\t\treturn pollutedRegions;\n\t}<\/pre>\n<\/div><\/div>\n\n\n\n<p>If we want to change the above application to read data from\nan external source, write to an external data source and run it on a Spark\ncluster rather than a local Spark instance we would have 
the following execution\nflow:<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img data-recalc-dims=\"1\" loading=\"lazy\" decoding=\"async\" width=\"600\" height=\"330\" data-attachment-id=\"2029\" data-permalink=\"https:\/\/www.javaadvent.com\/2019\/12\/popular-frameworks-for-big-data-processing-in-java.html\/spark\" data-orig-file=\"https:\/\/i0.wp.com\/www.javaadvent.com\/content\/uploads\/2019\/12\/spark.png?fit=864%2C475&amp;ssl=1\" data-orig-size=\"864,475\" data-comments-opened=\"1\" data-image-meta=\"{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}\" data-image-title=\"spark\" data-image-description=\"\" data-image-caption=\"\" data-large-file=\"https:\/\/i0.wp.com\/www.javaadvent.com\/content\/uploads\/2019\/12\/spark.png?fit=600%2C330&amp;ssl=1\" src=\"https:\/\/i0.wp.com\/www.javaadvent.com\/content\/uploads\/2019\/12\/spark.png?resize=600%2C330&#038;ssl=1\" alt=\"\" class=\"wp-image-2029\" srcset=\"https:\/\/i0.wp.com\/www.javaadvent.com\/content\/uploads\/2019\/12\/spark.png?w=864&amp;ssl=1 864w, https:\/\/i0.wp.com\/www.javaadvent.com\/content\/uploads\/2019\/12\/spark.png?resize=300%2C165&amp;ssl=1 300w, https:\/\/i0.wp.com\/www.javaadvent.com\/content\/uploads\/2019\/12\/spark.png?resize=768%2C422&amp;ssl=1 768w, https:\/\/i0.wp.com\/www.javaadvent.com\/content\/uploads\/2019\/12\/spark.png?resize=508%2C279&amp;ssl=1 508w\" sizes=\"auto, (max-width: 600px) 100vw, 600px\" \/><\/figure>\n\n\n\n<p>The Spark driver might be either a separate instance or part\nof the Spark cluster.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Apache Flink<\/h2>\n\n\n\n<p>Similarly to Spark we need to represent the data in a 
Flink\nDataStream and then apply the necessary transformations and actions over it:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">public long countPollutedRegions(String[] numbers) throws Exception {\n\t\t\/\/ creates a Flink execution environment with proper configuration\n\t\tStreamExecutionEnvironment env = StreamExecutionEnvironment.\n\t\t\t\tcreateLocalEnvironment();\n\n\t\t\/\/ converts the array of numbers to a Flink data stream and creates\n\t\t\/\/ the data pipeline\n\t\tDataStream stream = env.fromCollection(Arrays.asList(numbers)).\n\t\t\t\tmap(number -&gt; Integer.valueOf(number))\n\t\t\t\t.filter(number -&gt; number &gt; THRESHOLD).returns(Integer.class);\n\t\tlong pollutedRegions = 0;\n\t\tIterator numbersIterator = DataStreamUtils.collect(stream);\n\t\twhile(numbersIterator.hasNext()) {\n\t\t\tpollutedRegions++;\n\t\t\tnumbersIterator.next();\n\t\t}\n\t\treturn pollutedRegions;\n\t}\n<\/pre>\n\n\n\n<p>If we want to change the above application to read data from\nan external source, write to an external data source and run it on a Flink\ncluster we would have the following execution flow:<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img data-recalc-dims=\"1\" loading=\"lazy\" decoding=\"async\" width=\"600\" height=\"294\" data-attachment-id=\"2034\" data-permalink=\"https:\/\/www.javaadvent.com\/2019\/12\/popular-frameworks-for-big-data-processing-in-java.html\/flink\" data-orig-file=\"https:\/\/i0.wp.com\/www.javaadvent.com\/content\/uploads\/2019\/12\/flink.png?fit=739%2C362&amp;ssl=1\" data-orig-size=\"739,362\" data-comments-opened=\"1\" data-image-meta=\"{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}\" 
data-image-title=\"flink\" data-image-description=\"\" data-image-caption=\"\" data-large-file=\"https:\/\/i0.wp.com\/www.javaadvent.com\/content\/uploads\/2019\/12\/flink.png?fit=600%2C294&amp;ssl=1\" src=\"https:\/\/i0.wp.com\/www.javaadvent.com\/content\/uploads\/2019\/12\/flink.png?resize=600%2C294&#038;ssl=1\" alt=\"\" class=\"wp-image-2034\" srcset=\"https:\/\/i0.wp.com\/www.javaadvent.com\/content\/uploads\/2019\/12\/flink.png?w=739&amp;ssl=1 739w, https:\/\/i0.wp.com\/www.javaadvent.com\/content\/uploads\/2019\/12\/flink.png?resize=300%2C147&amp;ssl=1 300w, https:\/\/i0.wp.com\/www.javaadvent.com\/content\/uploads\/2019\/12\/flink.png?resize=508%2C249&amp;ssl=1 508w\" sizes=\"auto, (max-width: 600px) 100vw, 600px\" \/><\/figure>\n\n\n\n<p>The Flink client where the application is submitted to the\nFlink cluster is either the Flink CLI utility or JobManager\u2019s UI.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Apache Storm<\/h2>\n\n\n\n<p>In Storm the data pipeline is created as a topology of\nSpouts (the sources of data) and Bolts (the data processing units). 
Since Storm\ntypically processes unbounded streams of data we will emulate the processing of\nan array of air quality index numbers as a bounded stream:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">\tpublic void countPollutedRegions(String[] numbers) throws Exception {\n\n\t\t\/\/ builds the topology as a combination of spouts and bolts\n\t\tTopologyBuilder builder = new TopologyBuilder();\n\t\tbuilder.setSpout(\"numbers-spout\", new StormAirQualitySpout(numbers));\n\t\tbuilder.setBolt(\"number-bolt\", new StormAirQualityBolt()).\n\t\t\tshuffleGrouping(\"numbers-spout\");\n\t\t\n\t\t\/\/ prepares Storm conf and along with the topology submits it for \n\t\t\/\/ execution to a local Storm cluster\n\t\tConfig conf = new Config();\n\t\tconf.setDebug(true);\n\t\tLocalCluster localCluster = null;\n\t\ttry {\n\t\t\tlocalCluster = new LocalCluster();\n\t\t\tlocalCluster.submitTopology(\"airquality-topology\", \n\t\t\t\t\tconf, builder.createTopology());\n\t\t\tThread.sleep(10000);\n\t\t\tlocalCluster.shutdown();\n\t\t} catch (InterruptedException ex) {\n\t\t\tlocalCluster.shutdown();\n\t\t}\n\t}\n<\/pre>\n\n\n\n<p>We have one spout that provides a data source for the array\nof air quality index numbers and one bolt that filters only the ones that indicate\npolluted areas: <\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">public class StormAirQualitySpout extends BaseRichSpout {\n\n\tprivate boolean emitted = false;\n\n\tprivate SpoutOutputCollector collector;\n\n\tprivate String[] numbers;\n\n\tpublic StormAirQualitySpout(String[] numbers) {\n\t\tthis.numbers = numbers;\n\t}\n\t\n\t@Override\n\tpublic void declareOutputFields(OutputFieldsDeclarer declarer) {\n\t\tdeclarer.declare(new Fields(\"number\"));\n\t}\n\n\t@Override\n\tpublic void open(Map params, \n\t\t\tTopologyContext context, \n\t\t\tSpoutOutputCollector collector) {\n\t\tthis.collector = collector;\n\t}\n\n\t@Override\n\tpublic void nextTuple() {\n\t\t\/\/ we make sure that the numbers array is processed 
just once by \n\t\t\/\/ the spout\n\t\tif(!emitted) {\n\t\t\tfor(String number : numbers) {\n\t\t\t\tcollector.emit(new Values(number));\n\t\t\t}\n\t\t\temitted = true;\n\t\t}\n\t}\n}\n<\/pre>\n\n\n\n<pre class=\"wp-block-preformatted\">public class StormAirQualityBolt extends BaseRichBolt {\n\n\tprivate static final int THRESHOLD = 10;\n\n\tprivate int pollutedRegions = 0;\n\n\t@Override\n\tpublic void declareOutputFields(OutputFieldsDeclarer declarer) {\n\t\tdeclarer.declare(new Fields(\"number\"));\n\t}\n\n\t@Override\n\tpublic void prepare(Map params, \n\t\t\tTopologyContext context, \n\t\t\tOutputCollector collector) {\n\t}\n\n\t@Override\n\tpublic void execute(Tuple tuple) {\n\t\tString number = tuple.getStringByField(\"number\");\n\t\tInteger numberInt = Integer.valueOf(number);\n\t\tif (numberInt &gt; THRESHOLD) {\n\t\t\tpollutedRegions++;\n\t\t}\n\t}\n}\n<\/pre>\n\n\n\n<p>We are using a LocalCluster instance for submitting to a\nlocal Storm cluster, which is convenient for development purposes, but in practice we would want to\nsubmit the Storm topology to a production cluster. 
In that case we would have\nthe following execution flow:<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img data-recalc-dims=\"1\" loading=\"lazy\" decoding=\"async\" width=\"600\" height=\"254\" data-attachment-id=\"2035\" data-permalink=\"https:\/\/www.javaadvent.com\/2019\/12\/popular-frameworks-for-big-data-processing-in-java.html\/storm\" data-orig-file=\"https:\/\/i0.wp.com\/www.javaadvent.com\/content\/uploads\/2019\/12\/storm.png?fit=771%2C327&amp;ssl=1\" data-orig-size=\"771,327\" data-comments-opened=\"1\" data-image-meta=\"{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}\" data-image-title=\"storm\" data-image-description=\"\" data-image-caption=\"\" data-large-file=\"https:\/\/i0.wp.com\/www.javaadvent.com\/content\/uploads\/2019\/12\/storm.png?fit=600%2C254&amp;ssl=1\" src=\"https:\/\/i0.wp.com\/www.javaadvent.com\/content\/uploads\/2019\/12\/storm.png?resize=600%2C254&#038;ssl=1\" alt=\"\" class=\"wp-image-2035\" srcset=\"https:\/\/i0.wp.com\/www.javaadvent.com\/content\/uploads\/2019\/12\/storm.png?w=771&amp;ssl=1 771w, https:\/\/i0.wp.com\/www.javaadvent.com\/content\/uploads\/2019\/12\/storm.png?resize=300%2C127&amp;ssl=1 300w, https:\/\/i0.wp.com\/www.javaadvent.com\/content\/uploads\/2019\/12\/storm.png?resize=768%2C326&amp;ssl=1 768w, https:\/\/i0.wp.com\/www.javaadvent.com\/content\/uploads\/2019\/12\/storm.png?resize=508%2C215&amp;ssl=1 508w\" sizes=\"auto, (max-width: 600px) 100vw, 600px\" \/><\/figure>\n\n\n\n<h2 class=\"wp-block-heading\">Apache Ignite<\/h2>\n\n\n\n<p>In Ignite we need first to put the data in the distributed\ncache before running the data processing pipeline which is the 
SQL\nquery executed in a distributed fashion over the Ignite cluster:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">\tpublic long countPollutedRegions(String[] numbers) {\n\n\t\tIgniteConfiguration igniteConfig = new IgniteConfiguration();\n\t\tCacheConfiguration cacheConfig = \n\t\t\t\tnew CacheConfiguration();\n\t\t\/\/ cache key is number index in the array and value is the number\n\t\tcacheConfig.setIndexedTypes(Integer.class, String.class);\n\n\t\tcacheConfig.setName(NUMBERS_CACHE);\n\t\tigniteConfig.setCacheConfiguration(cacheConfig);\n\t\t\n\t\ttry (Ignite ignite = Ignition.start(igniteConfig)) {\n\t\t\tIgniteCache cache = ignite.getOrCreateCache(NUMBERS_CACHE);\n\t\t\t\/\/ adds the numbers to the Ignite cache\n\t\t\ttry (IgniteDataStreamer streamer = \n\t\t\t\t\tignite.dataStreamer(cache.getName())) {\n\t\t\t\tint key = 0;\n\t\t\t\tfor (String number : numbers) {\n\t\t\t\t\tstreamer.addData(key++, number);\n\t\t\t\t}\n\t\t\t}\n\n\t\t\t\/\/ performs an SQL query over the cached numbers\n\t\t\tSqlFieldsQuery query = new SqlFieldsQuery(\"select * from String where _val &gt; \" + THRESHOLD);\n\t\t\t\n\t\t\tFieldsQueryCursor&lt;List&gt; cursor = cache.query(query);\n\n\t\t\tint pollutedRegions = cursor.getAll().size();\n\t\t\treturn pollutedRegions;\n\t\t}\n\t}\n<\/pre>\n\n\n\n<p>If we want to run the application in an Ignite cluster it\nwill have the following execution flow:<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img data-recalc-dims=\"1\" loading=\"lazy\" decoding=\"async\" width=\"600\" height=\"411\" data-attachment-id=\"2036\" data-permalink=\"https:\/\/www.javaadvent.com\/2019\/12\/popular-frameworks-for-big-data-processing-in-java.html\/ignite\" data-orig-file=\"https:\/\/i0.wp.com\/www.javaadvent.com\/content\/uploads\/2019\/12\/ignite.png?fit=883%2C605&amp;ssl=1\" data-orig-size=\"883,605\" data-comments-opened=\"1\" 
data-image-meta=\"{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}\" data-image-title=\"ignite\" data-image-description=\"\" data-image-caption=\"\" data-large-file=\"https:\/\/i0.wp.com\/www.javaadvent.com\/content\/uploads\/2019\/12\/ignite.png?fit=600%2C411&amp;ssl=1\" src=\"https:\/\/i0.wp.com\/www.javaadvent.com\/content\/uploads\/2019\/12\/ignite.png?resize=600%2C411&#038;ssl=1\" alt=\"\" class=\"wp-image-2036\" srcset=\"https:\/\/i0.wp.com\/www.javaadvent.com\/content\/uploads\/2019\/12\/ignite.png?w=883&amp;ssl=1 883w, https:\/\/i0.wp.com\/www.javaadvent.com\/content\/uploads\/2019\/12\/ignite.png?resize=300%2C206&amp;ssl=1 300w, https:\/\/i0.wp.com\/www.javaadvent.com\/content\/uploads\/2019\/12\/ignite.png?resize=768%2C526&amp;ssl=1 768w, https:\/\/i0.wp.com\/www.javaadvent.com\/content\/uploads\/2019\/12\/ignite.png?resize=508%2C348&amp;ssl=1 508w\" sizes=\"auto, (max-width: 600px) 100vw, 600px\" \/><\/figure>\n\n\n\n<h2 class=\"wp-block-heading\">Hazelcast Jet<\/h2>\n\n\n\n<p>Hazelcast Jet works on top of Hazelcast IMDG and similarly\nto Ignite if we want to process data we need first to put it in the Hazelcast IMDG\ncluster:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">public long countPollutedRegions(String[] numbers) {\n\n\t\t\/\/ prepares the Jet data processing pipeline\n\t\tPipeline p = Pipeline.create();\n\t\tp.drawFrom(Sources.list(\"numbers\")).\n\t\t\tmap(number -&gt; Integer.valueOf((String) number))\n\t\t\t.filter(number -&gt; number &gt; THRESHOLD).drainTo(Sinks.list(\"filteredNumbers\"));\n\n\t\tJetInstance jet = Jet.newJetInstance();\n\t\tIList numbersList = 
jet.getList(\"numbers\");\n\t\tnumbersList.addAll(Arrays.asList(numbers));\n\n\t\ttry {\n\t\t\t\/\/ submits the pipeline in the Jet cluster\n\t\t\tjet.newJob(p).join();\n\n\t\t\t\/\/ gets the filtered data from Hazelcast IMDG\n\t\t\tList filteredRecordsList = jet.getList(\"filteredNumbers\");\n\t\t\tint pollutedRegions = filteredRecordsList.size();\n\n\t\t\treturn pollutedRegions;\n\t\t} finally {\n\t\t\tJet.shutdownAll();\n\t\t}\n\t}\n<\/pre>\n\n\n\n<p>Note, however, that Jet also provides integration with external data sources, so data does not need to be stored in the IMDG cluster. You can also do the aggregation without first storing the data in a list (the full example on GitHub contains the improved version). Thanks to Jaromir and Can from the Hazelcast engineering team for the valuable input. <\/p>\n\n\n\n<p>If we want to run the application in a Hazelcast Jet cluster it will have the following execution flow:<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img data-recalc-dims=\"1\" loading=\"lazy\" decoding=\"async\" width=\"600\" height=\"423\" data-attachment-id=\"2037\" data-permalink=\"https:\/\/www.javaadvent.com\/2019\/12\/popular-frameworks-for-big-data-processing-in-java.html\/hazelcast\" data-orig-file=\"https:\/\/i0.wp.com\/www.javaadvent.com\/content\/uploads\/2019\/12\/hazelcast.png?fit=859%2C605&amp;ssl=1\" data-orig-size=\"859,605\" data-comments-opened=\"1\" data-image-meta=\"{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}\" data-image-title=\"hazelcast\" data-image-description=\"\" data-image-caption=\"\" 
data-large-file=\"https:\/\/i0.wp.com\/www.javaadvent.com\/content\/uploads\/2019\/12\/hazelcast.png?fit=600%2C423&amp;ssl=1\" src=\"https:\/\/i0.wp.com\/www.javaadvent.com\/content\/uploads\/2019\/12\/hazelcast.png?resize=600%2C423&#038;ssl=1\" alt=\"\" class=\"wp-image-2037\" srcset=\"https:\/\/i0.wp.com\/www.javaadvent.com\/content\/uploads\/2019\/12\/hazelcast.png?w=859&amp;ssl=1 859w, https:\/\/i0.wp.com\/www.javaadvent.com\/content\/uploads\/2019\/12\/hazelcast.png?resize=300%2C211&amp;ssl=1 300w, https:\/\/i0.wp.com\/www.javaadvent.com\/content\/uploads\/2019\/12\/hazelcast.png?resize=768%2C541&amp;ssl=1 768w, https:\/\/i0.wp.com\/www.javaadvent.com\/content\/uploads\/2019\/12\/hazelcast.png?resize=508%2C358&amp;ssl=1 508w\" sizes=\"auto, (max-width: 600px) 100vw, 600px\" \/><\/figure>\n\n\n\n<h2 class=\"wp-block-heading\">Kafka Streams<\/h2>\n\n\n\n<p>Kafka Streams is a client library that uses Kafka topics as sources and sinks for the data processing pipeline. To make use of the Kafka Streams library for our scenario we would be putting the air quality index numbers in a <strong>numbers<\/strong> Kafka topic:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">public long countPollutedRegions() {\n\n\t\tList result = new LinkedList();\n\t\/\/ key\/value pairs contain string items\n \t\tfinal Serde stringSerde = Serdes.String();\n\n\t\t\/\/ prepares and runs the data processing pipeline\n\t\tfinal StreamsBuilder builder = new StreamsBuilder();\t\t\n\t\tbuilder.stream(\"numbers\", Consumed.with(stringSerde, stringSerde))\n\t\t\t\t.map((key, value) -&gt; new KeyValue(key, Integer.valueOf(value))).\n\t\t\t\t\tfilter((key, value) -&gt; value &gt; THRESHOLD)\n\t\t\t\t.foreach((key, value) -&gt; {\n\t\t\t\t\tresult.add(value.toString());\n\t\t\t\t});\n\t\n\t\tfinal Topology topology = builder.build();\n\t\tfinal KafkaStreams streams = new KafkaStreams(topology, \n\t\t\t\tcreateKafkaStreamsConfiguration());\n\t\tstreams.start();\n\n\t\ttry 
{\n\t\t\tThread.sleep(10000);\n\t\t} catch (InterruptedException e) {\n\t\t\te.printStackTrace();\n\t\t}\n\t\tint pollutedRegions = result.size();\n\t\tSystem.out.println(\"Number of severely polluted regions: \" + pollutedRegions);\n\t\tstreams.close();\n\t\treturn pollutedRegions;\n\t}\n\n\tprivate Properties createKafkaStreamsConfiguration() {\n\t\tProperties props = new Properties();\n\t\tprops.put(StreamsConfig.APPLICATION_ID_CONFIG, \"text-search-config\");\n\t\tprops.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, \"localhost:9092\");\n\t\tprops.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());\n\t\tprops.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());\n\n\t\treturn props;\n\t}\n<\/pre>\n\n\n\n<p>We will have the following execution flow for our Kafka Streams\napplication instances:<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img data-recalc-dims=\"1\" loading=\"lazy\" decoding=\"async\" width=\"600\" height=\"376\" data-attachment-id=\"2038\" data-permalink=\"https:\/\/www.javaadvent.com\/2019\/12\/popular-frameworks-for-big-data-processing-in-java.html\/kafka_streams\" data-orig-file=\"https:\/\/i0.wp.com\/www.javaadvent.com\/content\/uploads\/2019\/12\/kafka_streams.png?fit=634%2C397&amp;ssl=1\" data-orig-size=\"634,397\" data-comments-opened=\"1\" data-image-meta=\"{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}\" data-image-title=\"kafka_streams\" data-image-description=\"\" data-image-caption=\"\" data-large-file=\"https:\/\/i0.wp.com\/www.javaadvent.com\/content\/uploads\/2019\/12\/kafka_streams.png?fit=600%2C376&amp;ssl=1\" 
src=\"https:\/\/i0.wp.com\/www.javaadvent.com\/content\/uploads\/2019\/12\/kafka_streams.png?resize=600%2C376&#038;ssl=1\" alt=\"\" class=\"wp-image-2038\" srcset=\"https:\/\/i0.wp.com\/www.javaadvent.com\/content\/uploads\/2019\/12\/kafka_streams.png?w=634&amp;ssl=1 634w, https:\/\/i0.wp.com\/www.javaadvent.com\/content\/uploads\/2019\/12\/kafka_streams.png?resize=300%2C188&amp;ssl=1 300w, https:\/\/i0.wp.com\/www.javaadvent.com\/content\/uploads\/2019\/12\/kafka_streams.png?resize=508%2C318&amp;ssl=1 508w\" sizes=\"auto, (max-width: 600px) 100vw, 600px\" \/><\/figure>\n\n\n\n<h2 class=\"wp-block-heading\">Pulsar Functions<\/h2>\n\n\n\n<p>Apache Pulsar Functions are lightweight compute processes that work in a serverless fashion along with an Apache Pulsar cluster. Assuming we are streaming our air quality index values into a Pulsar cluster, we can write a function to count the number of values that exceed the given threshold and write the result back to Pulsar as follows:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">public class PulsarFunctionsAirQualityApplication \n\timplements Function&lt;String, Void&gt; {\n\n\tprivate static final int HIGH_THRESHOLD = 10;\n\n    @Override\n    public Void process(String input, Context context) throws Exception {\n    \t\n    \tint number = Integer.valueOf(input);\n    \t\n    \tif(number &gt; HIGH_THRESHOLD) {\n            context.incrCounter(\"pollutedRegions\", 1);\n    \t}\n        return null;\n    }\n}<\/pre>\n\n\n\n<p>The execution flow of the function along with a Pulsar\ncluster is the following:<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img data-recalc-dims=\"1\" loading=\"lazy\" decoding=\"async\" width=\"557\" height=\"397\" data-attachment-id=\"2039\" data-permalink=\"https:\/\/www.javaadvent.com\/2019\/12\/popular-frameworks-for-big-data-processing-in-java.html\/pulsar_functions\" data-orig-file=\"https:\/\/i0.wp.com\/www.javaadvent.com\/content\/uploads\/2019\/12\/pulsar_functions.png?fit=557%2C397&amp;ssl=1\" 
data-orig-size=\"557,397\" data-comments-opened=\"1\" data-image-meta=\"{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}\" data-image-title=\"pulsar_functions\" data-image-description=\"\" data-image-caption=\"\" data-large-file=\"https:\/\/i0.wp.com\/www.javaadvent.com\/content\/uploads\/2019\/12\/pulsar_functions.png?fit=557%2C397&amp;ssl=1\" src=\"https:\/\/i0.wp.com\/www.javaadvent.com\/content\/uploads\/2019\/12\/pulsar_functions.png?resize=557%2C397&#038;ssl=1\" alt=\"\" class=\"wp-image-2039\" srcset=\"https:\/\/i0.wp.com\/www.javaadvent.com\/content\/uploads\/2019\/12\/pulsar_functions.png?w=557&amp;ssl=1 557w, https:\/\/i0.wp.com\/www.javaadvent.com\/content\/uploads\/2019\/12\/pulsar_functions.png?resize=300%2C214&amp;ssl=1 300w, https:\/\/i0.wp.com\/www.javaadvent.com\/content\/uploads\/2019\/12\/pulsar_functions.png?resize=508%2C362&amp;ssl=1 508w\" sizes=\"auto, (max-width: 557px) 100vw, 557px\" \/><\/figure>\n\n\n\n<p>The Pulsar function can run either in the Pulsar cluster or\nas a separate application.<\/p>\n\n\n\n<h1 class=\"wp-block-heading\">Summary<\/h1>\n\n\n\n<p>In this article we briefly reviewed some of the most popular frameworks that can be used to implement big data processing systems in Java. Each of the presented frameworks is fairly big and deserves a separate article of its own. Although quite simple, our air quality index data pipeline demonstrates the way these frameworks operate, and you can use it as a basis for expanding your knowledge of whichever ones are of further interest. 
You can review the complete code samples <a href=\"https:\/\/github.com\/martinfmi\/bigdataframeworks\">here<\/a>.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>The big data challenge The concept of big data is understood differently in the variety of domains where companies face the need to deal with increasing volumes of data. In most of these scenarios the system under consideration needs to be designed in such a way so that it is capable of processing that data [&hellip;]<\/p>\n","protected":false},"author":50,"featured_media":1034,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"inline_featured_image":false,"jetpack_post_was_ever_published":false,"_jetpack_newsletter_access":"","_jetpack_dont_email_post_to_subs":false,"_jetpack_newsletter_tier_id":0,"_jetpack_memberships_contains_paywalled_content":false,"_jetpack_memberships_contains_paid_content":false,"footnotes":""},"categories":[381],"tags":[469,472,476,477,473,474,475,478,479,470,471],"coauthors":[354],"class_list":["post-2024","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-christmas-2019-is-coming","tag-big-data","tag-flink","tag-hazelcast","tag-hazelcast-jet","tag-ignite","tag-kafka","tag-kafka-streams","tag-pulsar","tag-puslar-functions","tag-spark","tag-storm"],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v27.5 - https:\/\/yoast.com\/product\/yoast-seo-wordpress\/ -->\n<title>Popular frameworks for big data processing in Java - JVM Advent<\/title>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/www.javaadvent.com\/2019\/12\/popular-frameworks-for-big-data-processing-in-java.html\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Popular frameworks for big data 
processing in Java - JVM Advent\" \/>\n<meta property=\"og:description\" content=\"The big data challenge The concept of big data is understood differently in the variety of domains where companies face the need to deal with increasing volumes of data. In most of these scenarios the system under consideration needs to be designed in such a way so that it is capable of processing that data [&hellip;]\" \/>\n<meta property=\"og:url\" content=\"https:\/\/www.javaadvent.com\/2019\/12\/popular-frameworks-for-big-data-processing-in-java.html\" \/>\n<meta property=\"og:site_name\" content=\"JVM Advent\" \/>\n<meta property=\"article:publisher\" content=\"https:\/\/www.facebook.com\/Java-Advent-Calendar-229536173843473\/\" \/>\n<meta property=\"article:published_time\" content=\"2019-12-14T01:00:00+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2019-12-15T15:31:18+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/www.javaadvent.com\/content\/uploads\/2017\/12\/duke14.png\" \/>\n\t<meta property=\"og:image:width\" content=\"280\" \/>\n\t<meta property=\"og:image:height\" content=\"280\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/png\" \/>\n<meta name=\"author\" content=\"Martin Toshev\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:creator\" content=\"@javaadvent\" \/>\n<meta name=\"twitter:site\" content=\"@javaadvent\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Martin Toshev\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. 
reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"8 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\\\/\\\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\\\/\\\/www.javaadvent.com\\\/2019\\\/12\\\/popular-frameworks-for-big-data-processing-in-java.html#article\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/www.javaadvent.com\\\/2019\\\/12\\\/popular-frameworks-for-big-data-processing-in-java.html\"},\"author\":{\"name\":\"Martin Toshev\",\"@id\":\"https:\\\/\\\/www.javaadvent.com\\\/#\\\/schema\\\/person\\\/eb708044dd802fabc1b5c9d4e9ac2959\"},\"headline\":\"Popular frameworks for big data processing in Java\",\"datePublished\":\"2019-12-14T01:00:00+00:00\",\"dateModified\":\"2019-12-15T15:31:18+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\\\/\\\/www.javaadvent.com\\\/2019\\\/12\\\/popular-frameworks-for-big-data-processing-in-java.html\"},\"wordCount\":962,\"commentCount\":2,\"image\":{\"@id\":\"https:\\\/\\\/www.javaadvent.com\\\/2019\\\/12\\\/popular-frameworks-for-big-data-processing-in-java.html#primaryimage\"},\"thumbnailUrl\":\"https:\\\/\\\/i0.wp.com\\\/www.javaadvent.com\\\/content\\\/uploads\\\/2017\\\/12\\\/duke14.png?fit=280%2C280&ssl=1\",\"keywords\":[\"big data\",\"flink\",\"hazelcast\",\"hazelcast jet\",\"ignite\",\"kafka\",\"kafka streams\",\"pulsar\",\"puslar functions\",\"spark\",\"storm\"],\"articleSection\":[\"2019\"],\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"CommentAction\",\"name\":\"Comment\",\"target\":[\"https:\\\/\\\/www.javaadvent.com\\\/2019\\\/12\\\/popular-frameworks-for-big-data-processing-in-java.html#respond\"]}]},{\"@type\":\"WebPage\",\"@id\":\"https:\\\/\\\/www.javaadvent.com\\\/2019\\\/12\\\/popular-frameworks-for-big-data-processing-in-java.html\",\"url\":\"https:\\\/\\\/www.javaadvent.com\\\/2019\\\/12\\\/popular-frameworks-for-big-data-processing-in-java.html\",\"name\":\"Popular frameworks for big data processing in Java - JVM 
Advent\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/www.javaadvent.com\\\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\\\/\\\/www.javaadvent.com\\\/2019\\\/12\\\/popular-frameworks-for-big-data-processing-in-java.html#primaryimage\"},\"image\":{\"@id\":\"https:\\\/\\\/www.javaadvent.com\\\/2019\\\/12\\\/popular-frameworks-for-big-data-processing-in-java.html#primaryimage\"},\"thumbnailUrl\":\"https:\\\/\\\/i0.wp.com\\\/www.javaadvent.com\\\/content\\\/uploads\\\/2017\\\/12\\\/duke14.png?fit=280%2C280&ssl=1\",\"datePublished\":\"2019-12-14T01:00:00+00:00\",\"dateModified\":\"2019-12-15T15:31:18+00:00\",\"author\":{\"@id\":\"https:\\\/\\\/www.javaadvent.com\\\/#\\\/schema\\\/person\\\/eb708044dd802fabc1b5c9d4e9ac2959\"},\"breadcrumb\":{\"@id\":\"https:\\\/\\\/www.javaadvent.com\\\/2019\\\/12\\\/popular-frameworks-for-big-data-processing-in-java.html#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\\\/\\\/www.javaadvent.com\\\/2019\\\/12\\\/popular-frameworks-for-big-data-processing-in-java.html\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\\\/\\\/www.javaadvent.com\\\/2019\\\/12\\\/popular-frameworks-for-big-data-processing-in-java.html#primaryimage\",\"url\":\"https:\\\/\\\/i0.wp.com\\\/www.javaadvent.com\\\/content\\\/uploads\\\/2017\\\/12\\\/duke14.png?fit=280%2C280&ssl=1\",\"contentUrl\":\"https:\\\/\\\/i0.wp.com\\\/www.javaadvent.com\\\/content\\\/uploads\\\/2017\\\/12\\\/duke14.png?fit=280%2C280&ssl=1\",\"width\":280,\"height\":280},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\\\/\\\/www.javaadvent.com\\\/2019\\\/12\\\/popular-frameworks-for-big-data-processing-in-java.html#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\\\/\\\/www.javaadvent.com\\\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Popular frameworks for big data processing in 
Java\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\\\/\\\/www.javaadvent.com\\\/#website\",\"url\":\"https:\\\/\\\/www.javaadvent.com\\\/\",\"name\":\"JVM Advent\",\"description\":\"The JVM Programming Advent Calendar\",\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\\\/\\\/www.javaadvent.com\\\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":\"Person\",\"@id\":\"https:\\\/\\\/www.javaadvent.com\\\/#\\\/schema\\\/person\\\/eb708044dd802fabc1b5c9d4e9ac2959\",\"name\":\"Martin Toshev\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\\\/\\\/secure.gravatar.com\\\/avatar\\\/a235abf761a167024d17889a980406ccd19f83785b09ee66cf15b0bf90784899?s=96&d=retro&r=g3e6ba52e2c53d134111bf2113ed8a2fb\",\"url\":\"https:\\\/\\\/secure.gravatar.com\\\/avatar\\\/a235abf761a167024d17889a980406ccd19f83785b09ee66cf15b0bf90784899?s=96&d=retro&r=g\",\"contentUrl\":\"https:\\\/\\\/secure.gravatar.com\\\/avatar\\\/a235abf761a167024d17889a980406ccd19f83785b09ee66cf15b0bf90784899?s=96&d=retro&r=g\",\"caption\":\"Martin Toshev\"},\"url\":\"https:\\\/\\\/www.javaadvent.com\\\/author\\\/martivtoshev\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. 
-->","yoast_head_json":{"title":"Popular frameworks for big data processing in Java - JVM Advent","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/www.javaadvent.com\/2019\/12\/popular-frameworks-for-big-data-processing-in-java.html","og_locale":"en_US","og_type":"article","og_title":"Popular frameworks for big data processing in Java - JVM Advent","og_description":"The big data challenge The concept of big data is understood differently in the variety of domains where companies face the need to deal with increasing volumes of data. In most of these scenarios the system under consideration needs to be designed in such a way so that it is capable of processing that data [&hellip;]","og_url":"https:\/\/www.javaadvent.com\/2019\/12\/popular-frameworks-for-big-data-processing-in-java.html","og_site_name":"JVM Advent","article_publisher":"https:\/\/www.facebook.com\/Java-Advent-Calendar-229536173843473\/","article_published_time":"2019-12-14T01:00:00+00:00","article_modified_time":"2019-12-15T15:31:18+00:00","og_image":[{"width":280,"height":280,"url":"https:\/\/www.javaadvent.com\/content\/uploads\/2017\/12\/duke14.png","type":"image\/png"}],"author":"Martin Toshev","twitter_card":"summary_large_image","twitter_creator":"@javaadvent","twitter_site":"@javaadvent","twitter_misc":{"Written by":"Martin Toshev","Est. 
reading time":"8 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/www.javaadvent.com\/2019\/12\/popular-frameworks-for-big-data-processing-in-java.html#article","isPartOf":{"@id":"https:\/\/www.javaadvent.com\/2019\/12\/popular-frameworks-for-big-data-processing-in-java.html"},"author":{"name":"Martin Toshev","@id":"https:\/\/www.javaadvent.com\/#\/schema\/person\/eb708044dd802fabc1b5c9d4e9ac2959"},"headline":"Popular frameworks for big data processing in Java","datePublished":"2019-12-14T01:00:00+00:00","dateModified":"2019-12-15T15:31:18+00:00","mainEntityOfPage":{"@id":"https:\/\/www.javaadvent.com\/2019\/12\/popular-frameworks-for-big-data-processing-in-java.html"},"wordCount":962,"commentCount":2,"image":{"@id":"https:\/\/www.javaadvent.com\/2019\/12\/popular-frameworks-for-big-data-processing-in-java.html#primaryimage"},"thumbnailUrl":"https:\/\/i0.wp.com\/www.javaadvent.com\/content\/uploads\/2017\/12\/duke14.png?fit=280%2C280&ssl=1","keywords":["big data","flink","hazelcast","hazelcast jet","ignite","kafka","kafka streams","pulsar","puslar functions","spark","storm"],"articleSection":["2019"],"inLanguage":"en-US","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/www.javaadvent.com\/2019\/12\/popular-frameworks-for-big-data-processing-in-java.html#respond"]}]},{"@type":"WebPage","@id":"https:\/\/www.javaadvent.com\/2019\/12\/popular-frameworks-for-big-data-processing-in-java.html","url":"https:\/\/www.javaadvent.com\/2019\/12\/popular-frameworks-for-big-data-processing-in-java.html","name":"Popular frameworks for big data processing in Java - JVM 
Advent","isPartOf":{"@id":"https:\/\/www.javaadvent.com\/#website"},"primaryImageOfPage":{"@id":"https:\/\/www.javaadvent.com\/2019\/12\/popular-frameworks-for-big-data-processing-in-java.html#primaryimage"},"image":{"@id":"https:\/\/www.javaadvent.com\/2019\/12\/popular-frameworks-for-big-data-processing-in-java.html#primaryimage"},"thumbnailUrl":"https:\/\/i0.wp.com\/www.javaadvent.com\/content\/uploads\/2017\/12\/duke14.png?fit=280%2C280&ssl=1","datePublished":"2019-12-14T01:00:00+00:00","dateModified":"2019-12-15T15:31:18+00:00","author":{"@id":"https:\/\/www.javaadvent.com\/#\/schema\/person\/eb708044dd802fabc1b5c9d4e9ac2959"},"breadcrumb":{"@id":"https:\/\/www.javaadvent.com\/2019\/12\/popular-frameworks-for-big-data-processing-in-java.html#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/www.javaadvent.com\/2019\/12\/popular-frameworks-for-big-data-processing-in-java.html"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.javaadvent.com\/2019\/12\/popular-frameworks-for-big-data-processing-in-java.html#primaryimage","url":"https:\/\/i0.wp.com\/www.javaadvent.com\/content\/uploads\/2017\/12\/duke14.png?fit=280%2C280&ssl=1","contentUrl":"https:\/\/i0.wp.com\/www.javaadvent.com\/content\/uploads\/2017\/12\/duke14.png?fit=280%2C280&ssl=1","width":280,"height":280},{"@type":"BreadcrumbList","@id":"https:\/\/www.javaadvent.com\/2019\/12\/popular-frameworks-for-big-data-processing-in-java.html#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/www.javaadvent.com\/"},{"@type":"ListItem","position":2,"name":"Popular frameworks for big data processing in Java"}]},{"@type":"WebSite","@id":"https:\/\/www.javaadvent.com\/#website","url":"https:\/\/www.javaadvent.com\/","name":"JVM Advent","description":"The JVM Programming Advent 
Calendar","potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/www.javaadvent.com\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Person","@id":"https:\/\/www.javaadvent.com\/#\/schema\/person\/eb708044dd802fabc1b5c9d4e9ac2959","name":"Martin Toshev","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/secure.gravatar.com\/avatar\/a235abf761a167024d17889a980406ccd19f83785b09ee66cf15b0bf90784899?s=96&d=retro&r=g3e6ba52e2c53d134111bf2113ed8a2fb","url":"https:\/\/secure.gravatar.com\/avatar\/a235abf761a167024d17889a980406ccd19f83785b09ee66cf15b0bf90784899?s=96&d=retro&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/a235abf761a167024d17889a980406ccd19f83785b09ee66cf15b0bf90784899?s=96&d=retro&r=g","caption":"Martin Toshev"},"url":"https:\/\/www.javaadvent.com\/author\/martivtoshev"}]}},"jetpack_featured_media_url":"https:\/\/i0.wp.com\/www.javaadvent.com\/content\/uploads\/2017\/12\/duke14.png?fit=280%2C280&ssl=1","jetpack_sharing_enabled":true,"jetpack-related-posts":[{"id":39,"url":"https:\/\/www.javaadvent.com\/2013\/12\/big-data-the-reactive-way.html","url_meta":{"origin":2024,"position":0},"title":"Big Data the &#8216;reactive&#8217; way","author":"gpanther","date":"December 21, 2013","format":false,"excerpt":"A metatrend going on in the IT industry is a shift from query-based, batch oriented systems to (soft) realtime updated systems. 
While this is associated with financial trading only, there are many other examples such as \"Just-In-Time\"-logistic systems, flight companies doing realtime pricing of passenger seats based on demand and\u2026","rel":"","context":"In &quot;2013&quot;","block_context":{"text":"2013","link":"https:\/\/www.javaadvent.com\/category\/2013"},"img":{"alt_text":"","src":"https:\/\/i0.wp.com\/4.bp.blogspot.com\/-CXMJKza-lPo\/UptEsV6buvI\/AAAAAAAAAHI\/NS7uyMVDHfI\/s640\/Real%2BLive.png?resize=350%2C200","width":350,"height":200,"srcset":"https:\/\/i0.wp.com\/4.bp.blogspot.com\/-CXMJKza-lPo\/UptEsV6buvI\/AAAAAAAAAHI\/NS7uyMVDHfI\/s640\/Real%2BLive.png?resize=350%2C200 1x, https:\/\/i0.wp.com\/4.bp.blogspot.com\/-CXMJKza-lPo\/UptEsV6buvI\/AAAAAAAAAHI\/NS7uyMVDHfI\/s640\/Real%2BLive.png?resize=525%2C300 1.5x"},"classes":[]},{"id":3090,"url":"https:\/\/www.javaadvent.com\/2021\/12\/different-approaches-to-building-stateful-microservices-in-the-cloud-native-world.html","url_meta":{"origin":2024,"position":1},"title":"Different Approaches to building Stateful Microservices in the Cloud Native World","author":"Mary Grygleski","date":"December 23, 2021","format":false,"excerpt":"Cloud Native computing is all about working with stateless data and serverless systems. But we all live in a stateful world, in which data flows through systems inter-connected with one another through complex networks. 
So how can systems be able to manage and track the flow of data in a\u2026","rel":"","context":"In &quot;2021&quot;","block_context":{"text":"2021","link":"https:\/\/www.javaadvent.com\/category\/2021"},"img":{"alt_text":"valley near snowy mountain during daytime","src":"https:\/\/i0.wp.com\/www.javaadvent.com\/content\/uploads\/2021\/11\/pexels-photo-164170.jpeg?fit=1200%2C510&ssl=1&resize=350%2C200","width":350,"height":200,"srcset":"https:\/\/i0.wp.com\/www.javaadvent.com\/content\/uploads\/2021\/11\/pexels-photo-164170.jpeg?fit=1200%2C510&ssl=1&resize=350%2C200 1x, https:\/\/i0.wp.com\/www.javaadvent.com\/content\/uploads\/2021\/11\/pexels-photo-164170.jpeg?fit=1200%2C510&ssl=1&resize=525%2C300 1.5x, https:\/\/i0.wp.com\/www.javaadvent.com\/content\/uploads\/2021\/11\/pexels-photo-164170.jpeg?fit=1200%2C510&ssl=1&resize=700%2C400 2x, https:\/\/i0.wp.com\/www.javaadvent.com\/content\/uploads\/2021\/11\/pexels-photo-164170.jpeg?fit=1200%2C510&ssl=1&resize=1050%2C600 3x"},"classes":[]},{"id":758,"url":"https:\/\/www.javaadvent.com\/2015\/12\/java-in-2015-the-language-platform-ecosystem-and-community-continue-to-dominate.html","url_meta":{"origin":2024,"position":2},"title":"Java in 2015 &#8211; Major happenings","author":"Martijn Verburg","date":"December 24, 2015","format":false,"excerpt":"2015 was the year where Java the language, platform, ecosystem and community continue to dominate the software landscape, with only Javascript having a similar sized impact on the industry. In case you missed the highlights of 2015, here's some of the major happenings that occurred. 
Java 20 years old and\u2026","rel":"","context":"In &quot;2015&quot;","block_context":{"text":"2015","link":"https:\/\/www.javaadvent.com\/category\/2015"},"img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":5722,"url":"https:\/\/www.javaadvent.com\/2024\/12\/wasm-chicory-1.html","url_meta":{"origin":2024,"position":3},"title":"The Chicory Photo Album: Celebrating 1.0.0 and a Year of Wasm","author":"Andrea Peruffo","date":"December 25, 2024","format":false,"excerpt":"Intro Christmas is a time of tradition, and I\u2019m delighted to continue the one we started last year. On this very same date and blog, we unveiled the development of Chicory: Chicory: WebAssembly on the JVM. WebAssembly continues to grow steadily and strongly, much like we\u2019ve come to expect from\u2026","rel":"","context":"In &quot;2024&quot;","block_context":{"text":"2024","link":"https:\/\/www.javaadvent.com\/category\/2024"},"img":{"alt_text":"","src":"https:\/\/i0.wp.com\/www.javaadvent.com\/content\/uploads\/2022\/12\/Feature-Image-Day-25.webp?fit=800%2C800&ssl=1&resize=350%2C200","width":350,"height":200,"srcset":"https:\/\/i0.wp.com\/www.javaadvent.com\/content\/uploads\/2022\/12\/Feature-Image-Day-25.webp?fit=800%2C800&ssl=1&resize=350%2C200 1x, https:\/\/i0.wp.com\/www.javaadvent.com\/content\/uploads\/2022\/12\/Feature-Image-Day-25.webp?fit=800%2C800&ssl=1&resize=525%2C300 1.5x, https:\/\/i0.wp.com\/www.javaadvent.com\/content\/uploads\/2022\/12\/Feature-Image-Day-25.webp?fit=800%2C800&ssl=1&resize=700%2C400 2x"},"classes":[]},{"id":1610,"url":"https:\/\/www.javaadvent.com\/2018\/12\/two-years-in-the-life-of-ai-ml-dl-and-java.html","url_meta":{"origin":2024,"position":4},"title":"Two years in the life of AI, ML, DL and Java","author":"Mani Sarkar","date":"December 10, 2018","format":false,"excerpt":"Citation All the images in the post are owned by the respective owners\/creators\/authors. 
Introduction AI, ML and DL are acronyms for Artificial Intelligence, Machine Learning and Deep Learning. Now back to what I was going to write about. If you ask me, I\u2019ll already admit that I have NOT even\u2026","rel":"","context":"In &quot;2018&quot;","block_context":{"text":"2018","link":"https:\/\/www.javaadvent.com\/category\/2018"},"img":{"alt_text":"","src":"https:\/\/i0.wp.com\/www.javaadvent.com\/content\/uploads\/2018\/12\/Duke_Java_mascot_waving.svg-128x231.png?resize=350%2C200&ssl=1","width":350,"height":200},"classes":[]},{"id":5457,"url":"https:\/\/www.javaadvent.com\/2024\/12\/jvm-in-the-age-of-ai-a-birds-eye-view-for-the-mechanical-sympathizers.html","url_meta":{"origin":2024,"position":5},"title":"JVM in the Age of AI: A Bird&#8217;s-Eye View for the Mechanical Sympathizers","author":"Artur Skowronski","date":"December 10, 2024","format":false,"excerpt":"Alright, hold on tight, because in today's edition of the advent calendar, we're going to talk about AI! Cause you know, we have 2024 and we do not need any other reason. Let's start by discussing what this article will cover. 
Each time the topic of AI arises, everyone tends\u2026","rel":"","context":"In &quot;2017&quot;","block_context":{"text":"2017","link":"https:\/\/www.javaadvent.com\/category\/2017"},"img":{"alt_text":"","src":"https:\/\/i0.wp.com\/www.javaadvent.com\/content\/uploads\/2021\/12\/Feature-Image-Day-10.png?fit=800%2C800&ssl=1&resize=350%2C200","width":350,"height":200,"srcset":"https:\/\/i0.wp.com\/www.javaadvent.com\/content\/uploads\/2021\/12\/Feature-Image-Day-10.png?fit=800%2C800&ssl=1&resize=350%2C200 1x, https:\/\/i0.wp.com\/www.javaadvent.com\/content\/uploads\/2021\/12\/Feature-Image-Day-10.png?fit=800%2C800&ssl=1&resize=525%2C300 1.5x, https:\/\/i0.wp.com\/www.javaadvent.com\/content\/uploads\/2021\/12\/Feature-Image-Day-10.png?fit=800%2C800&ssl=1&resize=700%2C400 2x"},"classes":[]}],"jetpack_likes_enabled":true,"amp_enabled":true,"_links":{"self":[{"href":"https:\/\/www.javaadvent.com\/wp-json\/wp\/v2\/posts\/2024","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.javaadvent.com\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.javaadvent.com\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.javaadvent.com\/wp-json\/wp\/v2\/users\/50"}],"replies":[{"embeddable":true,"href":"https:\/\/www.javaadvent.com\/wp-json\/wp\/v2\/comments?post=2024"}],"version-history":[{"count":13,"href":"https:\/\/www.javaadvent.com\/wp-json\/wp\/v2\/posts\/2024\/revisions"}],"predecessor-version":[{"id":2223,"href":"https:\/\/www.javaadvent.com\/wp-json\/wp\/v2\/posts\/2024\/revisions\/2223"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.javaadvent.com\/wp-json\/wp\/v2\/media\/1034"}],"wp:attachment":[{"href":"https:\/\/www.javaadvent.com\/wp-json\/wp\/v2\/media?parent=2024"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.javaadvent.com\/wp-json\/wp\/v2\/categories?post=2024"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.javaadvent.com\/wp-json\/wp\/v2\/tags?
post=2024"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/www.javaadvent.com\/wp-json\/wp\/v2\/coauthors?post=2024"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}