{"id":56224,"date":"2016-05-12T19:00:15","date_gmt":"2016-05-12T16:00:15","guid":{"rendered":"https:\/\/www.javacodegeeks.com\/?p=56224"},"modified":"2016-05-12T14:18:41","modified_gmt":"2016-05-12T11:18:41","slug":"integrate-custom-data-sources-apache-spark","status":"publish","type":"post","link":"https:\/\/www.javacodegeeks.com\/2016\/05\/integrate-custom-data-sources-apache-spark.html","title":{"rendered":"How to Integrate Custom Data Sources Into Apache Spark"},"content":{"rendered":"<p>Streaming data is a hot topic these days, and Apache Spark is an excellent framework for streaming. In this blog post, I&#8217;ll show you how to integrate custom data sources into Spark.<\/p>\n<p>Spark Streaming gives us the ability to stream from a variety of sources while using the same concise API for accessing data streams, performing SQL queries, or creating machine learning algorithms. These capabilities make Spark a preferable framework for streaming (or any type of workflow) applications, since we can use all aspects of the framework.<\/p>\n<p>The challenge is figuring out how to integrate custom data sources into Spark so we can leverage its power without switching to more standard sources. Switching might seem logical, but in some cases it is simply not possible or convenient to do so.<\/p>\n<h2>Streaming Custom Receivers<\/h2>\n<p>Spark offers several extension points; we saw one of them when we extended the Data Source API in order to integrate our custom data store into Spark SQL.<\/p>\n<p>In this example, we are going to do the same, but we are also going to extend the streaming API so we can stream from <strong>anywhere<\/strong>.<\/p>\n<p>In order to implement our custom receiver, we need to extend the Receiver[A] class. 
Note that it has a type parameter, so we can enforce type safety on our DStream from the streaming client's point of view.<\/p>\n<p>We are going to use this custom receiver to stream orders that one of our applications sends over a socket.<\/p>\n<p>The structure of the data traveling through the network looks like this:<\/p>\n<pre class=\"brush:java\">1 5\r\n1 1 2\r\n2 1 1\r\n2 1 1\r\n4 1 1\r\n2 2\r\n1 2 2<\/pre>\n<p>We first receive the order ID and the total amount of the order, and then we receive the line items of the order. The first value is the item ID, the second is the order ID (which matches the parent order's ID), and the third is the cost of the item. In this example, we have two orders. The first one has four items and the second one has only one item.<\/p>\n<p>The idea is to hide all of this from our Spark application, so what it receives on the DStream is a complete order defined on a stream as follows:<\/p>\n<pre class=\"brush:java\">val orderStream: DStream[Order] = .....<\/pre>\n<p>The receiver also hides our custom streaming source itself. Even though the source sends its data over a socket, using the standard socket stream from Spark would be quite complicated: we would not be able to control how the data comes in, and we would have to assemble orders inside the application itself. That is hard, because once we are in application space we are running in parallel, and it is difficult to synchronize all of the incoming data. 
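Before looking at the receiver itself, here is a Spark-free sketch of the framing logic it will perform: folding the raw wire lines into complete orders. The parseOrders helper and its names are hypothetical, not part of the original code.

```scala
// Hypothetical helper (illustration only): groups the wire format above
// into Order values. A 2-field line ("<orderId> <total>") starts a new
// order; a 3-field line ("<itemId> <orderId> <cost>") is an item of the
// order currently being built.
case class Item(id: Int, cost: Int)
case class Order(id: Int, total: Int, items: List[Item] = Nil)

object OrderParser {
  def parseOrders(lines: Seq[String]): List[Order] = {
    val (done, pending) =
      lines.map(_.trim.split("\\s+")).foldLeft((List.empty[Order], Option.empty[Order])) {
        case ((acc, current), Array(id, total)) =>
          // Order header: flush the previous order, if any, and start a new one.
          (current.fold(acc)(acc :+ _), Some(Order(id.toInt, total.toInt)))
        case ((acc, Some(order)), Array(itemId, _, cost)) =>
          // Item line: attach the item (cost is the third field) to the current order.
          (acc, Some(order.copy(items = order.items :+ Item(itemId.toInt, cost.toInt))))
        case (state, _) => state // ignore malformed lines
      }
    // The last order has no trailing header to flush it, so flush it here.
    pending.fold(done)(done :+ _)
  }

  def main(args: Array[String]): Unit = {
    val raw = Seq("1 5", "1 1 2", "2 1 1", "2 1 1", "4 1 1", "2 2", "1 2 2")
    parseOrders(raw).foreach(println)
  }
}
```

Running this against the sample payload yields the two orders described above, with the first order carrying four items and the second carrying one.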
However, in the receiver space it is easy to create orders from the raw input text.<\/p>\n<p>Let\u2019s take a look at what our initial implementation looks like.<\/p>\n<pre class=\"brush:java\">import java.io.{BufferedReader, InputStreamReader}\r\nimport java.net.Socket\r\n\r\nimport org.apache.spark.storage.StorageLevel\r\nimport org.apache.spark.streaming.receiver.Receiver\r\n\r\ncase class Order(id: Int, total: Int, items: List[Item] = Nil)\r\ncase class Item(id: Int, cost: Int)\r\n\r\nclass OrderReceiver(host: String, port: Int) extends Receiver[Order](StorageLevel.MEMORY_ONLY) {\r\n\r\n  override def onStart(): Unit = {\r\n\r\n    println(\"starting...\")\r\n\r\n    \/\/ Read on a separate thread so onStart() returns immediately.\r\n    val thread = new Thread(\"Receiver\") {\r\n      override def run(): Unit = receive()\r\n    }\r\n\r\n    thread.start()\r\n  }\r\n\r\n  \/\/ Nothing to clean up here: the reading loop checks isStopped() on its own.\r\n  override def onStop(): Unit = ()\r\n\r\n  def receive() = ....\r\n}<\/pre>\n<p>Our OrderReceiver extends Receiver[Order], which allows us to store type-safe Order values inside Spark. We also need to implement the onStart() and onStop() methods. 
Note that onStart() spawns a thread, so it is non-blocking; this is very important for proper receiver behavior.<\/p>\n<p>Now, let\u2019s take a look at the receive method, where the magic really happens.<\/p>\n<pre class=\"brush:java\">def receive() = {\r\n    val socket = new Socket(host, port)\r\n    var currentOrder: Order = null\r\n    var currentItems: List[Item] = null\r\n\r\n    val reader = new BufferedReader(new InputStreamReader(socket.getInputStream(), \"UTF-8\"))\r\n\r\n    while (!isStopped()) {\r\n      val line = reader.readLine()\r\n\r\n      if (line == null) {\r\n        \/\/ The source closed the connection; flush the pending order first.\r\n        if (currentOrder != null) {\r\n          store(Order(currentOrder.id, currentOrder.total, currentItems))\r\n        }\r\n        stop(\"Stream has ended\")\r\n      }\r\n      else {\r\n        val parts = line.split(\" \")\r\n\r\n        if (parts.length == 2) {\r\n          \/\/ An order header: store the previous order before starting a new one.\r\n          if (currentOrder != null) {\r\n            store(Order(currentOrder.id, currentOrder.total, currentItems))\r\n          }\r\n\r\n          currentOrder = Order(parts(0).toInt, parts(1).toInt)\r\n          currentItems = List[Item]()\r\n        }\r\n        else {\r\n          \/\/ An item line: item ID, order ID, cost. The cost is the third field.\r\n          currentItems = Item(parts(0).toInt, parts(2).toInt) :: currentItems\r\n        }\r\n      }\r\n    }\r\n  }<\/pre>\n<p>Here, we open a socket to our source and simply keep reading from it until a stop command has been dispatched or the socket has no more data. Note that we are reading exactly the structure we defined previously (how our data is sent). Once we have completely read an order, we call store(\u2026) so it gets saved into Spark.<\/p>\n<p>There is nothing left to do but to use our receiver in our application, which looks like this:<\/p>\n<pre class=\"brush:java\">val config = new SparkConf().setAppName(\"streaming\")\r\nval sc = new SparkContext(config)\r\nval ssc = new StreamingContext(sc, Seconds(5))\r\n\r\nval stream: DStream[Order] = ssc.receiverStream(new OrderReceiver(host, port))<\/pre>\n<p>Note how we have created the stream using our custom OrderReceiver (the val stream has been type-annotated only for clarity; it is not required). From now on, we can use the stream (DStream[Order]) like any other stream we have used in any other application.<\/p>\n<pre class=\"brush:java\">stream.foreachRDD { rdd =&gt;\r\n  rdd.foreach { order =&gt;\r\n    println(order.id)\r\n    order.items.foreach(println)\r\n  }\r\n}<\/pre>\n<h2>Summary<\/h2>\n<p>Spark Streaming comes in very handy when processing sources that generate endless data. 
You can use the same API that you use for Spark SQL and other components in the system, but it is also flexible enough to be extended to meet your particular needs.<\/p>\n<div class=\"attribution\">\n<table>\n<tbody>\n<tr>\n<td><span class=\"reference\">Reference: <\/span><\/td>\n<td><a href=\"https:\/\/www.mapr.com\/blog\/how-integrate-custom-data-sources-apache-spark\">How to Integrate Custom Data Sources Into Apache Spark<\/a> from our <a href=\"http:\/\/www.javacodegeeks.com\/join-us\/jcg\/\">JCG partner<\/a>\u00a0Nicolas A Perez at the <a href=\"http:\/\/www.mapr.com\/blog\">Mapr<\/a> blog.<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<\/div>\n","protected":false},"excerpt":{"rendered":"<p>Streaming data is a hot topic these days, and Apache Spark is an excellent framework for streaming. In this blog post, I&#8217;ll show you how to integrate custom data sources into Spark. Spark Streaming gives us the ability to stream from a variety of sources while using the same concise API for accessing data streams, &hellip;<\/p>\n","protected":false},"author":1019,"featured_media":22307,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[8],"tags":[1092],"class_list":["post-56224","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-enterprise-java","tag-apache-spark"],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v27.5 - https:\/\/yoast.com\/product\/yoast-seo-wordpress\/ -->\n<title>How to Integrate Custom Data Sources Into Apache Spark - Java Code Geeks<\/title>\n<meta name=\"description\" content=\"Streaming data is a hot topic these days, and Apache Spark is an excellent framework for streaming. 
In this blog post, I&#039;ll show you how to integrate\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/www.javacodegeeks.com\/2016\/05\/integrate-custom-data-sources-apache-spark.html\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"How to Integrate Custom Data Sources Into Apache Spark - Java Code Geeks\" \/>\n<meta property=\"og:description\" content=\"Streaming data is a hot topic these days, and Apache Spark is an excellent framework for streaming. In this blog post, I&#039;ll show you how to integrate\" \/>\n<meta property=\"og:url\" content=\"https:\/\/www.javacodegeeks.com\/2016\/05\/integrate-custom-data-sources-apache-spark.html\" \/>\n<meta property=\"og:site_name\" content=\"Java Code Geeks\" \/>\n<meta property=\"article:publisher\" content=\"https:\/\/www.facebook.com\/javacodegeeks\" \/>\n<meta property=\"article:published_time\" content=\"2016-05-12T16:00:15+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/www.javacodegeeks.com\/wp-content\/uploads\/2014\/03\/apache-spark-logo.jpg\" \/>\n\t<meta property=\"og:image:width\" content=\"150\" \/>\n\t<meta property=\"og:image:height\" content=\"150\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/jpeg\" \/>\n<meta name=\"author\" content=\"Nicolas A Perez\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:creator\" content=\"@javacodegeeks\" \/>\n<meta name=\"twitter:site\" content=\"@javacodegeeks\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Nicolas A Perez\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. 
reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"5 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\\\/\\\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\\\/\\\/www.javacodegeeks.com\\\/2016\\\/05\\\/integrate-custom-data-sources-apache-spark.html#article\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/www.javacodegeeks.com\\\/2016\\\/05\\\/integrate-custom-data-sources-apache-spark.html\"},\"author\":{\"name\":\"Nicolas A Perez\",\"@id\":\"https:\\\/\\\/www.javacodegeeks.com\\\/#\\\/schema\\\/person\\\/4a346130b684c55cd19da4fb426e18c8\"},\"headline\":\"How to Integrate Custom Data Sources Into Apache Spark\",\"datePublished\":\"2016-05-12T16:00:15+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\\\/\\\/www.javacodegeeks.com\\\/2016\\\/05\\\/integrate-custom-data-sources-apache-spark.html\"},\"wordCount\":751,\"commentCount\":0,\"publisher\":{\"@id\":\"https:\\\/\\\/www.javacodegeeks.com\\\/#organization\"},\"image\":{\"@id\":\"https:\\\/\\\/www.javacodegeeks.com\\\/2016\\\/05\\\/integrate-custom-data-sources-apache-spark.html#primaryimage\"},\"thumbnailUrl\":\"https:\\\/\\\/www.javacodegeeks.com\\\/wp-content\\\/uploads\\\/2014\\\/03\\\/apache-spark-logo.jpg\",\"keywords\":[\"Apache Spark\"],\"articleSection\":[\"Enterprise Java\"],\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"CommentAction\",\"name\":\"Comment\",\"target\":[\"https:\\\/\\\/www.javacodegeeks.com\\\/2016\\\/05\\\/integrate-custom-data-sources-apache-spark.html#respond\"]}]},{\"@type\":\"WebPage\",\"@id\":\"https:\\\/\\\/www.javacodegeeks.com\\\/2016\\\/05\\\/integrate-custom-data-sources-apache-spark.html\",\"url\":\"https:\\\/\\\/www.javacodegeeks.com\\\/2016\\\/05\\\/integrate-custom-data-sources-apache-spark.html\",\"name\":\"How to Integrate Custom Data Sources Into Apache Spark - Java Code 
Geeks\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/www.javacodegeeks.com\\\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\\\/\\\/www.javacodegeeks.com\\\/2016\\\/05\\\/integrate-custom-data-sources-apache-spark.html#primaryimage\"},\"image\":{\"@id\":\"https:\\\/\\\/www.javacodegeeks.com\\\/2016\\\/05\\\/integrate-custom-data-sources-apache-spark.html#primaryimage\"},\"thumbnailUrl\":\"https:\\\/\\\/www.javacodegeeks.com\\\/wp-content\\\/uploads\\\/2014\\\/03\\\/apache-spark-logo.jpg\",\"datePublished\":\"2016-05-12T16:00:15+00:00\",\"description\":\"Streaming data is a hot topic these days, and Apache Spark is an excellent framework for streaming. In this blog post, I'll show you how to integrate\",\"breadcrumb\":{\"@id\":\"https:\\\/\\\/www.javacodegeeks.com\\\/2016\\\/05\\\/integrate-custom-data-sources-apache-spark.html#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\\\/\\\/www.javacodegeeks.com\\\/2016\\\/05\\\/integrate-custom-data-sources-apache-spark.html\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\\\/\\\/www.javacodegeeks.com\\\/2016\\\/05\\\/integrate-custom-data-sources-apache-spark.html#primaryimage\",\"url\":\"https:\\\/\\\/www.javacodegeeks.com\\\/wp-content\\\/uploads\\\/2014\\\/03\\\/apache-spark-logo.jpg\",\"contentUrl\":\"https:\\\/\\\/www.javacodegeeks.com\\\/wp-content\\\/uploads\\\/2014\\\/03\\\/apache-spark-logo.jpg\",\"width\":150,\"height\":150},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\\\/\\\/www.javacodegeeks.com\\\/2016\\\/05\\\/integrate-custom-data-sources-apache-spark.html#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\\\/\\\/www.javacodegeeks.com\\\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Java\",\"item\":\"https:\\\/\\\/www.javacodegeeks.com\\\/category\\\/java\"},{\"@type\":\"ListItem\",\"position\":3,\"name\":\"Enterprise 
Java\",\"item\":\"https:\\\/\\\/www.javacodegeeks.com\\\/category\\\/java\\\/enterprise-java\"},{\"@type\":\"ListItem\",\"position\":4,\"name\":\"How to Integrate Custom Data Sources Into Apache Spark\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\\\/\\\/www.javacodegeeks.com\\\/#website\",\"url\":\"https:\\\/\\\/www.javacodegeeks.com\\\/\",\"name\":\"Java Code Geeks\",\"description\":\"Java Developers Resource Center\",\"publisher\":{\"@id\":\"https:\\\/\\\/www.javacodegeeks.com\\\/#organization\"},\"alternateName\":\"JCG\",\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\\\/\\\/www.javacodegeeks.com\\\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":\"Organization\",\"@id\":\"https:\\\/\\\/www.javacodegeeks.com\\\/#organization\",\"name\":\"Exelixis Media P.C.\",\"url\":\"https:\\\/\\\/www.javacodegeeks.com\\\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\\\/\\\/www.javacodegeeks.com\\\/#\\\/schema\\\/logo\\\/image\\\/\",\"url\":\"https:\\\/\\\/www.javacodegeeks.com\\\/wp-content\\\/uploads\\\/2022\\\/06\\\/exelixis-logo.png\",\"contentUrl\":\"https:\\\/\\\/www.javacodegeeks.com\\\/wp-content\\\/uploads\\\/2022\\\/06\\\/exelixis-logo.png\",\"width\":864,\"height\":246,\"caption\":\"Exelixis Media P.C.\"},\"image\":{\"@id\":\"https:\\\/\\\/www.javacodegeeks.com\\\/#\\\/schema\\\/logo\\\/image\\\/\"},\"sameAs\":[\"https:\\\/\\\/www.facebook.com\\\/javacodegeeks\",\"https:\\\/\\\/x.com\\\/javacodegeeks\"]},{\"@type\":\"Person\",\"@id\":\"https:\\\/\\\/www.javacodegeeks.com\\\/#\\\/schema\\\/person\\\/4a346130b684c55cd19da4fb426e18c8\",\"name\":\"Nicolas A 
Perez\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\\\/\\\/secure.gravatar.com\\\/avatar\\\/a645e4c3c893b34dcc59441e590cf96a503fe645e13b13241d210faafd1e3c82?s=96&d=mm&r=g\",\"url\":\"https:\\\/\\\/secure.gravatar.com\\\/avatar\\\/a645e4c3c893b34dcc59441e590cf96a503fe645e13b13241d210faafd1e3c82?s=96&d=mm&r=g\",\"contentUrl\":\"https:\\\/\\\/secure.gravatar.com\\\/avatar\\\/a645e4c3c893b34dcc59441e590cf96a503fe645e13b13241d210faafd1e3c82?s=96&d=mm&r=g\",\"caption\":\"Nicolas A Perez\"},\"description\":\"Nicolas is a software engineer at IPC, an independent SUBWAY\u00ae franchisee-owned and operated purchasing cooperative, where I work on their Big Data Platform. Very interested in Apache Spark, Hadoop, distributed systems, algorithms, and functional programming, especially in the Scala programming language.\",\"url\":\"https:\\\/\\\/www.javacodegeeks.com\\\/author\\\/nicolas-a-perez\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"How to Integrate Custom Data Sources Into Apache Spark - Java Code Geeks","description":"Streaming data is a hot topic these days, and Apache Spark is an excellent framework for streaming. In this blog post, I'll show you how to integrate","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/www.javacodegeeks.com\/2016\/05\/integrate-custom-data-sources-apache-spark.html","og_locale":"en_US","og_type":"article","og_title":"How to Integrate Custom Data Sources Into Apache Spark - Java Code Geeks","og_description":"Streaming data is a hot topic these days, and Apache Spark is an excellent framework for streaming. 
In this blog post, I'll show you how to integrate","og_url":"https:\/\/www.javacodegeeks.com\/2016\/05\/integrate-custom-data-sources-apache-spark.html","og_site_name":"Java Code Geeks","article_publisher":"https:\/\/www.facebook.com\/javacodegeeks","article_published_time":"2016-05-12T16:00:15+00:00","og_image":[{"width":150,"height":150,"url":"https:\/\/www.javacodegeeks.com\/wp-content\/uploads\/2014\/03\/apache-spark-logo.jpg","type":"image\/jpeg"}],"author":"Nicolas A Perez","twitter_card":"summary_large_image","twitter_creator":"@javacodegeeks","twitter_site":"@javacodegeeks","twitter_misc":{"Written by":"Nicolas A Perez","Est. reading time":"5 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/www.javacodegeeks.com\/2016\/05\/integrate-custom-data-sources-apache-spark.html#article","isPartOf":{"@id":"https:\/\/www.javacodegeeks.com\/2016\/05\/integrate-custom-data-sources-apache-spark.html"},"author":{"name":"Nicolas A Perez","@id":"https:\/\/www.javacodegeeks.com\/#\/schema\/person\/4a346130b684c55cd19da4fb426e18c8"},"headline":"How to Integrate Custom Data Sources Into Apache Spark","datePublished":"2016-05-12T16:00:15+00:00","mainEntityOfPage":{"@id":"https:\/\/www.javacodegeeks.com\/2016\/05\/integrate-custom-data-sources-apache-spark.html"},"wordCount":751,"commentCount":0,"publisher":{"@id":"https:\/\/www.javacodegeeks.com\/#organization"},"image":{"@id":"https:\/\/www.javacodegeeks.com\/2016\/05\/integrate-custom-data-sources-apache-spark.html#primaryimage"},"thumbnailUrl":"https:\/\/www.javacodegeeks.com\/wp-content\/uploads\/2014\/03\/apache-spark-logo.jpg","keywords":["Apache Spark"],"articleSection":["Enterprise 
Java"],"inLanguage":"en-US","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/www.javacodegeeks.com\/2016\/05\/integrate-custom-data-sources-apache-spark.html#respond"]}]},{"@type":"WebPage","@id":"https:\/\/www.javacodegeeks.com\/2016\/05\/integrate-custom-data-sources-apache-spark.html","url":"https:\/\/www.javacodegeeks.com\/2016\/05\/integrate-custom-data-sources-apache-spark.html","name":"How to Integrate Custom Data Sources Into Apache Spark - Java Code Geeks","isPartOf":{"@id":"https:\/\/www.javacodegeeks.com\/#website"},"primaryImageOfPage":{"@id":"https:\/\/www.javacodegeeks.com\/2016\/05\/integrate-custom-data-sources-apache-spark.html#primaryimage"},"image":{"@id":"https:\/\/www.javacodegeeks.com\/2016\/05\/integrate-custom-data-sources-apache-spark.html#primaryimage"},"thumbnailUrl":"https:\/\/www.javacodegeeks.com\/wp-content\/uploads\/2014\/03\/apache-spark-logo.jpg","datePublished":"2016-05-12T16:00:15+00:00","description":"Streaming data is a hot topic these days, and Apache Spark is an excellent framework for streaming. 
In this blog post, I'll show you how to integrate","breadcrumb":{"@id":"https:\/\/www.javacodegeeks.com\/2016\/05\/integrate-custom-data-sources-apache-spark.html#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/www.javacodegeeks.com\/2016\/05\/integrate-custom-data-sources-apache-spark.html"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.javacodegeeks.com\/2016\/05\/integrate-custom-data-sources-apache-spark.html#primaryimage","url":"https:\/\/www.javacodegeeks.com\/wp-content\/uploads\/2014\/03\/apache-spark-logo.jpg","contentUrl":"https:\/\/www.javacodegeeks.com\/wp-content\/uploads\/2014\/03\/apache-spark-logo.jpg","width":150,"height":150},{"@type":"BreadcrumbList","@id":"https:\/\/www.javacodegeeks.com\/2016\/05\/integrate-custom-data-sources-apache-spark.html#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/www.javacodegeeks.com\/"},{"@type":"ListItem","position":2,"name":"Java","item":"https:\/\/www.javacodegeeks.com\/category\/java"},{"@type":"ListItem","position":3,"name":"Enterprise Java","item":"https:\/\/www.javacodegeeks.com\/category\/java\/enterprise-java"},{"@type":"ListItem","position":4,"name":"How to Integrate Custom Data Sources Into Apache Spark"}]},{"@type":"WebSite","@id":"https:\/\/www.javacodegeeks.com\/#website","url":"https:\/\/www.javacodegeeks.com\/","name":"Java Code Geeks","description":"Java Developers Resource Center","publisher":{"@id":"https:\/\/www.javacodegeeks.com\/#organization"},"alternateName":"JCG","potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/www.javacodegeeks.com\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Organization","@id":"https:\/\/www.javacodegeeks.com\/#organization","name":"Exelixis Media 
P.C.","url":"https:\/\/www.javacodegeeks.com\/","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.javacodegeeks.com\/#\/schema\/logo\/image\/","url":"https:\/\/www.javacodegeeks.com\/wp-content\/uploads\/2022\/06\/exelixis-logo.png","contentUrl":"https:\/\/www.javacodegeeks.com\/wp-content\/uploads\/2022\/06\/exelixis-logo.png","width":864,"height":246,"caption":"Exelixis Media P.C."},"image":{"@id":"https:\/\/www.javacodegeeks.com\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/www.facebook.com\/javacodegeeks","https:\/\/x.com\/javacodegeeks"]},{"@type":"Person","@id":"https:\/\/www.javacodegeeks.com\/#\/schema\/person\/4a346130b684c55cd19da4fb426e18c8","name":"Nicolas A Perez","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/secure.gravatar.com\/avatar\/a645e4c3c893b34dcc59441e590cf96a503fe645e13b13241d210faafd1e3c82?s=96&d=mm&r=g","url":"https:\/\/secure.gravatar.com\/avatar\/a645e4c3c893b34dcc59441e590cf96a503fe645e13b13241d210faafd1e3c82?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/a645e4c3c893b34dcc59441e590cf96a503fe645e13b13241d210faafd1e3c82?s=96&d=mm&r=g","caption":"Nicolas A Perez"},"description":"Nicolas is a software engineer at IPC, an independent SUBWAY\u00ae franchisee-owned and operated purchasing cooperative, where I work on their Big Data Platform. 
Very interested in Apache Spark, Hadoop, distributed systems, algorithms, and functional programming, especially in the Scala programming language.","url":"https:\/\/www.javacodegeeks.com\/author\/nicolas-a-perez"}]}},"_links":{"self":[{"href":"https:\/\/www.javacodegeeks.com\/wp-json\/wp\/v2\/posts\/56224","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.javacodegeeks.com\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.javacodegeeks.com\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.javacodegeeks.com\/wp-json\/wp\/v2\/users\/1019"}],"replies":[{"embeddable":true,"href":"https:\/\/www.javacodegeeks.com\/wp-json\/wp\/v2\/comments?post=56224"}],"version-history":[{"count":0,"href":"https:\/\/www.javacodegeeks.com\/wp-json\/wp\/v2\/posts\/56224\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.javacodegeeks.com\/wp-json\/wp\/v2\/media\/22307"}],"wp:attachment":[{"href":"https:\/\/www.javacodegeeks.com\/wp-json\/wp\/v2\/media?parent=56224"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.javacodegeeks.com\/wp-json\/wp\/v2\/categories?post=56224"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.javacodegeeks.com\/wp-json\/wp\/v2\/tags?post=56224"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}