{"id":53748,"date":"2016-03-09T19:00:43","date_gmt":"2016-03-09T17:00:43","guid":{"rendered":"https:\/\/www.javacodegeeks.com\/?p=53748"},"modified":"2016-03-09T14:50:55","modified_gmt":"2016-03-09T12:50:55","slug":"get-started-using-apache-spark-graphx-scala","status":"publish","type":"post","link":"https:\/\/www.javacodegeeks.com\/2016\/03\/get-started-using-apache-spark-graphx-scala.html","title":{"rendered":"How to Get Started Using Apache Spark GraphX with Scala"},"content":{"rendered":"<p><strong>Editor&#8217;s Note:<\/strong> Don&#8217;t miss our new free on-demand training course about <a href=\"https:\/\/www.mapr.com\/services\/mapr-academy\/create-data-pipeline-applications-using-apache-spark-on-demand\">how to create data pipeline applications using Apache Spark \u2013 learn more here.<\/a><\/p>\n<p>This post will help you get started using Apache Spark GraphX with Scala on the MapR Sandbox. GraphX is the Apache Spark component for graph-parallel computations, built upon a branch of mathematics called graph theory. It is a distributed graph processing framework that sits on top of the Spark core.<\/p>\n<h2>Overview of some graph concepts<\/h2>\n<p>A graph is a mathematical structure used to model relations between objects. A graph is made up of vertices and edges that connect them. 
The vertices are the objects and the edges are the relationships between them.<\/p>\n<p><a href=\"http:\/\/www.javacodegeeks.com\/wp-content\/uploads\/2016\/03\/image00_edge-vertex-relationship.png\" rel=\"attachment wp-att-53759\"><img decoding=\"async\" class=\"aligncenter size-full wp-image-53759\" src=\"http:\/\/www.javacodegeeks.com\/wp-content\/uploads\/2016\/03\/image00_edge-vertex-relationship.png\" alt=\"image00_edge-vertex-relationship\" width=\"326\" height=\"199\" srcset=\"https:\/\/www.javacodegeeks.com\/wp-content\/uploads\/2016\/03\/image00_edge-vertex-relationship.png 326w, https:\/\/www.javacodegeeks.com\/wp-content\/uploads\/2016\/03\/image00_edge-vertex-relationship-300x183.png 300w\" sizes=\"(max-width: 326px) 100vw, 326px\" \/><\/a><\/p>\n<p>A <strong>directed graph<\/strong> is a graph where the edges have a direction associated with them. An example of a directed graph is the Twitter follower relationship: user Bob can follow user Carol without implying that user Carol follows user Bob.<\/p>\n<p><a href=\"http:\/\/www.javacodegeeks.com\/wp-content\/uploads\/2016\/03\/image02_bob-follows-carol.png\" rel=\"attachment wp-att-53760\"><img decoding=\"async\" class=\"aligncenter size-full wp-image-53760\" src=\"http:\/\/www.javacodegeeks.com\/wp-content\/uploads\/2016\/03\/image02_bob-follows-carol.png\" alt=\"image02_bob-follows-carol\" width=\"303\" height=\"179\" srcset=\"https:\/\/www.javacodegeeks.com\/wp-content\/uploads\/2016\/03\/image02_bob-follows-carol.png 303w, https:\/\/www.javacodegeeks.com\/wp-content\/uploads\/2016\/03\/image02_bob-follows-carol-300x177.png 300w\" sizes=\"(max-width: 303px) 100vw, 303px\" \/><\/a><\/p>\n<p>An <strong>undirected graph<\/strong> is a graph where the edges have no direction, so every relationship is mutual. An example of an undirected graph is Facebook friends. 
If Bob is a friend of Carol, then Carol is also a friend of Bob.<\/p>\n<h4>GraphX Property Graph<\/h4>\n<p>GraphX extends the Spark RDD with a Resilient Distributed Property Graph.<\/p>\n<p>The <a href=\"http:\/\/spark.apache.org\/docs\/latest\/api\/scala\/index.html#org.apache.spark.graphx.Graph\">property graph<\/a> is a directed multigraph which can have multiple edges in parallel. Every edge and vertex has user-defined properties associated with it. The parallel edges allow multiple relationships between the same vertices.<\/p>\n<p><a href=\"http:\/\/www.javacodegeeks.com\/wp-content\/uploads\/2016\/03\/image01_flight-relationship.png\" rel=\"attachment wp-att-53761\"><img decoding=\"async\" class=\"aligncenter size-full wp-image-53761\" src=\"http:\/\/www.javacodegeeks.com\/wp-content\/uploads\/2016\/03\/image01_flight-relationship.png\" alt=\"image01_flight-relationship\" width=\"562\" height=\"313\" srcset=\"https:\/\/www.javacodegeeks.com\/wp-content\/uploads\/2016\/03\/image01_flight-relationship.png 562w, https:\/\/www.javacodegeeks.com\/wp-content\/uploads\/2016\/03\/image01_flight-relationship-300x167.png 300w\" sizes=\"(max-width: 562px) 100vw, 562px\" \/><\/a><\/p>\n<p>In this activity, you will use GraphX to analyze flight data.<\/p>\n<h2>Scenario<\/h2>\n<p>As a simple starting example, we will analyze three flights. For each flight, we have the following information:<\/p>\n<table border=\"1\" width=\"\" cellspacing=\"0\" cellpadding=\"0\">\n<thead>\n<tr>\n<td><strong>Originating Airport<\/strong><\/td>\n<td><strong>Destination Airport<\/strong><\/td>\n<td><strong>Distance<\/strong><\/td>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td>SFO<\/td>\n<td>ORD<\/td>\n<td>1800 miles<\/td>\n<\/tr>\n<tr>\n<td>ORD<\/td>\n<td>DFW<\/td>\n<td>800 miles<\/td>\n<\/tr>\n<tr>\n<td>DFW<\/td>\n<td>SFO<\/td>\n<td>1400 miles<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>&nbsp;<\/p>\n<p>In this scenario, we are going to represent the airports as vertices and routes as edges. 
For our graph we will have three vertices, each representing an airport. The distance between the airports is a route property, as shown below:<a href=\"http:\/\/www.javacodegeeks.com\/wp-content\/uploads\/2016\/03\/image04_3-vertex-relationship.png\" rel=\"attachment wp-att-53762\"><img decoding=\"async\" class=\"aligncenter size-full wp-image-53762\" src=\"http:\/\/www.javacodegeeks.com\/wp-content\/uploads\/2016\/03\/image04_3-vertex-relationship.png\" alt=\"image04_3-vertex-relationship\" width=\"526\" height=\"298\" srcset=\"https:\/\/www.javacodegeeks.com\/wp-content\/uploads\/2016\/03\/image04_3-vertex-relationship.png 526w, https:\/\/www.javacodegeeks.com\/wp-content\/uploads\/2016\/03\/image04_3-vertex-relationship-300x170.png 300w\" sizes=\"(max-width: 526px) 100vw, 526px\" \/><\/a><\/p>\n<h4><strong>Vertex Table for Airports<\/strong><\/h4>\n<table border=\"1\" width=\"\" cellspacing=\"0\" cellpadding=\"0\">\n<thead>\n<tr>\n<td><strong>ID<\/strong><\/td>\n<td><strong>Property<\/strong><\/td>\n<\/tr>\n<\/thead>\n<thead><\/thead>\n<tbody>\n<tr>\n<td>1<\/td>\n<td>SFO<\/td>\n<\/tr>\n<tr>\n<td>2<\/td>\n<td>ORD<\/td>\n<\/tr>\n<tr>\n<td>3<\/td>\n<td>DFW<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>&nbsp;<\/p>\n<h4><strong>Edges Table for Routes<\/strong><\/h4>\n<table border=\"1\" width=\"\" cellspacing=\"0\" cellpadding=\"0\">\n<thead>\n<tr>\n<td><strong>SrcId<\/strong><\/td>\n<td><strong>DestId<\/strong><\/td>\n<td><strong>Property<\/strong><\/td>\n<\/tr>\n<\/thead>\n<thead><\/thead>\n<tbody>\n<tr>\n<td>1<\/td>\n<td>2<\/td>\n<td>1800<\/td>\n<\/tr>\n<tr>\n<td>2<\/td>\n<td>3<\/td>\n<td>800<\/td>\n<\/tr>\n<tr>\n<td>3<\/td>\n<td>1<\/td>\n<td>1400<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>&nbsp;<\/p>\n<h4>Software<\/h4>\n<p>This tutorial will run on the MapR Sandbox, which includes Spark.<\/p>\n<ul>\n<li>You can download the code and data to run these examples from here:\n<ul>\n<li><a 
href=\"https:\/\/github.com\/caroljmcdonald\/sparkgraphxexample\">https:\/\/github.com\/caroljmcdonald\/sparkgraphxexample<\/a><\/li>\n<\/ul>\n<\/li>\n<li>The examples in this post can be run in the Spark shell, after launching with the spark-shell command.<\/li>\n<li>You can also run the code as a standalone application as described in the tutorial on <a href=\"https:\/\/www.mapr.com\/products\/mapr-sandbox-hadoop\/tutorials\/spark-tutorial\">Getting Started with Spark on MapR Sandbox<\/a>.<\/li>\n<\/ul>\n<h4>Launch the Spark Interactive Shell<\/h4>\n<p>Log into the MapR Sandbox, as explained in <a href=\"https:\/\/www.mapr.com\/products\/mapr-sandbox-hadoop\/tutorials\/spark-tutorial\">Getting Started with Spark on MapR Sandbox<\/a>, using userid user01, password mapr. Start the spark shell with:<\/p>\n<pre class=\" brush:java\">$ spark-shell<\/pre>\n<h4>Define Vertices<\/h4>\n<p>First we will import the GraphX packages.<\/p>\n<p>(In the code boxes, comments are in Green and output is in Blue)<\/p>\n<pre class=\" brush:java\">import org.apache.spark._\r\nimport org.apache.spark.rdd.RDD\r\n\/\/ import classes required for using GraphX\r\nimport org.apache.spark.graphx._<\/pre>\n<p>We define airports as vertices. Vertices have an Id and can have properties or attributes associated with them. 
Each vertex consists of:<\/p>\n<ul>\n<li>Vertex id \u2192 Id (Long)<\/li>\n<li>Vertex Property \u2192 name (String)<\/li>\n<\/ul>\n<h4>Vertex Table for Airports<\/h4>\n<table border=\"1\" width=\"\" cellspacing=\"0\" cellpadding=\"0\">\n<thead>\n<tr>\n<td><strong>ID<\/strong><\/td>\n<td><strong>Property(V)<\/strong><\/td>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td>1<\/td>\n<td>SFO<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>&nbsp;<\/p>\n<p>We define an RDD with the above properties that is then used for the vertices.<\/p>\n<pre class=\" brush:java\">\/\/ create vertices RDD with ID and Name\r\nval vertices = Array((1L, \"SFO\"), (2L, \"ORD\"), (3L, \"DFW\"))\r\nval vRDD = sc.parallelize(vertices)\r\nvRDD.take(1)\r\n\/\/ Array((1,SFO)) \r\n\/\/ Defining a default vertex called nowhere\r\nval nowhere = \"nowhere\"<\/pre>\n<h4>Define Edges<\/h4>\n<p>Edges are the routes between airports. An edge must have a source and a destination, and can have properties. In our example, an edge consists of:<\/p>\n<ul>\n<li>Edge origin id \u2192 src (Long)<\/li>\n<li>Edge destination id \u2192 dest (Long)<\/li>\n<li>Edge Property distance \u2192 distance (Int)<\/li>\n<\/ul>\n<h4>Edges Table for Routes<\/h4>\n<table border=\"1\" width=\"\" cellspacing=\"0\" cellpadding=\"0\">\n<thead>\n<tr>\n<td><strong>srcid<\/strong><\/td>\n<td><strong>destid<\/strong><\/td>\n<td><strong>Property(E)<\/strong><\/td>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td>1<\/td>\n<td>2<\/td>\n<td>1800<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>&nbsp;<\/p>\n<p>We define an RDD with the above properties that is then used for the edges. 
The edge RDD has the form (src id, dest id, distance).<\/p>\n<pre class=\" brush:java\">\/\/ create routes RDD with srcid, destid, distance\r\nval edges = Array(Edge(1L, 2L, 1800), Edge(2L, 3L, 800), Edge(3L, 1L, 1400))\r\nval eRDD = sc.parallelize(edges) \r\neRDD.take(2)\r\n\/\/ Array(Edge(1,2,1800), Edge(2,3,800))<\/pre>\n<h4>Create Property Graph<\/h4>\n<p>To create a graph, you need a vertex RDD, an edge RDD, and a default vertex.<\/p>\n<p>Create a property graph called graph.<\/p>\n<pre class=\" brush:java\">\/\/ define the graph\r\nval graph = Graph(vRDD, eRDD, nowhere)\r\n\/\/ graph vertices\r\ngraph.vertices.collect.foreach(println)\r\n\/\/ (2,ORD)\r\n\/\/ (1,SFO)\r\n\/\/ (3,DFW) \r\n\/\/ graph edges\r\ngraph.edges.collect.foreach(println) \r\n\/\/ Edge(1,2,1800)\r\n\/\/ Edge(2,3,800)\r\n\/\/ Edge(3,1,1400)<\/pre>\n<p>1. How many airports are there?<\/p>\n<pre class=\" brush:java\">\/\/ How many airports?\r\nval numairports = graph.numVertices\r\n\/\/ Long = 3<\/pre>\n<p>2. How many routes are there?<\/p>\n<pre class=\" brush:java\">\/\/ How many routes?\r\nval numroutes = graph.numEdges\r\n\/\/ Long = 3<\/pre>\n<p>3. Which routes are longer than 1000 miles?<\/p>\n<pre class=\" brush:java\">\/\/ routes &gt; 1000 miles distance?\r\ngraph.edges.filter { case Edge(src, dst, prop) =&gt; prop &gt; 1000 }.collect.foreach(println)\r\n\/\/ Edge(1,2,1800)\r\n\/\/ Edge(3,1,1400)<\/pre>\n<p>4. The EdgeTriplet class extends the Edge class by adding the srcAttr and dstAttr members which contain the source and destination properties, respectively.<\/p>\n<pre class=\" brush:java\">\/\/ triplets\r\ngraph.triplets.take(3).foreach(println)\r\n\/\/ ((1,SFO),(2,ORD),1800)\r\n\/\/ ((2,ORD),(3,DFW),800)\r\n\/\/ ((3,DFW),(1,SFO),1400)<\/pre>\n<p>5. 
Sort and print out the longest distance routes<\/p>\n<pre class=\" brush:java\">\/\/ print out longest routes\r\ngraph.triplets.sortBy(_.attr, ascending=false).map(triplet =&gt;\r\n     \"Distance \" + triplet.attr.toString + \" from \" + triplet.srcAttr + \" to \" + triplet.dstAttr + \".\").collect.foreach(println) \r\nDistance 1800 from SFO to ORD.\r\nDistance 1400 from DFW to SFO.\r\nDistance 800 from ORD to DFW.<\/pre>\n<h2>Analyze Real Flight Data with GraphX<\/h2>\n<h2>Scenario<\/h2>\n<p>Our data is from <a href=\"http:\/\/www.transtats.bts.gov\/DL_SelectFields.asp?Table_ID=236&amp;DB_Short_Name=On-Time\">http:\/\/www.transtats.bts.gov\/DL_SelectFields.asp?Table_ID=236&amp;DB_Short_Name=On-Time<\/a>. We are using flight information for January 2015. For each flight, we have the following information:<\/p>\n<table border=\"1\" width=\"\" cellspacing=\"0\" cellpadding=\"0\">\n<thead>\n<tr>\n<td><strong>Field<\/strong><\/td>\n<td><strong>Description<\/strong><\/td>\n<td><strong>Example Value<\/strong><\/td>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td><strong>dOfM(String)<\/strong><\/td>\n<td>Day of month<\/td>\n<td>1<\/td>\n<\/tr>\n<tr>\n<td><strong>dOfW (String)<\/strong><\/td>\n<td>Day of week<\/td>\n<td>4<\/td>\n<\/tr>\n<tr>\n<td><strong>carrier (String)<\/strong><\/td>\n<td>Carrier code<\/td>\n<td>AA<\/td>\n<\/tr>\n<tr>\n<td><strong>tailNum (String)<\/strong><\/td>\n<td>Unique identifier for the plane &#8211; tail number<\/td>\n<td>N787AA<\/td>\n<\/tr>\n<tr>\n<td><strong>flnum(Int)<\/strong><\/td>\n<td>Flight number<\/td>\n<td>21<\/td>\n<\/tr>\n<tr>\n<td><strong>org_id(String)<\/strong><\/td>\n<td>Origin airport ID<\/td>\n<td>12478<\/td>\n<\/tr>\n<tr>\n<td><strong>origin(String)<\/strong><\/td>\n<td>Origin Airport Code<\/td>\n<td>JFK<\/td>\n<\/tr>\n<tr>\n<td><strong>dest_id (String)<\/strong><\/td>\n<td>Destination airport ID<\/td>\n<td>12892<\/td>\n<\/tr>\n<tr>\n<td><strong>dest (String)<\/strong><\/td>\n<td>Destination airport 
code<\/td>\n<td>LAX<\/td>\n<\/tr>\n<tr>\n<td><strong>crsdeptime(Double)<\/strong><\/td>\n<td>Scheduled departure time<\/td>\n<td>900<\/td>\n<\/tr>\n<tr>\n<td><strong>deptime (Double)<\/strong><\/td>\n<td>Actual departure time<\/td>\n<td>855<\/td>\n<\/tr>\n<tr>\n<td><strong>depdelaymins (Double)<\/strong><\/td>\n<td>Departure delay in minutes<\/td>\n<td>0<\/td>\n<\/tr>\n<tr>\n<td><strong>crsarrtime (Double)<\/strong><\/td>\n<td>Scheduled arrival time<\/td>\n<td>1230<\/td>\n<\/tr>\n<tr>\n<td><strong>arrtime (Double)<\/strong><\/td>\n<td>Actual arrival time<\/td>\n<td>1237<\/td>\n<\/tr>\n<tr>\n<td><strong>arrdelaymins (Double)<\/strong><\/td>\n<td>Arrival delay minutes<\/td>\n<td>7<\/td>\n<\/tr>\n<tr>\n<td><strong>crselapsedtime (Double)<\/strong><\/td>\n<td>Elapsed time<\/td>\n<td>390<\/td>\n<\/tr>\n<tr>\n<td><strong>dist (Int)<\/strong><\/td>\n<td>Distance<\/td>\n<td>2475<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>&nbsp;<\/p>\n<p>In this scenario, we are going to represent the airports as vertices and routes as edges. We are interested in visualizing airports and routes and would like to see the number of airports that have departures or arrivals.<\/p>\n<ul>\n<li>You can download the code and data to run these examples from here: <a href=\"https:\/\/github.com\/caroljmcdonald\/sparkgraphxexample\">https:\/\/github.com\/caroljmcdonald\/sparkgraphxexample<\/a><\/li>\n<\/ul>\n<p>Log into the MapR Sandbox, as explained in <a href=\"https:\/\/www.mapr.com\/products\/mapr-sandbox-hadoop\/tutorials\/spark-tutorial\">Getting Started with Spark on MapR Sandbox<\/a>, using userid user01, password mapr. 
Copy the sample data file <strong>rita2014jan.csv<\/strong> to your sandbox home directory \/user\/user01 using scp.<\/p>\n<p>Start the Spark shell with:<\/p>\n<pre class=\" brush:java\">$ spark-shell<\/pre>\n<h4>Define Vertices<\/h4>\n<p>First we will import the GraphX packages.<\/p>\n<p>(In the code boxes, comments are in Green and output is in Blue)<\/p>\n<pre class=\" brush:java\">import org.apache.spark._\r\nimport org.apache.spark.rdd.RDD\r\nimport org.apache.spark.util.IntParam\r\n\/\/ import classes required for using GraphX\r\nimport org.apache.spark.graphx._\r\nimport org.apache.spark.graphx.util.GraphGenerators<\/pre>\n<p>Below we use Scala case classes to define the flight schema corresponding to the csv data file.<\/p>\n<pre class=\" brush:java\">\/\/ define the Flight Schema\r\ncase class Flight(dofM:String, dofW:String, carrier:String, tailnum:String, flnum:Int, org_id:Long, origin:String, dest_id:Long, dest:String, crsdeptime:Double, deptime:Double, depdelaymins:Double, crsarrtime:Double, arrtime:Double, arrdelay:Double,crselapsedtime:Double,dist:Int)<\/pre>\n<p>The function below parses a line from the data file into the flight class.<\/p>\n<pre class=\" brush:java\">\/\/ function to parse input into Flight class\r\ndef parseFlight(str: String): Flight = {\r\n val line = str.split(\",\")\r\n Flight(line(0), line(1), line(2), line(3), line(4).toInt, line(5).toLong, line(6), line(7).toLong, line(8), line(9).toDouble, line(10).toDouble, line(11).toDouble, line(12).toDouble, line(13).toDouble, line(14).toDouble, line(15).toDouble, line(16).toInt)\r\n}<\/pre>\n<p>Below we load the data from the csv file into a <a href=\"https:\/\/spark.apache.org\/docs\/0.8.1\/api\/core\/org\/apache\/spark\/rdd\/RDD.html\">Resilient Distributed Dataset (RDD)<\/a>. 
RDDs support <a href=\"https:\/\/spark.apache.org\/docs\/1.3.0\/programming-guide.html#transformations\">transformations<\/a> and <a href=\"https:\/\/spark.apache.org\/docs\/1.3.0\/programming-guide.html#actions\">actions<\/a>; for example, the first() action returns the first element in the RDD.<\/p>\n<pre class=\" brush:java\">\/\/ load the data into an RDD\r\nval textRDD = sc.textFile(\"\/user\/user01\/data\/rita2014jan.csv\")\r\n\/\/ MapPartitionsRDD[1] at textFile \r\n\/\/ parse the RDD of csv lines into an RDD of flight classes\r\nval flightsRDD = textRDD.map(parseFlight).cache()<\/pre>\n<p>We define airports as vertices. Vertices can have properties or attributes associated with them. Each vertex has the following property:<\/p>\n<ul>\n<li>Airport name (String)<\/li>\n<\/ul>\n<p><strong>Vertex Table for Airports<\/strong><\/p>\n<table border=\"1\" width=\"\" cellspacing=\"0\" cellpadding=\"0\">\n<thead>\n<tr>\n<td><strong>ID<\/strong><\/td>\n<td><strong>Property(V)<\/strong><\/td>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td>10397<\/td>\n<td>ATL<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>&nbsp;<br \/>\nWe define an RDD with the above properties that is then used for the vertices.<\/p>\n<pre class=\" brush:java\">\/\/ create airports RDD with ID and Name\r\nval airports = flightsRDD.map(flight =&gt; (flight.org_id, flight.origin)).distinct \r\nairports.take(1)\r\n\/\/ Array((14057,PDX)) \r\n\/\/ Defining a default vertex called nowhere\r\nval nowhere = \"nowhere\" \r\n\/\/ Map airport ID to the 3-letter code to use for printlns\r\nval airportMap = airports.map { case ((org_id), name) =&gt; (org_id -&gt; name) }.collect.toList.toMap\r\n\/\/ Map(13024 -&gt; LMT, 10785 -&gt; BTV,\u2026)<\/pre>\n<h4>Define Edges<\/h4>\n<p>Edges are the routes between airports. An edge must have a source and a destination, and can have properties. 
In our example, an edge consists of:<\/p>\n<ul>\n<li>Edge origin id \u2192 src (Long)<\/li>\n<li>Edge destination id \u2192 dest (Long)<\/li>\n<li>Edge property distance \u2192 distance (Int)<\/li>\n<\/ul>\n<p><strong>Edges Table for Routes<\/strong><\/p>\n<table border=\"1\" width=\"\" cellspacing=\"0\" cellpadding=\"0\">\n<thead>\n<tr>\n<td><strong>srcid<\/strong><\/td>\n<td><strong>destid<\/strong><\/td>\n<td><strong>Property(E)<\/strong><\/td>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td>14869<\/td>\n<td>14683<\/td>\n<td>1087<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>&nbsp;<\/p>\n<p>We define an RDD with the above properties that is then used for the edges. The edge RDD has the form (src id, dest id, distance).<\/p>\n<pre class=\" brush:java\">\/\/ create routes RDD with srcid, destid, distance\r\nval routes = flightsRDD.map(flight =&gt; ((flight.org_id, flight.dest_id), flight.dist)).distinct \r\nroutes.take(2)\r\n\/\/ Array(((14869,14683),1087), ((14683,14771),1482)) \r\n\/\/ create edges RDD with srcid, destid, distance\r\nval edges = routes.map {\r\n case ((org_id, dest_id), distance) =&gt; Edge(org_id.toLong, dest_id.toLong, distance) } \r\nedges.take(1)\r\n\/\/ Array(Edge(10299,10926,160))<\/pre>\n<h4>Create Property Graph<\/h4>\n<p>To create a graph, you need a vertex RDD, an edge RDD, and a default vertex.<\/p>\n<p>Create a property graph called graph.<\/p>\n<pre class=\" brush:java\">\/\/ define the graph\r\nval graph = Graph(airports, edges, nowhere) \r\n\/\/ graph vertices\r\ngraph.vertices.take(2)\r\n\/\/ Array((10208,AGS), (10268,ALO)) \r\n\/\/ graph edges\r\ngraph.edges.take(2)\r\n\/\/ Array(Edge(10135,10397,692), Edge(10135,13930,654))<\/pre>\n<p>6. How many airports are there?<\/p>\n<pre class=\" brush:java\">\/\/ How many airports?\r\nval numairports = graph.numVertices\r\n\/\/ Long = 301<\/pre>\n<p>7. 
How many routes are there?<\/p>\n<pre class=\" brush:java\">\/\/ How many routes?\r\nval numroutes = graph.numEdges\r\n\/\/ Long = 4090<\/pre>\n<p>8. Which routes are longer than 1000 miles?<\/p>\n<pre class=\" brush:java\">\/\/ routes &gt; 1000 miles distance?\r\ngraph.edges.filter { case Edge(org_id, dest_id, distance) =&gt; distance &gt; 1000 }.take(3)\r\n\/\/ Array(Edge(10140,10397,1269), Edge(10140,10821,1670), Edge(10140,12264,1628))<\/pre>\n<p>9. The EdgeTriplet class extends the Edge class by adding the srcAttr and dstAttr members which contain the source and destination properties, respectively.<\/p>\n<pre class=\" brush:java\">\/\/ triplets\r\ngraph.triplets.take(3).foreach(println)\r\n\/\/ ((10135,ABE),(10397,ATL),692)\r\n\/\/ ((10135,ABE),(13930,ORD),654)\r\n\/\/ ((10140,ABQ),(10397,ATL),1269)<\/pre>\n<p>10. Sort and print out the longest distance routes<\/p>\n<pre class=\" brush:java\">\/\/ print out longest routes\r\ngraph.triplets.sortBy(_.attr, ascending=false).map(triplet =&gt;\r\n     \"Distance \" + triplet.attr.toString + \" from \" + triplet.srcAttr + \" to \" + triplet.dstAttr + \".\").take(10).foreach(println) \r\n\/\/ Distance 4983 from JFK to HNL.\r\n\/\/ Distance 4983 from HNL to JFK.\r\n\/\/ Distance 4963 from EWR to HNL.\r\n\/\/ Distance 4963 from HNL to EWR.\r\n\/\/ Distance 4817 from HNL to IAD.\r\n\/\/ Distance 4817 from IAD to HNL.\r\n\/\/ Distance 4502 from ATL to HNL.\r\n\/\/ Distance 4502 from HNL to ATL.\r\n\/\/ Distance 4243 from HNL to ORD.\r\n\/\/ Distance 4243 from ORD to HNL.<\/pre>\n<p>11. 
Compute the highest degree vertex<\/p>\n<pre class=\" brush:java\">\/\/ Define a reduce operation to compute the highest degree vertex\r\ndef max(a: (VertexId, Int), b: (VertexId, Int)): (VertexId, Int) = {\r\n if (a._2 &gt; b._2) a else b\r\n}\r\nval maxInDegree: (VertexId, Int) = graph.inDegrees.reduce(max)\r\n\/\/ maxInDegree: (org.apache.spark.graphx.VertexId, Int) = (10397,152) \r\nval maxOutDegree: (VertexId, Int) = graph.outDegrees.reduce(max)\r\n\/\/ maxOutDegree: (org.apache.spark.graphx.VertexId, Int) = (10397,153) \r\nval maxDegrees: (VertexId, Int) = graph.degrees.reduce(max)\r\n\/\/ maxDegrees: (org.apache.spark.graphx.VertexId, Int) = (10397,305) \r\n\/\/ Get the name for the airport with id 10397\r\nairportMap(10397)\r\n\/\/ res70: String = ATL<\/pre>\n<p>12. Which airport has the most incoming flights?<\/p>\n<pre class=\" brush:java\">\/\/ get top 3\r\nval maxIncoming = graph.inDegrees.collect.sortWith(_._2 &gt; _._2).map(x =&gt; (airportMap(x._1), x._2)).take(3) \r\nmaxIncoming.foreach(println)\r\n\/\/ (ATL,152)\r\n\/\/ (ORD,145)\r\n\/\/ (DFW,143) \r\n\/\/ which airport has the most outgoing flights?\r\nval maxout = graph.outDegrees.join(airports).sortBy(_._2._1, ascending=false).take(3) \r\nmaxout.foreach(println)\r\n\/\/ (10397,(153,ATL))\r\n\/\/ (13930,(146,ORD))\r\n\/\/ (11298,(143,DFW))<\/pre>\n<h2>PageRank<\/h2>\n<p>Another GraphX operator is PageRank, which is based on the Google PageRank algorithm.<\/p>\n<p>PageRank measures the importance of each vertex in a graph, by determining which vertices have the most edges with other vertices. In our example, we can use PageRank to determine which airports are the most important by measuring which airports have the most connections to other airports.<\/p>\n<p>We have to specify the tolerance, which is the measure of convergence.<\/p>\n<p>13. 
What are the most important airports according to PageRank?<\/p>\n<pre class=\" brush:java\">\/\/ use pageRank\r\nval ranks = graph.pageRank(0.1).vertices\r\n\/\/ join the ranks  with the map of airport id to name\r\nval temp= ranks.join(airports)\r\ntemp.take(1)\r\n\/\/ Array((15370,(0.5365013694244737,TUL))) \r\n\/\/ sort by ranking\r\nval temp2 = temp.sortBy(_._2._1, false)\r\ntemp2.take(2)\r\n\/\/Array((10397,(5.431032677813346,ATL)), (13930,(5.4148119418905765,ORD))) \r\n\/\/ get just the airport names\r\nval impAirports =temp2.map(_._2._2)\r\nimpAirports.take(4)\r\n\/\/res6: Array[String] = Array(ATL, ORD, DFW, DEN)<\/pre>\n<h2><strong>Pregel<\/strong><\/h2>\n<p>Many important graph algorithms are iterative algorithms, since properties of vertices depend on properties of their neighbors, which depend on properties of <i>their<\/i> neighbors. Pregel is an iterative graph processing model, developed at Google, which uses a sequence of iterations of messages passing between vertices in a graph. GraphX implements a Pregel-like bulk-synchronous message-passing API.<\/p>\n<p>With the Pregel implementation in GraphX, vertices can only send messages to neighboring vertices.<\/p>\n<p>The Pregel operator is executed in a series of super steps. 
In each super step:<\/p>\n<ul>\n<li>The vertices receive the sum of their inbound messages from the previous super step<\/li>\n<li>They compute a new value for the vertex property<\/li>\n<li>They send messages to the neighboring vertices in the next super step<\/li>\n<\/ul>\n<p>When there are no more messages remaining, the Pregel operator will end the iteration and the final graph is returned.<\/p>\n<p><a href=\"http:\/\/www.javacodegeeks.com\/wp-content\/uploads\/2016\/03\/image03_nstep-relationship-messaging.png\" rel=\"attachment wp-att-53763\"><img decoding=\"async\" class=\"aligncenter size-full wp-image-53763\" src=\"http:\/\/www.javacodegeeks.com\/wp-content\/uploads\/2016\/03\/image03_nstep-relationship-messaging.png\" alt=\"image03_nstep-relationship-messaging\" width=\"800\" height=\"370\" srcset=\"https:\/\/www.javacodegeeks.com\/wp-content\/uploads\/2016\/03\/image03_nstep-relationship-messaging.png 800w, https:\/\/www.javacodegeeks.com\/wp-content\/uploads\/2016\/03\/image03_nstep-relationship-messaging-300x139.png 300w, https:\/\/www.javacodegeeks.com\/wp-content\/uploads\/2016\/03\/image03_nstep-relationship-messaging-768x355.png 768w\" sizes=\"(max-width: 800px) 100vw, 800px\" \/><\/a><\/p>\n<p>The code below computes the cheapest airfare using Pregel with the following formula to compute airfare.<\/p>\n<p><strong>50 + distance \/ 20 <\/strong><\/p>\n<pre class=\" brush:java\">\/\/ starting vertex\r\nval sourceId: VertexId = 13024\r\n\/\/ a graph with edges containing airfare cost calculation\r\nval gg = graph.mapEdges(e =&gt; 50.toDouble + e.attr.toDouble\/20 )\r\n\/\/ initialize graph, all vertices except source have distance infinity\r\nval initialGraph = gg.mapVertices((id, _) =&gt; if (id == sourceId) 0.0 else Double.PositiveInfinity)\r\n\/\/ call pregel on graph\r\nval sssp = initialGraph.pregel(Double.PositiveInfinity)(\r\n \/\/ Vertex Program\r\n (id, dist, newDist) =&gt; math.min(dist, newDist),\r\n triplet =&gt; {\r\n  \/\/ Send 
Message\r\n  if (triplet.srcAttr + triplet.attr &lt; triplet.dstAttr) {\r\n   Iterator((triplet.dstId, triplet.srcAttr + triplet.attr))\r\n  } else {\r\n   Iterator.empty\r\n  }\r\n },\r\n \/\/ Merge Message\r\n (a,b) =&gt; math.min(a,b)\r\n) \r\n\/\/ routes , lowest flight cost\r\nprintln(sssp.edges.take(4).mkString(\"\\n\"))\r\n\/\/ Edge(10135,10397,84.6)\r\n\/\/ Edge(10135,13930,82.7)\r\n\/\/ Edge(10140,10397,113.45)\r\n\/\/ Edge(10140,10821,133.5) \r\n\/\/ routes with airport codes , lowest flight cost\r\nsssp.edges.map{ case ( Edge(org_id, dest_id,price))=&gt; ( (airportMap(org_id), airportMap(dest_id), price)) }.takeOrdered(10)(Ordering.by(_._3))\r\n\/\/ Array((WRG,PSG,51.55), (PSG,WRG,51.55), (CEC,ACV,52.8), (ACV,CEC,52.8), (ORD,MKE,53.35), (IMT,RHI,53.35), (MKE,ORD,53.35), (RHI,IMT,53.35), (STT,SJU,53.4), (SJU,STT,53.4)) \r\n\/\/ airports , lowest flight cost\r\nprintln(sssp.vertices.take(4).mkString(\"\\n\")) \r\n\/\/ (10208,277.79)\r\n\/\/ (10268,260.7)\r\n\/\/ (14828,261.65)\r\n\/\/ (14698,125.25) \r\n\/\/ airport codes , sorted lowest flight cost\r\nsssp.vertices.collect.map(x =&gt; (airportMap(x._1), x._2)).sortWith(_._2 &lt; _._2)\r\n\/\/ res21: Array[(String, Double)] = Array((PDX,62.05), (SFO,65.75), (EUG,117.35))<\/pre>\n<h2>Want to learn more?<\/h2>\n<ul>\n<li><a href=\"http:\/\/spark.apache.org\/docs\/latest\/graphx-programming-guide.html\">GraphX Programming Guide<\/a><\/li>\n<li><a href=\"https:\/\/www.mapr.com\/company\/press-releases\/mapr-unveils-free-complete-apache-spark-training-and-developer-certification\">MapR announces Free Complete Apache Spark Training and Developer Certification<\/a><\/li>\n<li><a href=\"http:\/\/learn.mapr.com\/?q=spark#-l\">Free Spark On Demand Training<\/a><\/li>\n<li><a href=\"http:\/\/learn.mapr.com\/?q=spark#certification-1,-l\">Get Certified on Spark with MapR Spark Certification<\/a><\/li>\n<li><a href=\"https:\/\/www.mapr.com\/sites\/default\/files\/spark-certification-study-guide.pdf\">MapR Certified Spark Developer Study Guide<\/a><\/li>\n<\/ul>\n<p>In 
this blog post, you learned how to get started using Apache Spark GraphX with Scala on the MapR Sandbox. If you have any questions about GraphX, please ask them in the comments section below.<\/p>\n<div class=\"attribution\">\n<table>\n<tbody>\n<tr>\n<td><span class=\"reference\">Reference: <\/span><\/td>\n<td><a href=\"http:\/\/www.mapr.com\/blog\/how-get-started-using-apache-spark-graphx-scala\">How to Get Started Using Apache Spark GraphX with Scala<\/a> from our <a href=\"http:\/\/www.javacodegeeks.com\/join-us\/jcg\/\">JCG partner<\/a> Carol McDonald at the <a href=\"http:\/\/www.mapr.com\/blog\">Mapr<\/a> blog.<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>&nbsp;\n<\/p>\n<\/div>\n","protected":false},"excerpt":{"rendered":"<p>Editor&#8217;s Note: Don&#8217;t miss our new free on-demand training course about how to create data pipeline applications using Apache Spark \u2013 learn more here. This post will help you get started using Apache Spark GraphX with Scala on the MapR Sandbox. GraphX is the Apache Spark component for graph-parallel computations, built upon a branch of &hellip;<\/p>\n","protected":false},"author":976,"featured_media":22307,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[20],"tags":[1092],"class_list":["post-53748","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-scala","tag-apache-spark"],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v27.5 - https:\/\/yoast.com\/product\/yoast-seo-wordpress\/ -->\n<title>How to Get Started Using Apache Spark GraphX with Scala - Java Code Geeks<\/title>\n<meta name=\"description\" content=\"Editor&#039;s Note: Don&#039;t miss our new free on-demand training course about how to create data pipeline applications using Apache Spark \u2013 learn more here. 
This\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/www.javacodegeeks.com\/2016\/03\/get-started-using-apache-spark-graphx-scala.html\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"How to Get Started Using Apache Spark GraphX with Scala - Java Code Geeks\" \/>\n<meta property=\"og:description\" content=\"Editor&#039;s Note: Don&#039;t miss our new free on-demand training course about how to create data pipeline applications using Apache Spark \u2013 learn more here. This\" \/>\n<meta property=\"og:url\" content=\"https:\/\/www.javacodegeeks.com\/2016\/03\/get-started-using-apache-spark-graphx-scala.html\" \/>\n<meta property=\"og:site_name\" content=\"Java Code Geeks\" \/>\n<meta property=\"article:publisher\" content=\"https:\/\/www.facebook.com\/javacodegeeks\" \/>\n<meta property=\"article:published_time\" content=\"2016-03-09T17:00:43+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/www.javacodegeeks.com\/wp-content\/uploads\/2014\/03\/apache-spark-logo.jpg\" \/>\n\t<meta property=\"og:image:width\" content=\"150\" \/>\n\t<meta property=\"og:image:height\" content=\"150\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/jpeg\" \/>\n<meta name=\"author\" content=\"Carol Mcdonald\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:creator\" content=\"@javacodegeeks\" \/>\n<meta name=\"twitter:site\" content=\"@javacodegeeks\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Carol Mcdonald\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. 
reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"13 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\\\/\\\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\\\/\\\/www.javacodegeeks.com\\\/2016\\\/03\\\/get-started-using-apache-spark-graphx-scala.html#article\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/www.javacodegeeks.com\\\/2016\\\/03\\\/get-started-using-apache-spark-graphx-scala.html\"},\"author\":{\"name\":\"Carol Mcdonald\",\"@id\":\"https:\\\/\\\/www.javacodegeeks.com\\\/#\\\/schema\\\/person\\\/7fa13d8fe2a71a211cbb1fff3588a99d\"},\"headline\":\"How to Get Started Using Apache Spark GraphX with Scala\",\"datePublished\":\"2016-03-09T17:00:43+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\\\/\\\/www.javacodegeeks.com\\\/2016\\\/03\\\/get-started-using-apache-spark-graphx-scala.html\"},\"wordCount\":1544,\"commentCount\":0,\"publisher\":{\"@id\":\"https:\\\/\\\/www.javacodegeeks.com\\\/#organization\"},\"image\":{\"@id\":\"https:\\\/\\\/www.javacodegeeks.com\\\/2016\\\/03\\\/get-started-using-apache-spark-graphx-scala.html#primaryimage\"},\"thumbnailUrl\":\"https:\\\/\\\/www.javacodegeeks.com\\\/wp-content\\\/uploads\\\/2014\\\/03\\\/apache-spark-logo.jpg\",\"keywords\":[\"Apache Spark\"],\"articleSection\":[\"Scala\"],\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"CommentAction\",\"name\":\"Comment\",\"target\":[\"https:\\\/\\\/www.javacodegeeks.com\\\/2016\\\/03\\\/get-started-using-apache-spark-graphx-scala.html#respond\"]}]},{\"@type\":\"WebPage\",\"@id\":\"https:\\\/\\\/www.javacodegeeks.com\\\/2016\\\/03\\\/get-started-using-apache-spark-graphx-scala.html\",\"url\":\"https:\\\/\\\/www.javacodegeeks.com\\\/2016\\\/03\\\/get-started-using-apache-spark-graphx-scala.html\",\"name\":\"How to Get Started Using Apache Spark GraphX with Scala - Java Code 
Geeks\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/www.javacodegeeks.com\\\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\\\/\\\/www.javacodegeeks.com\\\/2016\\\/03\\\/get-started-using-apache-spark-graphx-scala.html#primaryimage\"},\"image\":{\"@id\":\"https:\\\/\\\/www.javacodegeeks.com\\\/2016\\\/03\\\/get-started-using-apache-spark-graphx-scala.html#primaryimage\"},\"thumbnailUrl\":\"https:\\\/\\\/www.javacodegeeks.com\\\/wp-content\\\/uploads\\\/2014\\\/03\\\/apache-spark-logo.jpg\",\"datePublished\":\"2016-03-09T17:00:43+00:00\",\"description\":\"Editor's Note: Don't miss our new free on-demand training course about how to create data pipeline applications using Apache Spark \u2013 learn more here. This\",\"breadcrumb\":{\"@id\":\"https:\\\/\\\/www.javacodegeeks.com\\\/2016\\\/03\\\/get-started-using-apache-spark-graphx-scala.html#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\\\/\\\/www.javacodegeeks.com\\\/2016\\\/03\\\/get-started-using-apache-spark-graphx-scala.html\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\\\/\\\/www.javacodegeeks.com\\\/2016\\\/03\\\/get-started-using-apache-spark-graphx-scala.html#primaryimage\",\"url\":\"https:\\\/\\\/www.javacodegeeks.com\\\/wp-content\\\/uploads\\\/2014\\\/03\\\/apache-spark-logo.jpg\",\"contentUrl\":\"https:\\\/\\\/www.javacodegeeks.com\\\/wp-content\\\/uploads\\\/2014\\\/03\\\/apache-spark-logo.jpg\",\"width\":150,\"height\":150},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\\\/\\\/www.javacodegeeks.com\\\/2016\\\/03\\\/get-started-using-apache-spark-graphx-scala.html#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\\\/\\\/www.javacodegeeks.com\\\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"JVM 
Languages\",\"item\":\"https:\\\/\\\/www.javacodegeeks.com\\\/category\\\/jvm-languages\"},{\"@type\":\"ListItem\",\"position\":3,\"name\":\"Scala\",\"item\":\"https:\\\/\\\/www.javacodegeeks.com\\\/category\\\/jvm-languages\\\/scala\"},{\"@type\":\"ListItem\",\"position\":4,\"name\":\"How to Get Started Using Apache Spark GraphX with Scala\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\\\/\\\/www.javacodegeeks.com\\\/#website\",\"url\":\"https:\\\/\\\/www.javacodegeeks.com\\\/\",\"name\":\"Java Code Geeks\",\"description\":\"Java Developers Resource Center\",\"publisher\":{\"@id\":\"https:\\\/\\\/www.javacodegeeks.com\\\/#organization\"},\"alternateName\":\"JCG\",\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\\\/\\\/www.javacodegeeks.com\\\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":\"Organization\",\"@id\":\"https:\\\/\\\/www.javacodegeeks.com\\\/#organization\",\"name\":\"Exelixis Media P.C.\",\"url\":\"https:\\\/\\\/www.javacodegeeks.com\\\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\\\/\\\/www.javacodegeeks.com\\\/#\\\/schema\\\/logo\\\/image\\\/\",\"url\":\"https:\\\/\\\/www.javacodegeeks.com\\\/wp-content\\\/uploads\\\/2022\\\/06\\\/exelixis-logo.png\",\"contentUrl\":\"https:\\\/\\\/www.javacodegeeks.com\\\/wp-content\\\/uploads\\\/2022\\\/06\\\/exelixis-logo.png\",\"width\":864,\"height\":246,\"caption\":\"Exelixis Media P.C.\"},\"image\":{\"@id\":\"https:\\\/\\\/www.javacodegeeks.com\\\/#\\\/schema\\\/logo\\\/image\\\/\"},\"sameAs\":[\"https:\\\/\\\/www.facebook.com\\\/javacodegeeks\",\"https:\\\/\\\/x.com\\\/javacodegeeks\"]},{\"@type\":\"Person\",\"@id\":\"https:\\\/\\\/www.javacodegeeks.com\\\/#\\\/schema\\\/person\\\/7fa13d8fe2a71a211cbb1fff3588a99d\",\"name\":\"Carol 
Mcdonald\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\\\/\\\/secure.gravatar.com\\\/avatar\\\/19256c3e96390164e2fd1671d36bae49ec1ff3870662d46510c2f3b2caf95a16?s=96&d=mm&r=g\",\"url\":\"https:\\\/\\\/secure.gravatar.com\\\/avatar\\\/19256c3e96390164e2fd1671d36bae49ec1ff3870662d46510c2f3b2caf95a16?s=96&d=mm&r=g\",\"contentUrl\":\"https:\\\/\\\/secure.gravatar.com\\\/avatar\\\/19256c3e96390164e2fd1671d36bae49ec1ff3870662d46510c2f3b2caf95a16?s=96&d=mm&r=g\",\"caption\":\"Carol Mcdonald\"},\"sameAs\":[\"http:\\\/\\\/www.mapr.com\\\/blog\"],\"url\":\"https:\\\/\\\/www.javacodegeeks.com\\\/author\\\/carol-mcdonald\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"How to Get Started Using Apache Spark GraphX with Scala - Java Code Geeks","description":"Editor's Note: Don't miss our new free on-demand training course about how to create data pipeline applications using Apache Spark \u2013 learn more here. This","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/www.javacodegeeks.com\/2016\/03\/get-started-using-apache-spark-graphx-scala.html","og_locale":"en_US","og_type":"article","og_title":"How to Get Started Using Apache Spark GraphX with Scala - Java Code Geeks","og_description":"Editor's Note: Don't miss our new free on-demand training course about how to create data pipeline applications using Apache Spark \u2013 learn more here. 
This","og_url":"https:\/\/www.javacodegeeks.com\/2016\/03\/get-started-using-apache-spark-graphx-scala.html","og_site_name":"Java Code Geeks","article_publisher":"https:\/\/www.facebook.com\/javacodegeeks","article_published_time":"2016-03-09T17:00:43+00:00","og_image":[{"width":150,"height":150,"url":"https:\/\/www.javacodegeeks.com\/wp-content\/uploads\/2014\/03\/apache-spark-logo.jpg","type":"image\/jpeg"}],"author":"Carol Mcdonald","twitter_card":"summary_large_image","twitter_creator":"@javacodegeeks","twitter_site":"@javacodegeeks","twitter_misc":{"Written by":"Carol Mcdonald","Est. reading time":"13 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/www.javacodegeeks.com\/2016\/03\/get-started-using-apache-spark-graphx-scala.html#article","isPartOf":{"@id":"https:\/\/www.javacodegeeks.com\/2016\/03\/get-started-using-apache-spark-graphx-scala.html"},"author":{"name":"Carol Mcdonald","@id":"https:\/\/www.javacodegeeks.com\/#\/schema\/person\/7fa13d8fe2a71a211cbb1fff3588a99d"},"headline":"How to Get Started Using Apache Spark GraphX with Scala","datePublished":"2016-03-09T17:00:43+00:00","mainEntityOfPage":{"@id":"https:\/\/www.javacodegeeks.com\/2016\/03\/get-started-using-apache-spark-graphx-scala.html"},"wordCount":1544,"commentCount":0,"publisher":{"@id":"https:\/\/www.javacodegeeks.com\/#organization"},"image":{"@id":"https:\/\/www.javacodegeeks.com\/2016\/03\/get-started-using-apache-spark-graphx-scala.html#primaryimage"},"thumbnailUrl":"https:\/\/www.javacodegeeks.com\/wp-content\/uploads\/2014\/03\/apache-spark-logo.jpg","keywords":["Apache 
Spark"],"articleSection":["Scala"],"inLanguage":"en-US","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/www.javacodegeeks.com\/2016\/03\/get-started-using-apache-spark-graphx-scala.html#respond"]}]},{"@type":"WebPage","@id":"https:\/\/www.javacodegeeks.com\/2016\/03\/get-started-using-apache-spark-graphx-scala.html","url":"https:\/\/www.javacodegeeks.com\/2016\/03\/get-started-using-apache-spark-graphx-scala.html","name":"How to Get Started Using Apache Spark GraphX with Scala - Java Code Geeks","isPartOf":{"@id":"https:\/\/www.javacodegeeks.com\/#website"},"primaryImageOfPage":{"@id":"https:\/\/www.javacodegeeks.com\/2016\/03\/get-started-using-apache-spark-graphx-scala.html#primaryimage"},"image":{"@id":"https:\/\/www.javacodegeeks.com\/2016\/03\/get-started-using-apache-spark-graphx-scala.html#primaryimage"},"thumbnailUrl":"https:\/\/www.javacodegeeks.com\/wp-content\/uploads\/2014\/03\/apache-spark-logo.jpg","datePublished":"2016-03-09T17:00:43+00:00","description":"Editor's Note: Don't miss our new free on-demand training course about how to create data pipeline applications using Apache Spark \u2013 learn more here. 
This","breadcrumb":{"@id":"https:\/\/www.javacodegeeks.com\/2016\/03\/get-started-using-apache-spark-graphx-scala.html#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/www.javacodegeeks.com\/2016\/03\/get-started-using-apache-spark-graphx-scala.html"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.javacodegeeks.com\/2016\/03\/get-started-using-apache-spark-graphx-scala.html#primaryimage","url":"https:\/\/www.javacodegeeks.com\/wp-content\/uploads\/2014\/03\/apache-spark-logo.jpg","contentUrl":"https:\/\/www.javacodegeeks.com\/wp-content\/uploads\/2014\/03\/apache-spark-logo.jpg","width":150,"height":150},{"@type":"BreadcrumbList","@id":"https:\/\/www.javacodegeeks.com\/2016\/03\/get-started-using-apache-spark-graphx-scala.html#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/www.javacodegeeks.com\/"},{"@type":"ListItem","position":2,"name":"JVM Languages","item":"https:\/\/www.javacodegeeks.com\/category\/jvm-languages"},{"@type":"ListItem","position":3,"name":"Scala","item":"https:\/\/www.javacodegeeks.com\/category\/jvm-languages\/scala"},{"@type":"ListItem","position":4,"name":"How to Get Started Using Apache Spark GraphX with Scala"}]},{"@type":"WebSite","@id":"https:\/\/www.javacodegeeks.com\/#website","url":"https:\/\/www.javacodegeeks.com\/","name":"Java Code Geeks","description":"Java Developers Resource Center","publisher":{"@id":"https:\/\/www.javacodegeeks.com\/#organization"},"alternateName":"JCG","potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/www.javacodegeeks.com\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Organization","@id":"https:\/\/www.javacodegeeks.com\/#organization","name":"Exelixis Media 
P.C.","url":"https:\/\/www.javacodegeeks.com\/","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.javacodegeeks.com\/#\/schema\/logo\/image\/","url":"https:\/\/www.javacodegeeks.com\/wp-content\/uploads\/2022\/06\/exelixis-logo.png","contentUrl":"https:\/\/www.javacodegeeks.com\/wp-content\/uploads\/2022\/06\/exelixis-logo.png","width":864,"height":246,"caption":"Exelixis Media P.C."},"image":{"@id":"https:\/\/www.javacodegeeks.com\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/www.facebook.com\/javacodegeeks","https:\/\/x.com\/javacodegeeks"]},{"@type":"Person","@id":"https:\/\/www.javacodegeeks.com\/#\/schema\/person\/7fa13d8fe2a71a211cbb1fff3588a99d","name":"Carol Mcdonald","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/secure.gravatar.com\/avatar\/19256c3e96390164e2fd1671d36bae49ec1ff3870662d46510c2f3b2caf95a16?s=96&d=mm&r=g","url":"https:\/\/secure.gravatar.com\/avatar\/19256c3e96390164e2fd1671d36bae49ec1ff3870662d46510c2f3b2caf95a16?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/19256c3e96390164e2fd1671d36bae49ec1ff3870662d46510c2f3b2caf95a16?s=96&d=mm&r=g","caption":"Carol 
Mcdonald"},"sameAs":["http:\/\/www.mapr.com\/blog"],"url":"https:\/\/www.javacodegeeks.com\/author\/carol-mcdonald"}]}},"_links":{"self":[{"href":"https:\/\/www.javacodegeeks.com\/wp-json\/wp\/v2\/posts\/53748","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.javacodegeeks.com\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.javacodegeeks.com\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.javacodegeeks.com\/wp-json\/wp\/v2\/users\/976"}],"replies":[{"embeddable":true,"href":"https:\/\/www.javacodegeeks.com\/wp-json\/wp\/v2\/comments?post=53748"}],"version-history":[{"count":0,"href":"https:\/\/www.javacodegeeks.com\/wp-json\/wp\/v2\/posts\/53748\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.javacodegeeks.com\/wp-json\/wp\/v2\/media\/22307"}],"wp:attachment":[{"href":"https:\/\/www.javacodegeeks.com\/wp-json\/wp\/v2\/media?parent=53748"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.javacodegeeks.com\/wp-json\/wp\/v2\/categories?post=53748"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.javacodegeeks.com\/wp-json\/wp\/v2\/tags?post=53748"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}