Conversation

rjurney (Collaborator) commented Nov 9, 2024

This PR updates the Quickstart documentation to use the latest GraphFrames version. Every long journey starts with one step :)

rjurney requested review from WeichenXu123 and rxin on November 9, 2024 at 01:09
rxin (Contributor) commented Nov 9, 2024

Does it actually work with the latest version?

rjurney (Collaborator, Author) commented Nov 9, 2024

> Does it actually work with the latest version?

In my local tests, yes, with PySpark. Let me do something more careful and systematic.

rjurney (Collaborator, Author) commented Nov 11, 2024

@rxin more thorough testing:

Spark-Shell, v0.8.3, Scala 2.12

spark-shell --packages graphframes:graphframes:0.8.3-spark3.5-s_2.12
24/11/10 16:36:50 WARN Utils: Your hostname, heracles resolves to a loopback address: 127.0.0.1; using 10.1.10.3 instead (on interface eno1)
24/11/10 16:36:50 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
:: loading settings :: url = jar:file:/opt/spark/jars/ivy-2.5.1.jar!/org/apache/ivy/core/settings/ivysettings.xml
Ivy Default Cache set to: /home/rjurney/.ivy2/cache
The jars for the packages stored in: /home/rjurney/.ivy2/jars
graphframes#graphframes added as a dependency
:: resolving dependencies :: org.apache.spark#spark-submit-parent-4f2a2f0b-18d4-4795-b1c9-05da23ed8686;1.0
	confs: [default]
	found graphframes#graphframes;0.8.3-spark3.5-s_2.12 in spark-packages
	found org.slf4j#slf4j-api;1.7.16 in central
:: resolution report :: resolve 47ms :: artifacts dl 2ms
	:: modules in use:
	graphframes#graphframes;0.8.3-spark3.5-s_2.12 from spark-packages in [default]
	org.slf4j#slf4j-api;1.7.16 from central in [default]
	---------------------------------------------------------------------
	|                  |            modules            ||   artifacts   |
	|       conf       | number| search|dwnlded|evicted|| number|dwnlded|
	---------------------------------------------------------------------
	|      default     |   2   |   0   |   0   |   0   ||   2   |   0   |
	---------------------------------------------------------------------
:: retrieving :: org.apache.spark#spark-submit-parent-4f2a2f0b-18d4-4795-b1c9-05da23ed8686
	confs: [default]
	0 artifacts copied, 2 already retrieved (0kB/1ms)
24/11/10 16:36:50 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
Spark context Web UI available at http://heracles.hsd1.wa.comcast.net:4040
Spark context available as 'sc' (master = local[*], app id = local-1731285412373).
Spark session available as 'spark'.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 3.5.3
      /_/

Using Scala version 2.12.18 (OpenJDK 64-Bit Server VM, Java 17.0.8.1)
Type in expressions to have them evaluated.
Type :help for more information.

scala> import org.graphframes.GraphFrame
import org.graphframes.GraphFrame

scala> // Vertex DataFrame

scala> val v = spark.createDataFrame(List(
     |   ("a", "Alice", 34),
     |   ("b", "Bob", 36),
     |   ("c", "Charlie", 30),
     |   ("d", "David", 29),
     |   ("e", "Esther", 32),
     |   ("f", "Fanny", 36),
     |   ("g", "Gabby", 60)
     | )).toDF("id", "name", "age")
v: org.apache.spark.sql.DataFrame = [id: string, name: string ... 1 more field]

scala> // Edge DataFrame

scala> val e = spark.createDataFrame(List(
     |   ("a", "b", "friend"),
     |   ("b", "c", "follow"),
     |   ("c", "b", "follow"),
     |   ("f", "c", "follow"),
     |   ("e", "f", "follow"),
     |   ("e", "d", "friend"),
     |   ("d", "a", "friend"),
     |   ("a", "e", "friend")
     | )).toDF("src", "dst", "relationship")
e: org.apache.spark.sql.DataFrame = [src: string, dst: string ... 1 more field]

scala> // Create a GraphFrame

scala> val g = GraphFrame(v, e)
g: org.graphframes.GraphFrame = GraphFrame(v:[id: string, name: string ... 1 more field], e:[src: string, dst: string ... 1 more field])

scala> val results = g.pageRank.resetProbability(0.15).tol(0.01).run()
24/11/10 16:37:24 WARN BlockManager: Block rdd_78_5 already exists on this machine; not re-adding it
24/11/10 16:37:24 WARN BlockManager: Block rdd_78_0 already exists on this machine; not re-adding it
24/11/10 16:37:24 WARN BlockManager: Block rdd_78_1 already exists on this machine; not re-adding it
results: org.graphframes.GraphFrame = GraphFrame(v:[id: string, name: string ... 2 more fields], e:[src: string, dst: string ... 2 more fields])

scala> results.vertices.select("id", "pagerank").show()
+---+-------------------+
| id|           pagerank|
+---+-------------------+
|  b|  2.655507832863289|
|  e|0.37085233187676075|
|  a|0.44910633706538744|
|  f| 0.3283606792049851|
|  g| 0.1799821386239711|
|  d| 0.3283606792049851|
|  c| 2.6878300011606218|
+---+-------------------+

PySpark Shell, v0.8.3, Scala 2.12

pyspark --packages graphframes:graphframes:0.8.3-spark3.5-s_2.12
Python 3.10.11 (main, Apr 20 2023, 19:02:41) [GCC 11.2.0]
Type 'copyright', 'credits' or 'license' for more information
IPython 8.29.0 -- An enhanced Interactive Python. Type '?' for help.
24/11/10 16:34:57 WARN Utils: Your hostname, heracles resolves to a loopback address: 127.0.0.1; using 10.1.10.3 instead (on interface eno1)
24/11/10 16:34:57 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
:: loading settings :: url = jar:file:/opt/spark/jars/ivy-2.5.1.jar!/org/apache/ivy/core/settings/ivysettings.xml
Ivy Default Cache set to: /home/rjurney/.ivy2/cache
The jars for the packages stored in: /home/rjurney/.ivy2/jars
graphframes#graphframes added as a dependency
:: resolving dependencies :: org.apache.spark#spark-submit-parent-fd068865-5a0c-4c7f-bae2-f2d55ace54c3;1.0
	confs: [default]
	found graphframes#graphframes;0.8.3-spark3.5-s_2.12 in spark-packages
	found org.slf4j#slf4j-api;1.7.16 in central
:: resolution report :: resolve 49ms :: artifacts dl 1ms
	:: modules in use:
	graphframes#graphframes;0.8.3-spark3.5-s_2.12 from spark-packages in [default]
	org.slf4j#slf4j-api;1.7.16 from central in [default]
	---------------------------------------------------------------------
	|                  |            modules            ||   artifacts   |
	|       conf       | number| search|dwnlded|evicted|| number|dwnlded|
	---------------------------------------------------------------------
	|      default     |   2   |   0   |   0   |   0   ||   2   |   0   |
	---------------------------------------------------------------------
:: retrieving :: org.apache.spark#spark-submit-parent-fd068865-5a0c-4c7f-bae2-f2d55ace54c3
	confs: [default]
	0 artifacts copied, 2 already retrieved (0kB/2ms)
24/11/10 16:34:57 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /__ / .__/\_,_/_/ /_/\_\   version 3.5.3
      /_/

Using Python version 3.10.11 (main, Apr 20 2023 19:02:41)
Spark context Web UI available at http://heracles.hsd1.wa.comcast.net:4040
Spark context available as 'sc' (master = local[*], app id = local-1731285297847).
SparkSession available as 'spark'.

In [1]: from graphframes import GraphFrame

In [2]: # Vertex DataFrame
   ...: v = spark.createDataFrame([
   ...:   ("a", "Alice", 34),
   ...:   ("b", "Bob", 36),
   ...:   ("c", "Charlie", 30),
   ...:   ("d", "David", 29),
   ...:   ("e", "Esther", 32),
   ...:   ("f", "Fanny", 36),
   ...:   ("g", "Gabby", 60)
   ...: ], ["id", "name", "age"])
   ...: # Edge DataFrame
   ...: e = spark.createDataFrame([
   ...:   ("a", "b", "friend"),
   ...:   ("b", "c", "follow"),
   ...:   ("c", "b", "follow"),
   ...:   ("f", "c", "follow"),
   ...:   ("e", "f", "follow"),
   ...:   ("e", "d", "friend"),
   ...:   ("d", "a", "friend"),
   ...:   ("a", "e", "friend")
   ...: ], ["src", "dst", "relationship"])
   ...: # Create a GraphFrame
   ...: g = GraphFrame(v, e)

In [3]: results = g.pageRank(resetProbability=0.15, tol=0.01)

In [4]: results.vertices.select("id", "pagerank").show()
+---+-------------------+
| id|           pagerank|
+---+-------------------+
|  g| 0.1799821386239711|
|  f| 0.3283606792049851|
|  e|0.37085233187676075|
|  d| 0.3283606792049851|
|  c| 2.6878300011606218|
|  b|  2.655507832863289|
|  a|0.44910633706538744|
+---+-------------------+
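
For reference, the same quickstart could also be run non-interactively with spark-submit. A minimal sketch, assuming the PySpark example above is saved to a local script; the filename `quickstart_pagerank.py` is hypothetical and not part of this PR:

```shell
# Hypothetical batch-mode run of the quickstart above.
# --packages resolves the GraphFrames jar from spark-packages via Ivy,
# exactly as the interactive shells above do.
spark-submit \
  --packages graphframes:graphframes:0.8.3-spark3.5-s_2.12 \
  quickstart_pagerank.py
```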

rxin merged commit 3643518 into master Nov 11, 2024
6 checks passed
rxin (Contributor) commented Nov 11, 2024

Thanks. Just merged this.

WeichenXu123 (Contributor) left a comment
LGTM!

rjurney deleted the rjurney/update-quickstart-versions branch on April 15, 2025 at 00:33