Conversation

rjurney (Collaborator) commented Nov 9, 2024

This PR updates the Quickstart documentation to use the latest GraphFrames version. Every long journey starts with one step :)

rjurney requested review from WeichenXu123 and rxin on November 9, 2024 at 01:09
rxin (Contributor) commented Nov 9, 2024

Does it actually work with the latest version?

rjurney (Collaborator, Author) commented Nov 9, 2024

> Does it actually work with the latest version?

In my local tests, yes, with PySpark. Let me do something more careful and systematic.

rjurney (Collaborator, Author) commented Nov 11, 2024

@rxin more thorough testing:

Spark-Shell, v0.8.3, Scala 2.12

spark-shell --packages graphframes:graphframes:0.8.3-spark3.5-s_2.12
24/11/10 16:36:50 WARN Utils: Your hostname, heracles resolves to a loopback address: 127.0.0.1; using 10.1.10.3 instead (on interface eno1)
24/11/10 16:36:50 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
:: loading settings :: url = jar:file:/opt/spark/jars/ivy-2.5.1.jar!/org/apache/ivy/core/settings/ivysettings.xml
Ivy Default Cache set to: /home/rjurney/.ivy2/cache
The jars for the packages stored in: /home/rjurney/.ivy2/jars
graphframes#graphframes added as a dependency
:: resolving dependencies :: org.apache.spark#spark-submit-parent-4f2a2f0b-18d4-4795-b1c9-05da23ed8686;1.0
	confs: [default]
	found graphframes#graphframes;0.8.3-spark3.5-s_2.12 in spark-packages
	found org.slf4j#slf4j-api;1.7.16 in central
:: resolution report :: resolve 47ms :: artifacts dl 2ms
	:: modules in use:
	graphframes#graphframes;0.8.3-spark3.5-s_2.12 from spark-packages in [default]
	org.slf4j#slf4j-api;1.7.16 from central in [default]
	---------------------------------------------------------------------
	|                  |            modules            ||   artifacts   |
	|       conf       | number| search|dwnlded|evicted|| number|dwnlded|
	---------------------------------------------------------------------
	|      default     |   2   |   0   |   0   |   0   ||   2   |   0   |
	---------------------------------------------------------------------
:: retrieving :: org.apache.spark#spark-submit-parent-4f2a2f0b-18d4-4795-b1c9-05da23ed8686
	confs: [default]
	0 artifacts copied, 2 already retrieved (0kB/1ms)
24/11/10 16:36:50 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
Spark context Web UI available at http://heracles.hsd1.wa.comcast.net:4040
Spark context available as 'sc' (master = local[*], app id = local-1731285412373).
Spark session available as 'spark'.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 3.5.3
      /_/

Using Scala version 2.12.18 (OpenJDK 64-Bit Server VM, Java 17.0.8.1)
Type in expressions to have them evaluated.
Type :help for more information.

scala> import org.graphframes.GraphFrame
import org.graphframes.GraphFrame

scala> // Vertex DataFrame

scala> val v = spark.createDataFrame(List(
     |   ("a", "Alice", 34),
     |   ("b", "Bob", 36),
     |   ("c", "Charlie", 30),
     |   ("d", "David", 29),
     |   ("e", "Esther", 32),
     |   ("f", "Fanny", 36),
     |   ("g", "Gabby", 60)
     | )).toDF("id", "name", "age")
v: org.apache.spark.sql.DataFrame = [id: string, name: string ... 1 more field]

scala> // Edge DataFrame

scala> val e = spark.createDataFrame(List(
     |   ("a", "b", "friend"),
     |   ("b", "c", "follow"),
     |   ("c", "b", "follow"),
     |   ("f", "c", "follow"),
     |   ("e", "f", "follow"),
     |   ("e", "d", "friend"),
     |   ("d", "a", "friend"),
     |   ("a", "e", "friend")
     | )).toDF("src", "dst", "relationship")
e: org.apache.spark.sql.DataFrame = [src: string, dst: string ... 1 more field]

scala> // Create a GraphFrame

scala> val g = GraphFrame(v, e)
g: org.graphframes.GraphFrame = GraphFrame(v:[id: string, name: string ... 1 more field], e:[src: string, dst: string ... 1 more field])

scala> val results = g.pageRank.resetProbability(0.15).tol(0.01).run()
24/11/10 16:37:24 WARN BlockManager: Block rdd_78_5 already exists on this machine; not re-adding it
24/11/10 16:37:24 WARN BlockManager: Block rdd_78_0 already exists on this machine; not re-adding it
24/11/10 16:37:24 WARN BlockManager: Block rdd_78_1 already exists on this machine; not re-adding it
results: org.graphframes.GraphFrame = GraphFrame(v:[id: string, name: string ... 2 more fields], e:[src: string, dst: string ... 2 more fields])

scala> results.vertices.select("id", "pagerank").show()
+---+-------------------+
| id|           pagerank|
+---+-------------------+
|  b|  2.655507832863289|
|  e|0.37085233187676075|
|  a|0.44910633706538744|
|  f| 0.3283606792049851|
|  g| 0.1799821386239711|
|  d| 0.3283606792049851|
|  c| 2.6878300011606218|
+---+-------------------+

PySpark Shell, v0.8.3, Scala 2.12

pyspark --packages graphframes:graphframes:0.8.3-spark3.5-s_2.12
Python 3.10.11 (main, Apr 20 2023, 19:02:41) [GCC 11.2.0]
Type 'copyright', 'credits' or 'license' for more information
IPython 8.29.0 -- An enhanced Interactive Python. Type '?' for help.
24/11/10 16:34:57 WARN Utils: Your hostname, heracles resolves to a loopback address: 127.0.0.1; using 10.1.10.3 instead (on interface eno1)
24/11/10 16:34:57 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
:: loading settings :: url = jar:file:/opt/spark/jars/ivy-2.5.1.jar!/org/apache/ivy/core/settings/ivysettings.xml
Ivy Default Cache set to: /home/rjurney/.ivy2/cache
The jars for the packages stored in: /home/rjurney/.ivy2/jars
graphframes#graphframes added as a dependency
:: resolving dependencies :: org.apache.spark#spark-submit-parent-fd068865-5a0c-4c7f-bae2-f2d55ace54c3;1.0
	confs: [default]
	found graphframes#graphframes;0.8.3-spark3.5-s_2.12 in spark-packages
	found org.slf4j#slf4j-api;1.7.16 in central
:: resolution report :: resolve 49ms :: artifacts dl 1ms
	:: modules in use:
	graphframes#graphframes;0.8.3-spark3.5-s_2.12 from spark-packages in [default]
	org.slf4j#slf4j-api;1.7.16 from central in [default]
	---------------------------------------------------------------------
	|                  |            modules            ||   artifacts   |
	|       conf       | number| search|dwnlded|evicted|| number|dwnlded|
	---------------------------------------------------------------------
	|      default     |   2   |   0   |   0   |   0   ||   2   |   0   |
	---------------------------------------------------------------------
:: retrieving :: org.apache.spark#spark-submit-parent-fd068865-5a0c-4c7f-bae2-f2d55ace54c3
	confs: [default]
	0 artifacts copied, 2 already retrieved (0kB/2ms)
24/11/10 16:34:57 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /__ / .__/\_,_/_/ /_/\_\   version 3.5.3
      /_/

Using Python version 3.10.11 (main, Apr 20 2023 19:02:41)
Spark context Web UI available at http://heracles.hsd1.wa.comcast.net:4040
Spark context available as 'sc' (master = local[*], app id = local-1731285297847).
SparkSession available as 'spark'.

In [1]: from graphframes import GraphFrame

In [2]: # Vertex DataFrame
   ...: v = spark.createDataFrame([
   ...:   ("a", "Alice", 34),
   ...:   ("b", "Bob", 36),
   ...:   ("c", "Charlie", 30),
   ...:   ("d", "David", 29),
   ...:   ("e", "Esther", 32),
   ...:   ("f", "Fanny", 36),
   ...:   ("g", "Gabby", 60)
   ...: ], ["id", "name", "age"])
   ...: # Edge DataFrame
   ...: e = spark.createDataFrame([
   ...:   ("a", "b", "friend"),
   ...:   ("b", "c", "follow"),
   ...:   ("c", "b", "follow"),
   ...:   ("f", "c", "follow"),
   ...:   ("e", "f", "follow"),
   ...:   ("e", "d", "friend"),
   ...:   ("d", "a", "friend"),
   ...:   ("a", "e", "friend")
   ...: ], ["src", "dst", "relationship"])
   ...: # Create a GraphFrame
   ...: g = GraphFrame(v, e)

In [3]: results = g.pageRank(resetProbability=0.15, tol=0.01)

In [4]: results.vertices.select("id", "pagerank").show()
+---+-------------------+
| id|           pagerank|
+---+-------------------+
|  g| 0.1799821386239711|
|  f| 0.3283606792049851|
|  e|0.37085233187676075|
|  d| 0.3283606792049851|
|  c| 2.6878300011606218|
|  b|  2.655507832863289|
|  a|0.44910633706538744|
+---+-------------------+
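
For reference, the same quickstart could also be run non-interactively with spark-submit. A minimal sketch, assuming the PySpark example above is saved to a local script; the filename `quickstart_pagerank.py` is hypothetical and not part of this PR:

```shell
# Hypothetical batch-mode run of the quickstart above.
# --packages resolves the GraphFrames jar from spark-packages via Ivy,
# exactly as the interactive shells above do.
spark-submit \
  --packages graphframes:graphframes:0.8.3-spark3.5-s_2.12 \
  quickstart_pagerank.py
```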

rxin merged commit 3643518 into master Nov 11, 2024
6 checks passed
rxin (Contributor) commented Nov 11, 2024

Thanks. Just merged this.

WeichenXu123 (Contributor) left a comment
LGTM!

rjurney deleted the rjurney/update-quickstart-versions branch on April 15, 2025 at 00:33