docs/sql-programming-guide.md
+92 −4 (92 additions, 4 deletions)
@@ -20,7 +20,7 @@ a schema that describes the data types of each column in the row. A SchemaRDD i
 in a traditional relational database. A SchemaRDD can be created from an existing RDD, parquet
 file, or by running HiveQL against data stored in [Apache Hive](http://hive.apache.org/).

-**All of the examples on this page use sample data included in the Spark distribution and can be run in the spark-shell.**
+**All of the examples on this page use sample data included in the Spark distribution and can be run in the `spark-shell`.**

 </div>

@@ -33,6 +33,19 @@ a schema that describes the data types of each column in the row. A JavaSchemaR
 in a traditional relational database. A JavaSchemaRDD can be created from an existing RDD, parquet
 file, or by running HiveQL against data stored in [Apache Hive](http://hive.apache.org/).
 </div>
+
+<div data-lang="python" markdown="1">
+
+Spark SQL allows relational queries expressed in SQL or HiveQL to be executed using
+Spark. At the core of this component is a new type of RDD,
+[SchemaRDD](). SchemaRDDs are composed of
+[Row]() objects along with
+a schema that describes the data types of each column in the row. A SchemaRDD is similar to a table
+in a traditional relational database. A SchemaRDD can be created from an existing RDD, parquet
+file, or by running HiveQL against data stored in [Apache Hive](http://hive.apache.org/).
+
+
+**All of the examples on this page use sample data included in the Spark distribution and can be run in the `pyspark` shell.**
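
A minimal sketch of what creating a SchemaRDD from an existing RDD might look like in the `pyspark` shell. The `pyspark.sql` module path, the RDD-of-dicts input, and the `inferSchema` method name are assumptions based on the 1.0-era Python API rather than something this diff shows.

{% highlight python %}
from pyspark.sql import SQLContext   # assumed module path

sqlCtx = SQLContext(sc)              # sc is the shell's existing SparkContext

# An ordinary RDD of dictionaries; the keys are assumed to become column names.
rdd = sc.parallelize([{"name": "Alice", "age": 1}, {"name": "Bob", "age": 2}])

# Wrap the existing RDD in a SchemaRDD: an RDD of Row objects plus a schema
# describing the two columns.
people = sqlCtx.inferSchema(rdd)
print people.first()
{% endhighlight %}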
@@ -44,7 +57,7 @@ file, or by running HiveQL against data stored in [Apache Hive](http://hive.apac

 The entry point into all relational functionality in Spark is the
 [SQLContext](api/sql/core/index.html#org.apache.spark.sql.SQLContext) class, or one of its
-decendents. To create a basic SQLContext, all you need is a SparkContext.
+descendants. To create a basic SQLContext, all you need is a SparkContext.

 {% highlight scala %}
 val sc: SparkContext // An existing SparkContext.
@@ -60,7 +73,7 @@ import sqlContext._

 The entry point into all relational functionality in Spark is the
 [JavaSQLContext](api/sql/core/index.html#org.apache.spark.sql.api.java.JavaSQLContext) class, or one
-of its decendents. To create a basic JavaSQLContext, all you need is a JavaSparkContext.
+of its descendants. To create a basic JavaSQLContext, all you need is a JavaSparkContext.

 {% highlight java %}
 JavaSparkContext ctx = ...; // An existing JavaSparkContext.
@@ -69,6 +82,19 @@ JavaSQLContext sqlCtx = new org.apache.spark.sql.api.java.JavaSQLContext(ctx);

 </div>

+<div data-lang="python" markdown="1">
+
+The entry point into all relational functionality in Spark is the
+[SQLContext]() class, or one
+of its descendants. To create a basic SQLContext, all you need is a SparkContext.
+
+{% highlight python %}
+from pyspark.sql import SQLContext
+sqlCtx = SQLContext(sc)
+{% endhighlight %}
+
+</div>
+
 </div>

 ## Running SQL on RDDs
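
Once created, `sqlCtx` is the handle used for everything that follows in the Python section. As a hedged follow-up to the snippet above, assuming the `registerAsTable` and `sql` method names from the 1.0-era API:

{% highlight python %}
# Continues the earlier sketch: registerAsTable and sql are assumed method
# names, so treat this as illustrative rather than definitive.
people.registerAsTable("people")
teenagers = sqlCtx.sql("SELECT name FROM people WHERE age >= 13 AND age <= 19")
print teenagers.collect()
{% endhighlight %}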
@@ -81,7 +107,7 @@ One type of table that is supported by Spark SQL is an RDD of Scala case classes
 defines the schema of the table. The names of the arguments to the case class are read using
 reflection and become the names of the columns. Case classes can also be nested or contain complex
 types such as Sequences or Arrays. This RDD can be implicitly converted to a SchemaRDD and then be
-registered as a table. Tables can used in subsequent SQL statements.
+registered as a table. Tables can be used in subsequent SQL statements.

 {% highlight scala %}
 val sqlContext = new org.apache.spark.sql.SQLContext(sc)
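
The hunk is cut off at the start of the Scala example. For comparison with the case-class approach described above, this is a hedged sketch of how the same kind of table might be built through the Python API this change adds; the sample file path, the dict-based rows, and the method names are assumptions from the 1.0-era docs, not part of this diff.

{% highlight python %}
from pyspark.sql import SQLContext   # assumed module path
sqlCtx = SQLContext(sc)

# Python has no case classes, so each row is sketched as a dict whose keys
# play the role of the case-class field names.
lines = sc.textFile("examples/src/main/resources/people.txt")  # hypothetical sample file
people = lines.map(lambda l: l.split(",")) \
              .map(lambda p: {"name": p[0], "age": int(p[1])})

# Infer the schema from the dicts, register the result as a table, and query it.
peopleTable = sqlCtx.inferSchema(people)
peopleTable.registerAsTable("people")
teenNames = sqlCtx.sql("SELECT name FROM people WHERE age >= 13 AND age <= 19") \
                  .map(lambda row: "Name: " + row.name)
print teenNames.collect()
{% endhighlight %}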