Commit 72884c8

Add docs for %python.sql feature
1 parent e931dc4 commit 72884c8

4 files changed (+58, -4 lines)

docs/interpreter/python.md

Lines changed: 31 additions & 2 deletions

````diff
@@ -46,7 +46,7 @@ To access the help, type **help()**
 ## Python modules
 The interpreter can use all modules already installed (with pip, easy_install...)
 
-## Use Zeppelin Dynamic Forms
+## Using Zeppelin Dynamic Forms
 You can leverage [Zeppelin Dynamic Form]({{BASE_PATH}}/manual/dynamicform.html) inside your Python code.
 
 **Zeppelin Dynamic Form can only be used if the py4j Python library is installed on your system. If not, you can install it with `pip install py4j`.**
@@ -65,6 +65,7 @@ print (z.select("f1",[("o1","1"),("o2","2")],"2"))
 print("".join(z.checkbox("f3", [("o1","1"), ("o2","2")],["1"])))
 ```
 
+
 ## Zeppelin features not fully supported by the Python Interpreter
 
 * Interrupting a paragraph execution (`cancel()` method) is currently only supported on Linux and macOS. If the interpreter runs on another operating system (for instance MS Windows), interrupting a paragraph will close the whole interpreter. A JIRA ticket ([ZEPPELIN-893](https://issues.apache.org/jira/browse/ZEPPELIN-893)) has been opened to implement this feature in a future release of the interpreter.
@@ -94,7 +95,7 @@ z.show(plt, height='150px')
 
 
 ## Pandas integration
-[Zeppelin Display System]({{BASE_PATH}}/displaysystem/basicdisplaysystem.html#table) provides simple API to visualize data in Pandas DataFrames, same as in Matplotlib.
+Apache Zeppelin [Table Display System]({{BASE_PATH}}/displaysystem/basicdisplaysystem.html#table) provides built-in data visualization capabilities. The Python interpreter leverages it to visualize Pandas DataFrames through a similar `z.show()` API, as with the [Matplotlib integration](#matplotlib-integration).
 
 Example:
 
@@ -104,6 +105,34 @@ rates = pd.read_csv("bank.csv", sep=";")
 z.show(rates)
 ```
 
+## SQL over DataFrames
+
+There is a convenience `%python.sql` interpreter that matches the Apache Spark experience in Zeppelin: it lets you query Pandas DataFrames with SQL and visualize the results through the built-in [Table Display System]({{BASE_PATH}}/displaysystem/basicdisplaysystem.html#table).
+
+**Prerequisites**
+
+- Pandas `pip install pandas`
+- PandaSQL `pip install -U pandasql`
+
+If Python is the default bound interpreter (the first in the interpreter list, under the _Gear Icon_), you can simply use `%sql`, e.g.:
+
+- first paragraph
+
+```python
+import pandas as pd
+rates = pd.read_csv("bank.csv", sep=";")
+```
+
+- next paragraph
+
+```sql
+%sql
+SELECT * FROM rates WHERE age < 40
+```
+
+Otherwise, it can be referred to as `%python.sql`.
+
+
 ## Technical description
 
 For in-depth technical details on the current implementation, please refer to [python/README.md](https://github.com/apache/zeppelin/blob/master/python/README.md).
````
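pandasql executes queries with SQLite by default, loading the DataFrames it finds into an in-memory database. The stdlib-only sketch below imitates that idea on plain lists of rows, so it runs without pandas or pandasql installed. The helper `query_table` and the sample rows are hypothetical, and this is not the code that `%python.sql` actually runs.

```python
import sqlite3


def query_table(name, columns, rows, sql):
    """Hypothetical helper: load rows into an in-memory SQLite table called
    `name`, run `sql` against it and return (header, result_rows)."""
    conn = sqlite3.connect(":memory:")
    try:
        # No declared column types: SQLite keeps each value as given.
        col_list = ", ".join('"{}"'.format(c) for c in columns)
        conn.execute('CREATE TABLE "{}" ({})'.format(name, col_list))
        placeholders = ", ".join("?" for _ in columns)
        conn.executemany(
            'INSERT INTO "{}" VALUES ({})'.format(name, placeholders), rows
        )
        cursor = conn.execute(sql)
        # cursor.description holds one 7-tuple per result column; [0] is its name.
        header = [d[0] for d in cursor.description]
        return header, cursor.fetchall()
    finally:
        conn.close()


# Made-up rows standing in for the `rates` DataFrame read from bank.csv.
columns = ["age", "job"]
rows = [(35, "admin."), (52, "technician"), (29, "services")]
header, result = query_table(
    "rates", columns, rows, "SELECT * FROM rates WHERE age < 40"
)
print(header, result)
```

With pandas available, pandasql wraps the same idea behind `sqldf(query, env)`, where `env` maps table names in the query to DataFrames.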

python/README.md

Lines changed: 2 additions & 0 deletions

```diff
@@ -40,3 +40,5 @@ Current interpreter implementation spawns new system python process through `Pro
 * JavaBuilder can't send a SIGINT signal to interrupt paragraph execution. Therefore the interpreter directly sends a `kill SIGINT PID` to the python process to interrupt execution. The Python process catches the SIGINT signal with some code defined in bootstrap.py
 
 * The Matplotlib display feature is implemented by exporting the figure to SVG (as a string) and then displaying it with HTML code.
+
+* `%python.sql` support for Pandas DataFrames is optional and is provided by https://github.com/yhat/pandasql if the user has it installed
```
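The README says the Python process catches SIGINT with code in bootstrap.py. The following is a minimal sketch of that mechanism, assuming a POSIX system (in line with the Linux/macOS restriction in the docs). A handler turns SIGINT into `KeyboardInterrupt`, so only the running paragraph unwinds while the process survives. `run_paragraph` is a hypothetical name; the real bootstrap.py code may look different. The process signals itself with `os.kill` to stand in for the `kill` sent from the JVM side.

```python
import os
import signal
import time


def run_paragraph(code_fn):
    """Hypothetical helper: run code_fn and report whether it finished
    or was interrupted by SIGINT."""

    def on_sigint(signum, frame):
        # Convert the signal into an exception that unwinds the paragraph only.
        raise KeyboardInterrupt

    previous = signal.signal(signal.SIGINT, on_sigint)
    try:
        code_fn()
        return "finished"
    except KeyboardInterrupt:
        return "interrupted"
    finally:
        # Restore the previous handler (fall back to the default if unknown).
        signal.signal(signal.SIGINT, previous if previous is not None else signal.SIG_DFL)


def long_running():
    # Simulate the interpreter sending SIGINT to this process's PID.
    os.kill(os.getpid(), signal.SIGINT)
    time.sleep(5)  # the handler raises long before the sleep completes


print(run_paragraph(long_running))
print(run_paragraph(lambda: None))
```

Python's default SIGINT handler already raises `KeyboardInterrupt`; installing one explicitly just makes the behaviour visible and lets the previous handler be restored afterwards. `signal.signal` must be called from the main thread.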

python/src/main/java/org/apache/zeppelin/python/PythonPandasSqlInterpreter.java

Lines changed: 2 additions & 2 deletions

```diff
@@ -65,8 +65,8 @@ private PythonInterpreter getPythonInterpreter() {
   public void open() {
     LOG.info("Open Python SQL interpreter instance: {}", this.toString());
 
-    //TODO(bzz): check by importing and catching ImportError
-    //if (pandasAndNumpyAndPandasqlAreInstalled) {
+    //TODO(bzz): check i.e by importing and catching ImportError
+    //if (py4jAndPandasAndPandasqlAreInstalled) {
     try {
       LOG.info("Bootstrap {} interpreter with {}", this.toString(), SQL_BOOTSTRAP_FILE_PY);
       PythonInterpreter python = getPythonInterpreter();
```

python/src/main/resources/bootstrap.py

Lines changed: 23 additions & 0 deletions

```diff
@@ -72,6 +72,8 @@ def help():
     print ('''<pre>z.show(plt,width='50px')
 z.show(plt,height='150px') </pre></div>''')
     print ('<h3>Pandas DataFrame</h3>')
+    print ('<div> You need to have Pandas module installed ')
+    print ('to use this functionality (pip install pandas) !</div><br/>')
     print """
 <div>The interpreter can visualize Pandas DataFrame
 with the function z.show()
@@ -81,6 +83,27 @@ def help():
 z.show(df)
 </pre></div>
 """
+    print ('<h3>SQL over Pandas DataFrame</h3>')
+    print ('<div> You need to have Pandas&Pandasql modules installed ')
+    print ('to use this functionality (pip install pandas pandasql) !</div><br/>')
+    print """
+<div>Python interpreter group includes %sql interpreter that can query
+Pandas DataFrames using SQL and visualize results using Zeppelin Table Display System
+
+<pre>
+%python
+import pandas as pd
+df = pd.read_csv("bank.csv", sep=";")
+</pre>
+<br />
+
+<pre>
+%python.sql
+%sql
+SELECT * from df LIMIT 5
+</pre></div>
+"""
+
 
 class PyZeppelinContext(object):
     """ If py4j is detected, these class will be override
```
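The help text says `z.show()` renders a DataFrame through the Table Display System, which consumes plain text: a `%table` directive followed by rows whose cells are separated by tabs and whose rows are separated by newlines. The sketch below builds such a string from a header and rows without depending on pandas. `to_zeppelin_table` is a hypothetical name, and the actual `PyZeppelinContext.show` implementation may differ, for example in how it escapes special characters.

```python
def to_zeppelin_table(header, rows):
    """Render a header and rows in Zeppelin's %table text format:
    tab-separated cells, newline-separated rows."""

    def clean(value):
        # Tabs and newlines inside a cell would break the layout.
        return str(value).replace("\t", " ").replace("\n", " ")

    lines = ["\t".join(clean(h) for h in header)]
    lines.extend("\t".join(clean(v) for v in row) for row in rows)
    return "%table " + "\n".join(lines)


out = to_zeppelin_table(["age", "job"], [(35, "admin."), (29, "services")])
print(out)
```

With pandas installed, a call such as `to_zeppelin_table(df.columns, df.itertuples(index=False))` would feed a DataFrame into the same helper.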
