Commit 57c264d

babupe authored and bzz committed
BigQuery Interpreter for Apache Zeppelin [ZEPPELIN-1153]
### What is this PR for?
Google BigQuery is a popular no-ops data warehouse. This commit enables Apache Zeppelin users to perform BI and analytics on their datasets in BigQuery.

### What type of PR is it?
Feature

### Todos
* Make the BigQuery interpreter appear in the interpreters section of the UI
* Build SQL completion
* Authorization of non-GCP

### What is the Jira issue?
https://issues.apache.org/jira/browse/ZEPPELIN-1153

### How should this be tested?
* Copy conf/zeppelin-site.xml.template to conf/zeppelin-site.xml
* Add org.apache.zeppelin.bigquery.BigQueryInterpreter to the property zeppelin.interpreters in zeppelin-site.xml
* Start Zeppelin
* Add a BigQuery interpreter with your project ID
* Create a new note with %bsql.sql and run your SQL against the public datasets in BigQuery

### Screenshots (if appropriate)
![screenshot from 2016-07-12 14 27 30](https://cloud.githubusercontent.com/assets/4242273/16785302/31b104e2-4842-11e6-87c0-b79763dd85c0.png)

### Questions:
* Do the license files need an update? No
* Are there breaking changes for older versions? No
* Does this need documentation? No

Author: Babu Prasad Elumalai <[email protected]>
Author: babupe <[email protected]>
Author: Alexander Bezzubov <[email protected]>

Closes #1170 from babupe/babupe-bigquery and squashes the following commits:

ffed801 [Babu Prasad Elumalai] pushing BQ Exception to logs and Interpreter error output
d3c2316 [babupe] Merge pull request #2 from bzz/babupe-add-auth-docs
64525b8 [Alexander Bezzubov] Fix typos in docs
03a777f [Alexander Bezzubov] add docs for BigQuery auth outside of GCE
fcab6b7 [babupe] Merge pull request #1 from bzz/babupe-final
6a95333 [Alexander Bezzubov] Rename Apach2.0 license for google's code to adhere naming conventions
7d4f40b [Alexander Bezzubov] Add exidentaly removed licenses due to merge conflict
3be1912 [Babu Prasad Elumalai] New changes
41e076e [Babu Prasad Elumalai] Fixed formatting with readme file
97874a4 [Babu Prasad Elumalai] Pushing cropped screenshots
64affbb [babupe] Added cropped interpreter screenshot
4a1d29c [Babu Prasad Elumalai] Removed unnecessary dependencies in pom.xml
e520b7b [Babu Prasad Elumalai] Exclude constants.json file for rat plugin since its static config file
69cb724 [Babu Prasad Elumalai] Fixed license header and added manual unit test documentation
bbf26cc [Babu Prasad Elumalai] Added path and specific wording
4a3153f [Babu Prasad Elumalai] removed bad package from import
d0c8e01 [Babu Prasad Elumalai] Added technical description to bigquery.md
b6d181c [Babu Prasad Elumalai] Trying to add screenshot in README
569757f [Babu Prasad Elumalai] Incorporated feedback
764385c [Babu Prasad Elumalai] Interpreter modification, License, doc changes
d85abd2 [Babu Prasad Elumalai] Modified code and license
17f6d89 [Babu Prasad Elumalai] ZEPPELIN-1153 comments committed
8fa647b [Babu Prasad Elumalai] BigQuery Interpreter for Apazhe Zeppelin
22e3487 [babupe] Update LICENSE
e88b017 [babupe] Created a new license file
d90e10f [babupe] Removed BigQuery from notice
aa52553 [Babu Prasad Elumalai] Merge branch 'master' of https://github.com/apache/zeppelin
ae096d2 [Babu Prasad Elumalai] License changes
20962d2 [Babu Prasad Elumalai] Pushing license changes
3d5f8e7 [Babu Prasad Elumalai] Modified license header
5a2e674 [Babu Prasad Elumalai] Added license info for Jackson library and added BQ API source
4db74c1 [Babu Prasad Elumalai] Adding license stuff
31c373f [Babu Prasad Elumalai] Fixed formatting with readme file
287744c [Babu Prasad Elumalai] Merge branch 'babupe-bigquery' of https://github.com/babupe/zeppelin into babupe-bigquery
f318b20 [Babu Prasad Elumalai] Pushing cropped screenshots
17fd4e8 [babupe] Added cropped interpreter screenshot
f872aa0 [Babu Prasad Elumalai] Removed unnecessary dependencies in pom.xml
5983e36 [Babu Prasad Elumalai] Exclude constants.json file for rat plugin since its static config file
11e88dc [Babu Prasad Elumalai] Replaced license header with formatting
4b82abd [Babu Prasad Elumalai] Fixed license header and added manual unit test documentation
87f5efe [Babu Prasad Elumalai] Added path and specific wording
6132d78 [Babu Prasad Elumalai] Fixing License and skipping failing tests
2254a49 [Babu Prasad Elumalai] removed bad package from import
73e3f6d [Babu Prasad Elumalai] Added technical description to bigquery.md
089820b [Babu Prasad Elumalai] Trying to add screenshot in README
a00b48e [Babu Prasad Elumalai] Incorporated feedback
17846f1 [Babu Prasad Elumalai] Interpreter modification, License, doc changes
50c41fc [Babu Prasad Elumalai] Modified code and license
75d8ee6 [Babu Prasad Elumalai] ZEPPELIN-1153 comments committed
2a2bedc [Babu Prasad Elumalai] BigQuery Interpreter for Apazhe Zeppelin
1 parent 6eb2cc5 commit 57c264d

14 files changed: +1082 -6 lines changed

LICENSE

Lines changed: 2 additions & 1 deletion
@@ -251,6 +251,7 @@ The following components are provided under the Apache License. See project link
 The text of each license is also included at licenses/LICENSE-[project]-[version].txt.

 (Apache 2.0) Bootstrap v3.0.2 (http://getbootstrap.com/) - https://github.com/twbs/bootstrap/blob/v3.0.2/LICENSE
+(Apache 2.0) Software under ./bigquery/* was developed at Google (http://www.google.com/). Licensed under the Apache v2.0 License.

 ========================================================================
 BSD 3-Clause licenses
@@ -270,4 +271,4 @@ BSD 2-Clause licenses
 The following components are provided under the BSD 3-Clause license. See file headers and project links for details.

 (BSD 2 Clause) portions of SQLLine (http://sqlline.sourceforge.net/) - http://sqlline.sourceforge.net/#license
-jdbc/src/main/java/org/apache/zeppelin/jdbc/SqlCompleter.java
+jdbc/src/main/java/org/apache/zeppelin/jdbc/SqlCompleter.java

NOTICE

Lines changed: 0 additions & 1 deletion
@@ -4,5 +4,4 @@ Copyright 2015 - 2016 The Apache Software Foundation
 This product includes software developed at
 The Apache Software Foundation (http://www.apache.org/).

-
 Portions of this software were developed at NFLabs, Inc. (http://www.nflabs.com)

bigquery/README.md

Lines changed: 109 additions & 0 deletions
@@ -0,0 +1,109 @@
# Overview
BigQuery interpreter for Apache Zeppelin

# Prerequisites
You can follow the instructions at [Apache Zeppelin on Dataproc](https://github.com/GoogleCloudPlatform/dataproc-initialization-actions/blob/master/apache-zeppelin/README.MD) to bring up Zeppelin on Google Dataproc.
You could also install and bring up Zeppelin on Google Compute Engine.

# Unit Tests
The BigQuery unit tests are excluded from the default build because they depend on the external BigQuery service, which does not have a local mock at this point.

To run these tests manually, follow these steps:
* [Create a new project](https://support.google.com/cloud/answer/6251787?hl=en)
* [Create a Google Compute Engine instance](https://cloud.google.com/compute/docs/instances/create-start-instance)
* Copy the ID of the project that you created and add it to the property "projectId" in `resources/constants.json`
* Run the command `mvn <options> -Dbigquery.test.exclude='' test -pl bigquery -am`

# Interpreter Configuration

Configure the following properties during Interpreter creation.

<table class="table-configuration">
  <tr>
    <th>Name</th>
    <th>Default Value</th>
    <th>Description</th>
  </tr>
  <tr>
    <td>zeppelin.bigquery.project_id</td>
    <td> </td>
    <td>Google Project Id</td>
  </tr>
  <tr>
    <td>zeppelin.bigquery.wait_time</td>
    <td>5000</td>
    <td>Query Timeout in Milliseconds</td>
  </tr>
  <tr>
    <td>zeppelin.bigquery.max_no_of_rows</td>
    <td>100000</td>
    <td>Max result set size</td>
  </tr>
</table>
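
To make the defaults concrete, here is a minimal sketch of how the three properties could be read with their documented fallbacks. The class `BigQuerySettings` and its field names are hypothetical and are not part of the interpreter in this commit; only the property names and default values are taken from the table above.

```java
import java.util.Properties;

/** Hypothetical holder for the three interpreter properties (illustration only). */
final class BigQuerySettings {
    final String projectId;
    final long waitTimeMillis;
    final long maxRows;

    private BigQuerySettings(String projectId, long waitTimeMillis, long maxRows) {
        this.projectId = projectId;
        this.waitTimeMillis = waitTimeMillis;
        this.maxRows = maxRows;
    }

    /** Reads the properties, falling back to the defaults listed in the table. */
    static BigQuerySettings from(Properties props) {
        // The project ID has no default, so this sketch treats it as required.
        String projectId = props.getProperty("zeppelin.bigquery.project_id", "").trim();
        if (projectId.isEmpty()) {
            throw new IllegalArgumentException("zeppelin.bigquery.project_id must be set");
        }
        long waitTime = Long.parseLong(
            props.getProperty("zeppelin.bigquery.wait_time", "5000").trim());
        long maxRows = Long.parseLong(
            props.getProperty("zeppelin.bigquery.max_no_of_rows", "100000").trim());
        return new BigQuerySettings(projectId, waitTime, maxRows);
    }
}
```

Rejecting a missing project ID early is a design choice of this sketch; how the interpreter itself reacts to an empty value is not described here.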

# Connection
The interpreter opens a connection to the BigQuery service using the supplied Google project ID and the compute environment variables.

# Google BigQuery API Javadoc
[API Javadocs](https://developers.google.com/resources/api-libraries/documentation/bigquery/v2/java/latest/)
[Source](http://central.maven.org/maven2/com/google/apis/google-api-services-bigquery/v2-rev265-1.21.0/google-api-services-bigquery-v2-rev265-1.21.0-sources.jar)

We used the curated veneer version of the Java APIs rather than the [idiomatic Java client](https://github.com/GoogleCloudPlatform/gcloud-java/tree/master/gcloud-java-bigquery) to build the interpreter, mainly for usability reasons.

# Enabling the BigQuery Interpreter

To enable the **BigQuery** interpreter in a notebook, click the **Gear** icon and select **bigquery**.

# Using the BigQuery Interpreter

In a paragraph, use `%bigquery.sql` to select the **BigQuery** interpreter and then input SQL statements against your datasets stored in BigQuery.
See the [BigQuery SQL Reference](https://cloud.google.com/bigquery/query-reference) to build your own SQL.

For example, the following SQL finds the 10 departure airports with the most delayed departures in the public flights dataset:

```sql
%bigquery.sql
SELECT departure_airport, sum(case when departure_delay>0 then 1 else 0 end) as no_of_delays
FROM [bigquery-samples:airline_ontime_data.flights]
group by departure_airport
order by 2 desc
limit 10
```
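
A note on the aggregate in this kind of query: in SQL, `COUNT(expr)` counts the rows where `expr` is not NULL, so a `CASE` that returns 1 or 0 makes every row count, while `SUM` over the same `CASE` counts only the rows that matched the condition. The following Java sketch mirrors that difference on a small list; the class name and the delay values are made up for illustration.

```java
import java.util.Arrays;
import java.util.List;

/** Illustrates COUNT(CASE ...) versus SUM(CASE ...) on hypothetical delay values. */
final class DelayCounting {

    /** Like COUNT(CASE WHEN d > 0 THEN 1 ELSE 0 END): both 1 and 0 are non-null, so every row counts. */
    static long countOfCase(List<Integer> delays) {
        return delays.stream()
            .map(d -> d > 0 ? 1 : 0) // the CASE never yields NULL for these inputs
            .count();
    }

    /** Like SUM(CASE WHEN d > 0 THEN 1 ELSE 0 END): only positive delays add to the total. */
    static long sumOfCase(List<Integer> delays) {
        return delays.stream()
            .mapToLong(d -> d > 0 ? 1 : 0)
            .sum();
    }

    public static void main(String[] args) {
        List<Integer> delays = Arrays.asList(15, 0, -3, 42);
        System.out.println(countOfCase(delays)); // 4: all rows
        System.out.println(sumOfCase(delays));   // 2: only the delayed departures
    }
}
```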

Another example: SQL to find the most commonly used Java packages in the GitHub data hosted in BigQuery:

```sql
%bigquery.sql
SELECT
  package,
  COUNT(*) count
FROM (
  SELECT
    REGEXP_EXTRACT(line, r' ([a-z0-9\._]*)\.') package,
    id
  FROM (
    SELECT
      SPLIT(content, '\n') line,
      id
    FROM
      [bigquery-public-data:github_repos.sample_contents]
    WHERE
      content CONTAINS 'import'
      AND sample_path LIKE '%.java'
    HAVING
      LEFT(line, 6)='import' )
  GROUP BY
    package,
    id )
GROUP BY
  1
ORDER BY
  count DESC
LIMIT
  40
```
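
To see what the `REGEXP_EXTRACT` pattern in this query captures, the sketch below applies the same pattern text with `java.util.regex`. This is an illustration only: BigQuery evaluates the expression with its own regex engine, and the class and method names here are hypothetical. For a simple pattern like this one the two engines are expected to agree.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Applies the query's package-extraction pattern to a single source line. */
final class ImportPackage {

    // Same pattern text as in the query: a space, then the longest run of
    // lowercase letters, digits, dots and underscores that is followed by a dot.
    private static final Pattern PACKAGE = Pattern.compile(" ([a-z0-9\\._]*)\\.");

    /** Returns the captured package name, or empty if the line does not match. */
    static Optional<String> extract(String line) {
        Matcher m = PACKAGE.matcher(line);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        // The capture stops before the class name because 'L' is uppercase.
        System.out.println(extract("import java.util.List;")); // Optional[java.util]
    }
}
```

Because the character class has no uppercase letters, the capture ends at the last dot before a capitalized class name, which is why the query groups by package rather than by fully qualified class.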

# Sample Screenshot

![Zeppelin BigQuery](https://cloud.githubusercontent.com/assets/10060731/16938817/b9213ea0-4db6-11e6-8c3b-8149a0bdf874.png)

bigquery/pom.xml

Lines changed: 177 additions & 0 deletions
@@ -0,0 +1,177 @@
<?xml version="1.0" encoding="UTF-8"?>
<!--
  ~ Licensed to the Apache Software Foundation (ASF) under one or more
  ~ contributor license agreements. See the NOTICE file distributed with
  ~ this work for additional information regarding copyright ownership.
  ~ The ASF licenses this file to You under the Apache License, Version 2.0
  ~ (the "License"); you may not use this file except in compliance with
  ~ the License. You may obtain a copy of the License at
  ~
  ~ http://www.apache.org/licenses/LICENSE-2.0
  ~
  ~ Unless required by applicable law or agreed to in writing, software
  ~ distributed under the License is distributed on an "AS IS" BASIS,
  ~ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  ~ See the License for the specific language governing permissions and
  ~ limitations under the License.
  -->

<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
  <modelVersion>4.0.0</modelVersion>

  <parent>
    <artifactId>zeppelin</artifactId>
    <groupId>org.apache.zeppelin</groupId>
    <version>0.7.0-SNAPSHOT</version>
  </parent>

  <groupId>org.apache.zeppelin</groupId>
  <artifactId>zeppelin-bigquery</artifactId>
  <packaging>jar</packaging>
  <version>0.7.0-SNAPSHOT</version>
  <name>Zeppelin: BigQuery interpreter</name>
  <url>http://www.apache.org</url>

  <dependencies>

    <dependency>
      <groupId>com.google.apis</groupId>
      <artifactId>google-api-services-bigquery</artifactId>
      <version>v2-rev265-1.21.0</version>
    </dependency>
    <dependency>
      <groupId>com.google.oauth-client</groupId>
      <artifactId>google-oauth-client</artifactId>
      <version>${project.oauth.version}</version>
    </dependency>
    <dependency>
      <groupId>com.google.http-client</groupId>
      <artifactId>google-http-client-jackson2</artifactId>
      <version>${project.http.version}</version>
    </dependency>
    <dependency>
      <groupId>com.google.oauth-client</groupId>
      <artifactId>google-oauth-client-jetty</artifactId>
      <version>${project.oauth.version}</version>
    </dependency>
    <dependency>
      <groupId>com.google.code.gson</groupId>
      <artifactId>gson</artifactId>
      <version>2.6</version>
    </dependency>

    <dependency>
      <groupId>org.apache.zeppelin</groupId>
      <artifactId>zeppelin-interpreter</artifactId>
      <version>${project.version}</version>
      <scope>provided</scope>
    </dependency>

    <dependency>
      <groupId>org.slf4j</groupId>
      <artifactId>slf4j-api</artifactId>
    </dependency>

    <dependency>
      <groupId>org.slf4j</groupId>
      <artifactId>slf4j-log4j12</artifactId>
    </dependency>

    <dependency>
      <groupId>junit</groupId>
      <artifactId>junit</artifactId>
      <scope>test</scope>
    </dependency>
  </dependencies>

  <properties>
    <project.http.version>1.21.0</project.http.version>
    <project.oauth.version>1.21.0</project.oauth.version>
    <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
    <bigquery.test.exclude>**/BigQueryInterpreterTest.java</bigquery.test.exclude>
  </properties>

  <build>
    <plugins>
      <plugin>
        <artifactId>maven-enforcer-plugin</artifactId>
        <version>1.3.1</version>
        <executions>
          <execution>
            <id>enforce</id>
            <phase>none</phase>
          </execution>
        </executions>
      </plugin>

      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-surefire-plugin</artifactId>
        <configuration>
          <excludes>
            <exclude>${bigquery.test.exclude}</exclude>
          </excludes>
        </configuration>
      </plugin>

      <plugin>
        <artifactId>maven-dependency-plugin</artifactId>
        <version>2.8</version>
        <executions>
          <execution>
            <id>copy-dependencies</id>
            <phase>package</phase>
            <goals>
              <goal>copy-dependencies</goal>
            </goals>
            <configuration>
              <outputDirectory>${project.build.directory}/../../interpreter/bqsql</outputDirectory>
              <overWriteReleases>false</overWriteReleases>
              <overWriteSnapshots>false</overWriteSnapshots>
              <overWriteIfNewer>true</overWriteIfNewer>
              <includeScope>runtime</includeScope>
            </configuration>
          </execution>
          <execution>
            <id>copy-artifact</id>
            <phase>package</phase>
            <goals>
              <goal>copy</goal>
            </goals>
            <configuration>
              <outputDirectory>${project.build.directory}/../../interpreter/bqsql</outputDirectory>
              <overWriteReleases>false</overWriteReleases>
              <overWriteSnapshots>false</overWriteSnapshots>
              <overWriteIfNewer>true</overWriteIfNewer>
              <includeScope>runtime</includeScope>
              <artifactItems>
                <artifactItem>
                  <groupId>${project.groupId}</groupId>
                  <artifactId>${project.artifactId}</artifactId>
                  <version>${project.version}</version>
                  <type>${project.packaging}</type>
                </artifactItem>
              </artifactItems>
            </configuration>
          </execution>
        </executions>
      </plugin>
      <plugin>
        <artifactId>maven-assembly-plugin</artifactId>
        <configuration>
          <archive>
            <manifest>
              <mainClass>
                org.apache.zeppelin.bigquery.BigQueryInterpreter
              </mainClass>
            </manifest>
          </archive>
          <descriptorRefs>
            <descriptorRef>jar-with-dependencies</descriptorRef>
          </descriptorRefs>
        </configuration>
      </plugin>
    </plugins>
  </build>
</project>
