|
1 | | -# Apache Zeppelin R |
| 1 | +# Zeppelin
2 | 2 |
|
3 | | -This adds [R](http://cran.r-project.org) interpeter to the [Apache Zeppelin notebook](http://zeppelin.incubator.apache.org). |
| 3 | +**Documentation:** [User Guide](http://zeppelin.incubator.apache.org/docs/latest/index.html)<br/> |
| 4 | +**Mailing Lists:** [User and Dev mailing list](http://zeppelin.incubator.apache.org/community.html)<br/> |
| 5 | +**Continuous Integration:** [![Build Status](https://travis-ci.org/apache/incubator-zeppelin.svg?branch=master)](https://travis-ci.org/apache/incubator-zeppelin) <br/>
| 6 | +**Contributing:** [Contribution Guide](https://github.com/apache/incubator-zeppelin/blob/master/CONTRIBUTING.md)<br/> |
| 7 | +**Issue Tracker:** [Jira](https://issues.apache.org/jira/browse/ZEPPELIN)<br/> |
| 8 | +**License:** [Apache 2.0](https://github.com/apache/incubator-zeppelin/blob/master/LICENSE) |
4 | 9 |
|
5 | | -It supports: |
6 | 10 |
|
7 | | -+ R code. |
8 | | -+ SparkR code. |
9 | | -+ Cross paragraph R variables. |
10 | | -+ Scala to R binding (passing basic Scala data structure to R). |
11 | | -+ R to Scala binding (passing basic R data structure to Scala). |
12 | | -+ R plot (ggplot2...). |
| 11 | +**Zeppelin** is a web-based notebook that enables interactive data analytics. You can make beautiful data-driven, interactive and collaborative documents with SQL, Scala and more.
13 | 12 |
|
14 | | -## Simple R |
| 13 | +Core features:
| 14 | + * Web-based notebook style editor.
| 15 | + * Built-in Apache Spark support.
15 | 16 |
|
16 | | -[](https://raw.githubusercontent.com/datalayer/zeppelin-R/rscala/_Rimg/simple-r.png) |
17 | 17 |
|
18 | | -## Plot |
| 18 | +To learn more about Zeppelin, visit our website [http://zeppelin.incubator.apache.org](http://zeppelin.incubator.apache.org)
19 | 19 |
|
20 | | -[](https://raw.githubusercontent.com/datalayer/zeppelin-R/rscala/_Rimg/plot.png) |
| 20 | +## Requirements |
| 21 | + * Java 1.7 |
| 22 | + * Tested on Mac OSX, Ubuntu 14.X, CentOS 6.X |
| 23 | + * Maven (if you want to build from the source code) |
| 24 | + * Node.js Package Manager |
21 | 25 |
|
22 | | -## Scala R Binding |
| 26 | +## Getting Started |
23 | 27 |
|
24 | | -[](https://raw.githubusercontent.com/datalayer/zeppelin-R/rscala/_Rimg/scala-r.png) |
| 28 | +### Before Build |
| 29 | +If you don't have the requirements prepared, install them.
| 30 | +(The installation method may vary according to your environment; the example below is for Ubuntu.)
25 | 31 |
|
26 | | -## R Scala Binding |
| 32 | +``` |
| 33 | +sudo apt-get update |
| 34 | +sudo apt-get install git |
| 35 | +sudo apt-get install openjdk-7-jdk |
| 36 | +sudo apt-get install npm |
| 37 | +sudo apt-get install libfontconfig |
| 38 | +
|
| 39 | +# install maven |
| 40 | +wget http://www.eu.apache.org/dist/maven/maven-3/3.3.3/binaries/apache-maven-3.3.3-bin.tar.gz |
| 41 | +sudo tar -zxf apache-maven-3.3.3-bin.tar.gz -C /usr/local/ |
| 42 | +sudo ln -s /usr/local/apache-maven-3.3.3/bin/mvn /usr/local/bin/mvn |
| 43 | +``` |
| 44 | + |
| 45 | +_Notes:_ |
| 46 | + - Ensure node is installed by running `node --version` |
| 47 | + - Ensure Maven is version 3.1.x or higher with `mvn -version`
| 48 | + - Allow Maven to use more memory than usual with ```export MAVEN_OPTS="-Xmx2g -XX:MaxPermSize=1024m"```
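The notes above can be combined into a quick sanity check, sketched below. The `command -v` guards are an addition so the script degrades gracefully when a tool is missing; adjust to your shell.

```shell
# verify the build prerequisites before compiling Zeppelin
command -v node >/dev/null && node --version || echo "Node.js not found"
command -v mvn >/dev/null && mvn -version | head -n 1 || echo "Maven not found"

# give Maven more memory than its default for the Zeppelin build
export MAVEN_OPTS="-Xmx2g -XX:MaxPermSize=1024m"
```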
| 49 | + |
| 50 | +### Build |
| 51 | +If you want to build Zeppelin from source, first clone this repository, then run:
| 52 | + |
| 53 | +``` |
| 54 | +mvn clean package -DskipTests [Options] |
| 55 | +``` |
| 56 | + |
| 57 | +Each interpreter requires different options.
| 58 | + |
| 59 | + |
| 60 | +#### Spark Interpreter |
| 61 | + |
| 62 | +To build with a specific Spark version, Hadoop version or specific features, define one or more of the following profiles and options: |
| 63 | + |
| 64 | +##### -Pspark-[version] |
| 65 | + |
| 66 | +Set the Spark major version.
| 67 | + |
| 68 | +Available profiles are |
| 69 | + |
| 70 | +``` |
| 71 | +-Pspark-1.6 |
| 72 | +-Pspark-1.5 |
| 73 | +-Pspark-1.4 |
| 74 | +-Pspark-1.3 |
| 75 | +-Pspark-1.2 |
| 76 | +-Pspark-1.1 |
| 77 | +-Pcassandra-spark-1.5 |
| 78 | +-Pcassandra-spark-1.4 |
| 79 | +-Pcassandra-spark-1.3 |
| 80 | +-Pcassandra-spark-1.2 |
| 81 | +-Pcassandra-spark-1.1 |
| 82 | +``` |
| 83 | + |
| 84 | +The minor version can be adjusted with `-Dspark.version=x.x.x`
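For example, a build that pins an explicit minor version on top of the `-Pspark-1.6` profile might look like the sketch below (`1.6.1` is illustrative; substitute the release you need). This is a build-configuration fragment and assumes you are in the Zeppelin source root with Maven installed.

```shell
# sketch: select the Spark 1.6 profile, then pin a specific minor release
mvn clean package -Pspark-1.6 -Dspark.version=1.6.1 -DskipTests
```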
| 85 | + |
| 86 | + |
| 87 | +##### -Phadoop-[version] |
| 88 | + |
| 89 | +Set the Hadoop major version.
| 90 | + |
| 91 | +Available profiles are |
| 92 | + |
| 93 | +``` |
| 94 | +-Phadoop-0.23 |
| 95 | +-Phadoop-1 |
| 96 | +-Phadoop-2.2 |
| 97 | +-Phadoop-2.3 |
| 98 | +-Phadoop-2.4 |
| 99 | +-Phadoop-2.6 |
| 100 | +``` |
| 101 | + |
| 102 | +The minor version can be adjusted with `-Dhadoop.version=x.x.x`
| 103 | + |
| 104 | +##### -Pyarn (optional) |
27 | 105 |
|
28 | | -[](https://raw.githubusercontent.com/datalayer/zeppelin-R/rscala/_Rimg/r-scala.png) |
| 106 | +Enable YARN support for local mode.
29 | 107 |
|
30 | | -## SparkR |
31 | 108 |
|
32 | | -[](https://raw.githubusercontent.com/datalayer/zeppelin-R/rscala/_Rimg/sparkr.png) |
| 109 | +##### -Ppyspark (optional) |
33 | 110 |
|
34 | | -# Prerequisite |
| 111 | +Enable PySpark support for local mode.
35 | 112 |
|
36 | | -You need R available on the host running the notebook. |
37 | 113 |
|
38 | | -+ For Centos: `yum install R R-devel` |
39 | | -+ For Ubuntu: `apt-get install r-base r-cran-rserve` |
| 114 | +##### -Pvendor-repo (optional) |
40 | 115 |
|
41 | | -Install additional R packages: |
| 116 | +Enable a 3rd-party vendor repository (Cloudera).
| 117 | + |
| 118 | + |
| 119 | +##### -Pmapr[version] (optional) |
| 120 | + |
| 121 | +For the MapR Hadoop distribution, these profiles handle the Hadoop version. As MapR allows different versions
| 122 | +of Spark to be installed, you should specify which version of Spark is installed on the cluster by adding a Spark profile (`-Pspark-1.2`, `-Pspark-1.3`, etc.) as needed. For Hive, check hive/pom.xml and adjust the installed version as well. The correct Maven
| 123 | +artifacts for every version of MapR can be found at http://doc.mapr.com
| 124 | + |
| 125 | +Available profiles are |
| 126 | + |
| 127 | +``` |
| 128 | +-Pmapr3 |
| 129 | +-Pmapr40 |
| 130 | +-Pmapr41 |
| 131 | +-Pmapr50 |
| 132 | +``` |
| 133 | + |
| 134 | + |
| 135 | +Here are some examples:
42 | 136 |
|
43 | 137 | ``` |
44 | | -curl https://cran.r-project.org/src/contrib/Archive/rscala/rscala_1.0.6.tar.gz -o /tmp/rscala_1.0.6.tar.gz |
45 | | -R CMD INSTALL /tmp/rscala_1.0.6.tar.gz |
46 | | -R -e "install.packages('ggplot2', repos = 'http://cran.us.r-project.org')" |
47 | | -R -e install.packages('knitr', repos = 'http://cran.us.r-project.org') |
| 138 | +# basic build |
| 139 | +mvn clean package -Pspark-1.6 -Phadoop-2.4 -Pyarn -Ppyspark |
| 140 | +
|
| 141 | +# spark-cassandra integration |
| 142 | +mvn clean package -Pcassandra-spark-1.5 -Dhadoop.version=2.6.0 -Phadoop-2.6 -DskipTests |
| 143 | +
|
| 144 | +# with CDH |
| 145 | +mvn clean package -Pspark-1.5 -Dhadoop.version=2.6.0-cdh5.5.0 -Phadoop-2.6 -Pvendor-repo -DskipTests |
| 146 | +
|
| 147 | +# with MapR |
| 148 | +mvn clean package -Pspark-1.5 -Pmapr50 -DskipTests |
48 | 149 | ``` |
49 | 150 |
|
50 | | -You also need a compiled version of Spark 1.5.0. Download [the binary distribution](http://archive.apache.org/dist/spark/spark-1.5.0/spark-1.5.0-bin-hadoop2.6.tgz) and untar to make it accessible in `/opt/spark` folder. |
51 | 151 |
|
52 | | -# Build and Run |
| 152 | +#### Ignite Interpreter |
53 | 153 |
|
54 | 154 | ``` |
55 | | -mvn clean install -Pspark-1.5 -Dspark.version=1.5.0 \ |
56 | | - -Dhadoop.version=2.7.1 -Phadoop-2.6 -Ppyspark \ |
57 | | - -Dmaven.findbugs.enable=false -Drat.skip=true -Dcheckstyle.skip=true \ |
58 | | - -DskipTests \ |
59 | | - -pl '!flink,!ignite,!phoenix,!postgresql,!tajo,!hive,!cassandra,!lens,!kylin' |
| 155 | +mvn clean package -Dignite.version=1.1.0-incubating -DskipTests |
60 | 156 | ``` |
61 | 157 |
|
| 158 | +#### Scalding Interpreter |
| 159 | + |
62 | 160 | ``` |
63 | | -SPARK_HOME=/opt/spark ./bin/zeppelin.sh |
| 161 | +mvn clean package -Pscalding -DskipTests |
64 | 162 | ``` |
65 | 163 |
|
66 | | -Go to [http://localhost:8080](http://localhost:8080) and test the `R Tutorial` note. |
| 164 | +### Configure |
| 165 | +If you wish to configure Zeppelin options (like the port number), edit the following files:
| 166 | + |
| 167 | +``` |
| 168 | +./conf/zeppelin-env.sh |
| 169 | +./conf/zeppelin-site.xml |
| 170 | +``` |
| 171 | +(You can copy ```./conf/zeppelin-env.sh.template``` into ```./conf/zeppelin-env.sh```. |
| 172 | +Same for ```zeppelin-site.xml```.) |
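The template copies described above can be done as sketched below, run from the Zeppelin source root. The `ZEPPELIN_PORT` line is an illustrative addition showing one common tweak; check the comments inside `zeppelin-env.sh.template` for the full set of supported variables.

```shell
# copy the provided templates into place (run from the Zeppelin source root)
cp ./conf/zeppelin-env.sh.template ./conf/zeppelin-env.sh
cp ./conf/zeppelin-site.xml.template ./conf/zeppelin-site.xml

# e.g. change the listening port by exporting ZEPPELIN_PORT in zeppelin-env.sh
echo 'export ZEPPELIN_PORT=8180' >> ./conf/zeppelin-env.sh
```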
| 173 | + |
| 174 | + |
| 175 | +#### Setting SPARK_HOME and HADOOP_HOME |
| 176 | + |
| 177 | +Without SPARK_HOME and HADOOP_HOME, Zeppelin uses the embedded Spark and Hadoop binaries that you specified with the mvn build options.
| 178 | +If you want to use system-provided Spark and Hadoop, export SPARK_HOME and HADOOP_HOME in zeppelin-env.sh.
| 179 | +You can use any supported version of Spark without rebuilding Zeppelin.
| 180 | + |
| 181 | +``` |
| 182 | +# ./conf/zeppelin-env.sh |
| 183 | +export SPARK_HOME=... |
| 184 | +export HADOOP_HOME=... |
| 185 | +``` |
| 186 | + |
| 187 | +#### External cluster configuration |
| 188 | +Mesos |
| 189 | + |
| 190 | + # ./conf/zeppelin-env.sh |
| 191 | + export MASTER=mesos://... |
| 192 | + export ZEPPELIN_JAVA_OPTS="-Dspark.executor.uri=/path/to/spark-*.tgz" or SPARK_HOME="/path/to/spark_home" |
| 193 | + export MESOS_NATIVE_LIBRARY=/path/to/libmesos.so |
| 194 | + |
| 195 | +If you set `SPARK_HOME`, you should deploy the Spark binary to the same location on all worker nodes. If you set `spark.executor.uri`, every worker must be able to read that file on its node.
| 196 | + |
| 197 | +Yarn |
| 198 | + |
| 199 | + # ./conf/zeppelin-env.sh |
| 200 | + export SPARK_HOME=/path/to/spark_dir |
| 201 | + |
| 202 | +### Run |
| 203 | + ./bin/zeppelin-daemon.sh start |
67 | 204 |
|
68 | | -## Get the image from the Docker Repository |
| 205 | + Then browse to http://localhost:8080 in your browser.
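The same daemon script also manages a running server. A sketch of the lifecycle is below; it assumes a built or packaged Zeppelin, and the subcommand names follow the usual Hadoop-style daemon script, so check `./bin/zeppelin-daemon.sh` itself for the exact list.

```shell
./bin/zeppelin-daemon.sh start      # launch the notebook server
./bin/zeppelin-daemon.sh restart    # restart, e.g. after editing conf/zeppelin-env.sh
./bin/zeppelin-daemon.sh stop       # shut the server down
```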
69 | 206 |
|
70 | | -For your convenience, [Datalayer](http://datalayer.io) provides an up-to-date Docker image for [Apache Zeppelin](http://zeppelin.incubator.apache.org), the WEB Notebook for Big Data Science. |
71 | 207 |
|
72 | | -In order to get the image, you can run with the appropriate rights: |
| 208 | +For configuration details check the __./conf__ subdirectory.
73 | 209 |
|
74 | | -`docker pull datalayer/zeppelin-rscala` |
| 210 | +### Package |
| 211 | +To package the final distribution including the compressed archive, run: |
75 | 212 |
|
76 | | -Run the Zeppelin notebook with: |
| 213 | + mvn clean package -Pbuild-distr |
77 | 214 |
|
78 | | -`docker run -it -p 2222:22 -p 8080:8080 -p 4040:4040 datalayer/zeppelin-rscala` |
| 215 | +To build a distribution with specific profiles, run: |
79 | 216 |
|
80 | | -and go to [http://localhost:8080](http://localhost:8080) to test the `R Tutorial` note. |
| 217 | + mvn clean package -Pbuild-distr -Pspark-1.5 -Phadoop-2.4 -Pyarn -Ppyspark |
81 | 218 |
|
82 | | -# License |
| 219 | +The profiles `-Pspark-1.5 -Phadoop-2.4 -Pyarn -Ppyspark` can be adjusted if you wish to build for a specific Spark version, or to omit support such as `yarn`.
83 | 220 |
|
84 | | -Copyright 2015 Datalayer http://datalayer.io |
| 221 | +The archive is generated under the _zeppelin-distribution/target_ directory.
85 | 222 |
|
86 | | -Licensed under the Apache License, Version 2.0 (the "License"); |
87 | | -you may not use this file except in compliance with the License. |
88 | | -You may obtain a copy of the License at |
| 223 | +### Run end-to-end tests
| 224 | +Zeppelin comes with a set of end-to-end acceptance tests driving a headless Selenium browser.
89 | 225 |
|
90 | | - http://www.apache.org/licenses/LICENSE-2.0 |
| 226 | + # assumes zeppelin-server is running on localhost:8080 (use -Durl=.. to override)
| 227 | + mvn verify |
91 | 228 |
|
92 | | -Unless required by applicable law or agreed to in writing, software |
93 | | -distributed under the License is distributed on an "AS IS" BASIS, |
94 | | -WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. |
95 | | -See the License for the specific language governing permissions and |
96 | | -limitations under the License. |
| 229 | + # or take care of starting/stopping zeppelin-server from the packaged _zeppelin-distribution/target_
| 230 | + mvn verify -P using-packaged-distr |
97 | 231 |
|
98 | | -[](http://cran.r-project.org) |
99 | 232 |
|
100 | | -[](http://zeppelin.incubator.apache.org) |
101 | 233 |
|
102 | | -[](http://datalayer.io) |
| 234 | +[](https://github.com/igrigorik/ga-beacon) |