---
layout: page
title: "R Interpreter"
description: ""
group: manual
---
{% include JB/setup %}

## R Interpreter

The R interpreter adds support for the R programming language and R-Spark integration to Apache Zeppelin (incubating).

### Requirements

Additional requirements for the R interpreter are:

 * R 3.1 or later (earlier versions may work, but have not been tested)
 * The `evaluate` R package

For full R support, you will also need the following R packages:

 * `knitr`
 * `repr`: available with `devtools::install_github("IRkernel/repr")`
 * `htmltools`: required for some interactive plotting
 * `base64enc`: required to view R base plots

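One `Rscript` call is enough to see which of the packages listed above are still missing. The sketch below is a hypothetical helper (`check_r_packages` is not part of Zeppelin). It assumes `Rscript`, which ships with R, is on your `PATH`, and it uses `requireNamespace()` so that nothing is attached to the session.

```shell
#!/bin/sh
# Hypothetical helper: report which R packages used by the R interpreter are missing.
# Assumes Rscript (installed with R) is on PATH; each -e is one R expression.
check_r_packages() {
  Rscript \
    -e 'pkgs <- c("evaluate", "knitr", "repr", "htmltools", "base64enc")' \
    -e 'ok <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)' \
    -e 'if (any(!ok)) { cat("missing R packages:", pkgs[!ok], "\n"); quit(status = 1) }' \
    -e 'cat("all R interpreter packages are installed\n")'
}
```

A non-zero exit status from `check_r_packages` means at least one package still needs installing: `repr` through `devtools::install_github`, the others from CRAN.
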
### Configuration

To run Zeppelin with the R interpreter, the `SPARK_HOME` environment variable must be set. The best way to do this is by editing `conf/zeppelin-env.sh`.

If it is not set, the R interpreter will not be able to interface with Spark.

You should also copy `conf/zeppelin-site.xml.template` to `conf/zeppelin-site.xml`. This ensures that Zeppelin detects the R interpreter the first time it starts up.

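Both configuration steps can be scripted. The following is a minimal sketch, not a Zeppelin tool: `configure_r_interpreter` is a hypothetical helper, and its two arguments (the Zeppelin install directory and the Spark install directory) are placeholders for your own paths. It only appends `SPARK_HOME` if no `export SPARK_HOME=` line exists yet, and only copies the site template if `conf/zeppelin-site.xml` is absent, so running it twice is harmless.

```shell
#!/bin/sh
# Hypothetical helper: prepare Zeppelin's conf/ directory so the R interpreter can find Spark.
# $1 = Zeppelin install directory, $2 = Spark install directory (both placeholders).
configure_r_interpreter() {
  zeppelin_home="$1"
  spark_home="$2"
  conf="$zeppelin_home/conf"

  # Append SPARK_HOME to zeppelin-env.sh unless a SPARK_HOME export is already there
  if ! grep -q '^export SPARK_HOME=' "$conf/zeppelin-env.sh" 2>/dev/null; then
    echo "export SPARK_HOME=\"$spark_home\"" >> "$conf/zeppelin-env.sh"
  fi

  # Create zeppelin-site.xml from the template the first time only
  if [ ! -f "$conf/zeppelin-site.xml" ]; then
    cp "$conf/zeppelin-site.xml.template" "$conf/zeppelin-site.xml"
  fi
}
```

For example, `configure_r_interpreter /opt/zeppelin /opt/spark` (placeholder paths), then start Zeppelin so that it reads the updated `conf/zeppelin-env.sh`.
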
### Using the R Interpreter

By default, the R interpreter appears as two Zeppelin interpreters, `%r` and `%knitr`.

`%r` behaves like an ordinary REPL: you can execute commands as you would in the R console.

![%r evaluating 2+2](screenshots/repl2plus2.png)

R base plotting is fully supported.

![Histogram drawn with %r](screenshots/replhist.png)

If you return a data.frame, Zeppelin will attempt to display it using its built-in visualizations.

![data.frame displayed by %r](screenshots/replhead.png)

`%knitr` interfaces directly with `knitr`, with chunk options on the first line:

![%knitr example: geo chart](screenshots/knitgeo.png)
![%knitr example: stock chart](screenshots/knitstock.png)
![%knitr example: motion chart](screenshots/knitmotion.png)

The two interpreters share the same environment. If you define a variable from `%r`, it will be in scope if you then run a paragraph with `%knitr`.

### Using SparkR & Moving Between Languages

If `SPARK_HOME` is set, the `SparkR` package will be loaded automatically:

![SparkR with the faithful dataset](screenshots/sparkrfaithful.png)

The Spark context and SQL context are created and injected into the local environment automatically as `sc` and `sql`.

The same contexts are shared with the `%spark`, `%sql` and `%pyspark` interpreters:

![Using the shared context from Scala](screenshots/backtoscala.png)

You can also make an ordinary R variable accessible in Scala and Python:

![Sharing an R variable](screenshots/varr1.png)

And vice versa:

![Sharing a Scala variable](screenshots/varscala.png)
![Reading the shared variable from R](screenshots/varr2.png)

### Caveats & Troubleshooting

* Almost all issues with the R interpreter are caused by an incorrectly set `SPARK_HOME`. The R interpreter must load a version of the `SparkR` package that matches the running version of Spark, and it does this by searching `SPARK_HOME`. If Zeppelin isn't configured to interface with the Spark installation in `SPARK_HOME`, the R interpreter will not be able to connect to Spark.

* The `knitr` environment is persistent. If you run a chunk from Zeppelin that changes a variable and then run the same chunk again, the second run starts from the already-changed value. Use immutable variables.

* `%spark.r` and `%r` are two different ways of calling the same interpreter, as are `%spark.knitr` and `%knitr`. By default, Zeppelin puts the R interpreters in the `%spark.` interpreter group.

* Using the `%r` interpreter, if you return a data.frame, HTML, or an image, it will dominate the result. So if you execute three commands and one is `hist()`, all you will see is the histogram, not the results of the other commands. This is a Zeppelin limitation.

* If you return a data.frame (for instance, from calling `head()`) from the `%spark.r` interpreter, it will be parsed by Zeppelin's built-in data visualization system.

* Why `knitr` instead of `rmarkdown`? Why no `htmlwidgets`? In order to support `htmlwidgets`, which has indirect dependencies, `rmarkdown` uses `pandoc`, which requires writing to and reading from disk. This makes it many times slower than `knitr`, which can operate entirely in RAM.

* Why no `ggvis` or `shiny`? Supporting `shiny` would require integrating a reverse proxy into Zeppelin, which is a substantial undertaking.

* Mac OS X & case-insensitive filesystems. If you try to install on a case-insensitive filesystem, which is the Mac OS X default, Maven can unintentionally delete the install directory because `r` and `R` become the same subdirectory.

* Error `unable to start device X11` with the `%r` interpreter. Check your shell login scripts to see if they are adjusting the `DISPLAY` environment variable. This is common on some operating systems as a workaround for ssh issues, but can interfere with R plotting.

* Akka library version or `TTransport` errors. This can happen if you try to run Zeppelin with a `SPARK_HOME` that contains a version of Spark other than the one specified with `-Pspark-1.x` when Zeppelin was compiled.

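Several of the checks above can be scripted. The sketch below is hypothetical (none of these functions ship with Zeppelin), and it rests on two assumptions about binary Spark distributions that you should verify against your own install: the bundled `SparkR` package lives under `$SPARK_HOME/R/lib/SparkR`, and `$SPARK_HOME/RELEASE` begins with a line such as `Spark 1.6.1 built for Hadoop 2.6.0`.

```shell
#!/bin/sh
# Hypothetical troubleshooting helpers for the R interpreter.
# Assumption: a binary Spark distribution keeps SparkR under $SPARK_HOME/R/lib/SparkR.
# Assumption: $SPARK_HOME/RELEASE starts with "Spark <version> built for ...".

# Succeeds if SPARK_HOME is set and contains a SparkR package
check_sparkr() {
  if [ -z "${SPARK_HOME:-}" ]; then
    echo "SPARK_HOME is not set" >&2
    return 1
  fi
  if [ ! -d "$SPARK_HOME/R/lib/SparkR" ]; then
    echo "no SparkR package under $SPARK_HOME/R/lib" >&2
    return 1
  fi
}

# Prints the Spark version recorded in $SPARK_HOME/RELEASE, e.g. 1.6.1
spark_version() {
  sed -n 's/^Spark \([^ ]*\) .*/\1/p' "$SPARK_HOME/RELEASE" | head -n 1
}

# Succeeds if that version matches the major.minor Zeppelin was built for,
# e.g. "matches_build_profile 1.6" for a build made with -Pspark-1.6
matches_build_profile() {
  case "$(spark_version)" in
    "$1"|"$1".*) return 0 ;;
    *) return 1 ;;
  esac
}

# Returns 0 if directory $1 is on a case-insensitive filesystem (r and R collide),
# 1 if it is case-sensitive, 2 if the probe directory could not be created
is_case_insensitive() {
  probe=$(mktemp -d "$1/case-probe.XXXXXX") || return 2
  touch "$probe/r"
  if [ -e "$probe/R" ]; then probe_rc=0; else probe_rc=1; fi
  rm -rf "$probe"
  return "$probe_rc"
}

# Prints login-script lines that mention DISPLAY, prefixed with file and line number
display_overrides() {
  for f in "$HOME/.profile" "$HOME/.bash_profile" "$HOME/.bashrc" "$HOME/.zshrc"; do
    [ -f "$f" ] && grep -n 'DISPLAY' "$f" | sed "s|^|$f:|"
  done
  return 0
}
```

For instance, `check_sparkr && matches_build_profile 1.6` checks a Zeppelin build made with `-Pspark-1.6`; `is_case_insensitive .` run from the build directory tells you whether the `r`/`R` collision can occur; and `display_overrides` lists any login-script lines that touch `DISPLAY`.
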
## R Interpreter for Apache Zeppelin

[R](https://www.r-project.org) is a free software environment for statistical computing and graphics.

To run R code and visualize plots in Apache Zeppelin, you will need R on your master node (or your dev laptop).

+ For CentOS: `yum install R R-devel libcurl-devel openssl-devel`
+ For Ubuntu: `apt-get install r-base`

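If you provision several machines, the matching command above can be selected automatically. The sketch below is a hypothetical helper that reads the distribution `ID` from `/etc/os-release` (or takes it as an argument) and prints the corresponding command from this page. It only knows `centos` and `ubuntu`, and the added `-y` flag simply skips the confirmation prompt.

```shell
#!/bin/sh
# Hypothetical helper: print this page's R install command for a Linux distribution.
# $1 = distribution ID; defaults to the ID field of /etc/os-release.
r_install_cmd() {
  distro="${1:-$(. /etc/os-release 2>/dev/null && echo "$ID")}"
  case "$distro" in
    centos) echo "yum install -y R R-devel libcurl-devel openssl-devel" ;;
    ubuntu) echo "apt-get install -y r-base" ;;
    *) echo "no R install command known for '$distro'" >&2; return 1 ;;
  esac
}
```

`sudo sh -c "$(r_install_cmd)"` runs the printed command. Printing it first lets you review the command before anything is installed.
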
Validate your installation with a simple R command:

```
R -e "print(1+1)"
```

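The R interpreter requires R 3.1 or later. The first line of `R --version` has the form `R version 3.2.3 (<date>) ...`, so the version can also be checked from the shell. The sketch below is a hypothetical helper that parses that line and fails if R is older than 3.1.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the installed R is version 3.1 or later.
# The first line of `R --version` looks like: R version 3.2.3 (<date>) -- "<nickname>"
r_version_ok() {
  # Keep only "MAJOR MINOR" from that line, e.g. "3 2"
  set -- $(R --version | head -n 1 |
    sed -n 's/^R version \([0-9][0-9]*\)\.\([0-9][0-9]*\).*/\1 \2/p')
  if [ "$#" -ne 2 ]; then
    echo "could not parse the output of R --version" >&2
    return 2
  fi
  if [ "$1" -gt 3 ] || { [ "$1" -eq 3 ] && [ "$2" -ge 1 ]; }; then
    echo "R $1.$2 is supported"
  else
    echo "R $1.$2 is older than 3.1" >&2
    return 1
  fi
}
```

Run `r_version_ok` after the validation command above; it prints the detected major and minor version on success.
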
To render plots, install devtools, knitr, ggplot2 and other visualization libraries:

```
R -e "install.packages('devtools', repos = 'http://cran.us.r-project.org')"
R -e "install.packages('knitr', repos = 'http://cran.us.r-project.org')"
R -e "install.packages('ggplot2', repos = 'http://cran.us.r-project.org')"
# Other visualization libraries (mplot, googleVis, and rCharts from GitHub)
R -e "install.packages(c('devtools','mplot', 'googleVis'), repos = 'http://cran.us.r-project.org'); require(devtools); install_github('ramnathv/rCharts')"
```

We recommend that you also install the following optional R libraries for happy data analytics:

+ glmnet
+ pROC
+ data.table
+ caret
+ sqldf
+ wordcloud

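Each of these can be installed with the same `R -e "install.packages(...)"` pattern used for the plotting libraries. To install the whole list in one R session, a small wrapper helps. `install_cran` below is a hypothetical helper that quotes its arguments into an R character vector; the `CRAN_REPO` variable is an assumed override that defaults to the mirror used elsewhere on this page.

```shell
#!/bin/sh
# Hypothetical helper: install several CRAN packages with a single R call.
# CRAN_REPO is an assumed override; it defaults to the mirror used on this page.
install_cran() {
  repo="${CRAN_REPO:-http://cran.us.r-project.org}"
  vec=""
  for pkg in "$@"; do
    # Build the body of an R character vector: 'glmnet', 'pROC', ...
    vec="${vec:+$vec, }'$pkg'"
  done
  R -e "install.packages(c($vec), repos = '$repo')"
}
```

For example, `install_cran glmnet pROC data.table caret sqldf wordcloud` installs all six libraries listed above.
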