COMPUTER SCIENCE DEPARTMENT, UNIVERSITY OF BONN

Lab Distributed Big Data Analytics

Worksheet-3: ML on Spark (Spark ML and BigDL)

Dr. Hajira Jabeen, Gezim Sejdiu, Denis Lukovnikov, Prof. Dr. Jens Lehmann

April 25, 2019

In this lab we will perform basic Spark ML and BigDL operations (described in “Spark Fundamentals II (ML on Spark)”).

IN CLASS

Setup
- Download Spark 2.2 and unpack it to /opt/spark (or any other directory).
- Set the SPARK_HOME environment variable to /opt/spark (or to the directory where you unpacked Spark).
- Download BigDL 0.7 and unpack it to any directory.
- Set the BIGDL_HOME environment variable to the unpacked BigDL directory.
- Run pip install bigdl==0.7 in the Python environment you will use for the lab.
- Download the script from https://gist.github.com/lukovnikov/461d1165ea04317d2be6b66995ffa73c
- Start Jupyter using the script (it must be marked as executable).
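
Before starting Jupyter, it can be worth confirming that both environment variables from the setup steps are visible and point at real directories. The following is a minimal sketch using only the Python standard library; the helper name check_lab_env is hypothetical and not part of Spark, BigDL, or the gist.

```python
import os


def check_lab_env(environ=None):
    """Return {variable name: problem description}; empty if everything looks fine.

    check_lab_env is a hypothetical helper written for this worksheet.
    """
    environ = os.environ if environ is None else environ
    problems = {}
    for name in ("SPARK_HOME", "BIGDL_HOME"):
        path = environ.get(name)
        if not path:
            problems[name] = "not set"
        elif not os.path.isdir(path):
            problems[name] = "does not point to a directory: " + path
    return problems


if __name__ == "__main__":
    issues = check_lab_env()
    for name, problem in issues.items():
        print("{}: {}".format(name, problem))
    if not issues:
        print("SPARK_HOME and BIGDL_HOME look fine.")
```

Run it from the same shell that will launch Jupyter, so that it sees the same environment as the notebook server.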
Implement the PySpark-BigDL dummy linear regression notebook.
Implement the PySpark-BigDL MNIST notebook.
Implement the PySpark-BigDL MNIST CNN notebook.
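
For the dummy linear regression notebook, it helps to know what a correct fit should look like before involving Spark or BigDL. The sketch below is a framework-independent reference in plain Python, not the notebook's BigDL implementation. It generates synthetic data from a line and recovers the slope and intercept with full-batch gradient descent on the mean squared error. The ground-truth values, sample size, noise level, and learning rate are all assumptions chosen for illustration.

```python
import random

random.seed(0)  # make the synthetic data reproducible

# Hypothetical ground truth for the dummy data: y = 2x + 1 plus small noise.
TRUE_W, TRUE_B = 2.0, 1.0
xs = [random.uniform(-1.0, 1.0) for _ in range(200)]
ys = [TRUE_W * x + TRUE_B + random.gauss(0.0, 0.05) for x in xs]

w, b = 0.0, 0.0       # parameters to learn
learning_rate = 0.1   # assumed value, small enough for stable updates here
n = len(xs)

for epoch in range(500):
    # Gradients of the mean squared error with respect to w and b.
    grad_w = sum(2.0 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
    grad_b = sum(2.0 * (w * x + b - y) for x, y in zip(xs, ys)) / n
    w -= learning_rate * grad_w
    b -= learning_rate * grad_b

print("learned w = {:.3f}, b = {:.3f}".format(w, b))
```

A BigDL model with a single linear layer and an MSE criterion trained on the same kind of data should end up with parameters close to the ground truth, which gives a simple sanity check for the notebook.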

AT HOME

Reading:
- Read “Pattern Recognition and Machine Learning” by Bishop
- Read “Deep Learning” by Goodfellow, Bengio, and Courville (or check some blog posts/tutorials)
- Check out the MLlib programming guide
- Read the BigDL whitepaper
- Check out the BigDL programming guide
- Check out the BigDL tutorials in Python (https://github.com/intel-analytics/BigDL-Tutorials/)
Complete the notebooks
Convert the mnist_cnn notebook to use MLlib’s Pipeline API