Your first simple PySpark Script – Create and Run
In this post, we will see how you can create your first PySpark script and then run it in
batch mode.
Many people use notebooks such as Jupyter or Zeppelin; however, you may want to
create a PySpark script and run it on a schedule instead.
This is especially helpful if you want to run an ETL-like process with PySpark on a
fixed schedule.
How to write a PySpark Script
Let's create a simple PySpark script that reads data from a path and writes the
first 10 records to HDFS. The script will also show you how to create a dummy helper
function alongside the main function, call it from inside main, and print some
information along the way.
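Below is a minimal sketch of such a script. The input path, the output path, and the print_info helper are placeholder names used purely for illustration; adjust them to your own data and cluster.

# run_sample_pyspark.py
from pyspark.sql import SparkSession


def print_info(message):
    # Dummy helper function, called from main() to show how to split logic across functions
    print("INFO: {}".format(message))


def main():
    # Create (or reuse) the SparkSession for this application
    spark = SparkSession.builder.appName("run_sample_pyspark").getOrCreate()

    # Keep Spark's console output quiet; change to INFO, DEBUG or WARN if needed
    spark.sparkContext.setLogLevel("ERROR")

    print_info("Reading source data")
    # Placeholder input path -- replace with your actual source
    df = spark.read.csv("/data/input/sample.csv", header=True, inferSchema=True)

    print_info("Writing the first 10 records to HDFS")
    # Placeholder output path on HDFS
    df.limit(10).write.mode("overwrite").parquet("/data/output/sample_top10")

    print_info("Job finished")
    spark.stop()


if __name__ == "__main__":
    main()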
Save the file as "run_sample_pyspark.py".
How to run a PySpark Script
You can run the PySpark script using spark-submit, the tool that submits PySpark
applications to the cluster. You may also want to create a dedicated log file for
each script execution. Use the command below to run the PySpark script we created
above on the cluster.
spark-submit <filename>
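For example, here is a sketch assuming the file name used earlier; the output redirection is what creates the log file, and the trailing & sends the run to the background:

nohup spark-submit run_sample_pyspark.py > run_sample_pyspark.log 2>&1 &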
The above statement runs the PySpark script in the background by calling spark-
submit. It also creates a log file in which you can see all the print statement output
along with other Spark log information. We set the logging level to ERROR in the
script above; you can change it to INFO, DEBUG, or WARN as well.
You can also pass parameters to the spark-submit command and set Spark-level
configuration as command-line arguments. Below is one sample example of how to
execute the PySpark script with such options.
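The invocation below is only a sketch; the YARN master, executor settings, and the trailing date argument are illustrative values, not requirements. Anything placed after the script name is passed to the script itself and can be read inside it via sys.argv.

spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --conf spark.executor.memory=4g \
  --conf spark.executor.cores=2 \
  run_sample_pyspark.py 2023-01-01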
As part of this post, I wanted to show you how easily you can create your first PySpark
script and run it on the cluster.
Summary
We saw how easy it is to create a PySpark script, how to define multiple functions in
the same script, and how to call one function from another. You could put all the logic
into a single "main" method, though I would not encourage you to do so.
To execute the PySpark script, you pass it to spark-submit, which takes care of running
the logic on the cluster. You can also create a dedicated log file for each run, for easy
reference and debugging at a later time.