What is Docker?
Docker is a containerization platform that eases the
development and deployment of applications. To put it
simply, Docker is software you run on your local machine to
reproduce the same environment that your code would run in
in PROD.
Spark on Docker
Before we get started, we need to understand some Docker
terminology.
1. Registry: It's the central repository for your Docker
images, from which you can download them. Docker Hub is one
such example. We can also set up a private Docker registry
that isn't publicly available. We can pull (i.e. download)
an image from the registry or push an image to it.
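For instance, pulling from Docker Hub and pushing to a private registry look like this (these commands need a local Docker installation, and the private registry hostname is a placeholder):

```shell
# Pull (download) an image from the default registry, Docker Hub
docker pull ubuntu:20.04

# Tag the image for a (hypothetical) private registry, then push it there
docker tag ubuntu:20.04 my-registry.example.com/ubuntu:20.04
docker push my-registry.example.com/ubuntu:20.04
```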
2. Image: It is basically a blueprint for what constitutes your
Docker container. For example, to deploy a Spark cluster you
might want to start with a base Linux image, install Java, and
so on. All of these requirements are baked into an image
that can be pulled from the registry or built locally from
your Dockerfile.
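Building an image locally from a Dockerfile and listing the images on your machine would look like this (the image tag is a placeholder, and the build assumes a Dockerfile in the current directory):

```shell
# Build an image from the Dockerfile in the current directory
docker build -t my-spark-image .

# List the images available on this machine
docker images
```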
3. Container: As per Docker's documentation, it is "a
standardized unit of software." It is an instance of an image.
Basically, a container is like a lightweight, isolated virtual
machine (not exactly, but a good analogy). Docker uses a couple
of cool Linux features, namely namespaces and cgroups, to give
us an isolated environment in which to run our lightweight
images.
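To see a container as an instance of an image, you can start one interactively and then list the running containers (again assuming Docker is installed locally):

```shell
# Start a container from the ubuntu image and open a shell inside it
# (--rm deletes the container when it exits)
docker run -it --rm ubuntu:20.04 /bin/bash

# In another terminal, list the currently running containers
docker ps
```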
4. Dockerfile: It's a text file, like a script, containing
detailed instructions: commands you want to run, things
you want to download, and so on. We will be
writing one of these by the end of this article.
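As a tiny preview, here is what writing and building a minimal Dockerfile looks like (the base image and the resulting tag are just illustrative placeholders):

```shell
# Write a minimal Dockerfile: start from a base Linux image and install a package
cat > Dockerfile <<'EOF'
FROM ubuntu:20.04
RUN apt-get update && apt-get install -y curl
EOF

# Build it into an image we can run containers from
docker build -t my-base-image .
```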
Now that we know some basic definitions, it's time to ask the
main question: why do I care?
There are many reasons you might want to use Docker. I will give
my perspective on why I started learning about it.
I needed to test my Kafka producers and consumers locally, instead
of deploying my code to DEV/QA before I was sure things
were working, while also being confident that the same code,
when deployed to other environments, would behave the same.