Data Science with Docker in PyCharm
Quick Reference Guide for Project Setup
Prerequisites Checklist
Docker Desktop installed and running
PyCharm Professional (required for Docker-based interpreters)
Docker plugin enabled in PyCharm
Git installed (optional but recommended)
Phase 1: Initial Setup
1. Create Project Directory
Create a new directory for your Data Science project:
mkdir my-ds-project
cd my-ds-project
💡 Choose a meaningful name that reflects your project's purpose
2. Create Dockerfile
Create a Dockerfile in your project root:
FROM python:3.11-slim

# Set working directory
WORKDIR /app

# Install system dependencies
RUN apt-get update && apt-get install -y \
    gcc \
    g++ \
    && rm -rf /var/lib/apt/lists/*

# Copy requirements first for better caching
COPY requirements.txt .

# Install Python dependencies
RUN pip install --no-cache-dir -r requirements.txt

# Copy project files
COPY . .

# Expose port for Jupyter
EXPOSE 8888

# Default command
CMD ["jupyter", "lab", "--ip=0.0.0.0", "--port=8888", "--no-browser", "--allow-root"]
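Why copying requirements.txt first helps: Docker reuses a cached layer only when that instruction and every layer before it are unchanged, and a COPY step counts as changed when the files it copies change. The sketch below is a deliberately simplified model of that cache chain, not Docker's actual implementation; `layer_keys` and the sample file contents are hypothetical.

```python
import hashlib

def layer_keys(instructions, files):
    """Simplified cache model: each layer key chains the parent key,
    the instruction text, and the contents of any files it copies."""
    keys = []
    parent = ""
    for text, copied_paths in instructions:
        h = hashlib.sha256()
        h.update(parent.encode())
        h.update(text.encode())
        for path in sorted(copied_paths):
            h.update(path.encode())
            h.update(files[path].encode())
        parent = h.hexdigest()
        keys.append(parent)
    return keys

# Hypothetical project: only the source file changes between two builds.
files_v1 = {"requirements.txt": "pandas==2.0.3\n", "src/train.py": "print('v1')\n"}
files_v2 = {**files_v1, "src/train.py": "print('v2')\n"}
all_files = sorted(files_v1)  # what "COPY . ." reads

requirements_first = [
    ("COPY requirements.txt .", ["requirements.txt"]),
    ("RUN pip install --no-cache-dir -r requirements.txt", []),
    ("COPY . .", all_files),
]
everything_first = [
    ("COPY . .", all_files),
    ("RUN pip install --no-cache-dir -r requirements.txt", []),
]

# Index 1 is the pip install layer in both orderings.
reused_a = layer_keys(requirements_first, files_v1)[1] == layer_keys(requirements_first, files_v2)[1]
reused_b = layer_keys(everything_first, files_v1)[1] == layer_keys(everything_first, files_v2)[1]
print("pip layer reused, requirements first:", reused_a)  # True
print("pip layer reused, everything first:", reused_b)    # False
```

In this model, editing code under src/ leaves the expensive pip layer untouched only when requirements.txt is copied on its own first, which is the ordering the Dockerfile uses.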
3. Create requirements.txt
List your Python dependencies:
pandas==2.0.3
numpy==1.24.3
matplotlib==3.7.2
seaborn==0.12.2
scikit-learn==1.3.0
jupyter==1.0.0
jupyterlab==4.0.5
plotly==5.15.0
tensorflow==2.13.0
# Add other packages as needed
💡 Pin versions for reproducibility; after changing a pin, rebuild the image so the change takes effect
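To confirm the container really has the pinned versions, you can compare requirements.txt against the installed package metadata. A minimal sketch using the standard library's importlib.metadata; the script name check_pins.py and its helpers are hypothetical, and it only understands plain name==version lines.

```python
from importlib.metadata import PackageNotFoundError, version
from pathlib import Path

def parse_pins(text):
    """Map package name -> pinned version for each 'name==version' line."""
    pins = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if "==" in line:
            name, pinned = line.split("==", 1)
            pins[name.strip()] = pinned.strip()
    return pins

def check_pins(pins):
    """Return {name: (pinned, installed)} for each package that differs.
    installed is None when the package is not installed at all."""
    mismatches = {}
    for name, pinned in pins.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None
        if installed != pinned:
            mismatches[name] = (pinned, installed)
    return mismatches

if __name__ == "__main__":
    req = Path("requirements.txt")
    if not req.exists():
        print("requirements.txt not found; run this from the project root")
    else:
        problems = check_pins(parse_pins(req.read_text()))
        for name, (pinned, installed) in problems.items():
            print(f"{name}: pinned {pinned}, installed {installed}")
        print("All pins match" if not problems else f"{len(problems)} mismatch(es)")
```

Run it inside the container so it inspects the container's packages, for example with docker-compose exec datascience python check_pins.py.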
4. Create docker-compose.yml
For easier container management:
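version: '3.8'  # Docker Compose V2 ignores this field; safe to keep or remove
services:
  datascience:
    build: .
    ports:
      - "8888:8888"
    volumes:
      - .:/app
      # The next two mounts are already covered by .:/app; kept for explicitness
      - ./data:/app/data
      - ./notebooks:/app/notebooks
    environment:
      - JUPYTER_ENABLE_LAB=yes  # Read only by jupyter/docker-stacks images; no effect here
    command: jupyter lab --ip=0.0.0.0 --port=8888 --no-browser --allow-root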
Phase 2: PyCharm Configuration
5. Open Project in PyCharm
Launch PyCharm and open your project directory:
File → Open → Select your project folder
Choose "This Window" when prompted
6. Configure Docker Interpreter
Set up the Python interpreter to run inside Docker:
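File → Settings (Ctrl+Alt+S; on macOS: PyCharm → Settings, ⌘,)
Project: <your project> → Python Interpreter
Click Add Interpreter → On Docker Compose (older versions: gear icon → Add → Docker Compose)
Choose your docker-compose.yml file
Select the service: datascience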
Click OK
💡 This may take a few minutes on first setup as Docker builds the image
7. Configure Docker Services
Set up Docker integration:
View → Tool Windows → Services
Click + → Docker → Docker Compose
Select your docker-compose.yml
The Services panel then lists your containers
Phase 3: Project Structure Setup
8. Create Project Folders
Organize your project structure:
mkdir -p data/{raw,processed,external}
mkdir -p notebooks/{exploratory,modeling}
mkdir -p src/{data,features,models,visualization}
mkdir tests
mkdir docs
💡 This follows a common Data Science project structure; the {a,b} brace expansion needs a shell such as bash or zsh
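The brace-expansion mkdir commands rely on a POSIX-style shell. On Windows without one, or if you want the layout scripted, a minimal cross-platform sketch with pathlib creates the same folders; the names PROJECT_DIRS and create_layout are hypothetical.

```python
from pathlib import Path

# Same folders as the mkdir commands above.
PROJECT_DIRS = [
    "data/raw", "data/processed", "data/external",
    "notebooks/exploratory", "notebooks/modeling",
    "src/data", "src/features", "src/models", "src/visualization",
    "tests", "docs",
]

def create_layout(root="."):
    """Create every project folder under root; existing folders are left alone."""
    root = Path(root)
    created = []
    for rel in PROJECT_DIRS:
        path = root / rel
        path.mkdir(parents=True, exist_ok=True)  # equivalent of mkdir -p
        created.append(path)
    return created
```

Call create_layout() from the project root; because of exist_ok=True it is safe to run more than once.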
9. Create Initial Files
Set up basic project files:
README.md - Project description
.gitignore - Git ignore patterns
src/__init__.py - Make src a package
config.py - Configuration settings
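config.py is a natural home for paths that scripts and notebooks share. A minimal sketch, assuming config.py sits in the project root (mounted at /app in the container) and using the data and notebooks folders created above; every constant name, and the RANDOM_SEED value, is a hypothetical choice rather than something the setup requires.

```python
"""config.py: shared project settings (hypothetical sketch)."""
from pathlib import Path

# Directory containing this file: the project root on the host,
# /app inside the container (docker-compose.yml mounts . at /app).
PROJECT_ROOT = Path(__file__).resolve().parent

DATA_DIR = PROJECT_ROOT / "data"
RAW_DATA_DIR = DATA_DIR / "raw"
PROCESSED_DATA_DIR = DATA_DIR / "processed"
EXTERNAL_DATA_DIR = DATA_DIR / "external"
NOTEBOOKS_DIR = PROJECT_ROOT / "notebooks"

# Hypothetical: one seed reused by every experiment for reproducibility.
RANDOM_SEED = 42
```

Scripts can then use from config import RAW_DATA_DIR. Notebooks in subfolders such as notebooks/exploratory start with their own folder as the working directory, so the project root must be on the import path (for example by setting PYTHONPATH=/app in the service's environment).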
Phase 4: Testing and Validation
10. Build and Test Container
Verify your setup works:
docker-compose build
docker-compose up -d
Check that Jupyter is accessible at http://localhost:8888 (it will ask for the access token printed in the container logs)
💡 Check Docker logs if there are issues: docker-compose logs
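Jupyter may need several seconds to start after docker-compose up -d returns, so the page can fail to load at first. A small polling helper using only the standard library (the name wait_for_http is hypothetical) retries until the server sends any HTTP response; any status code counts as up, since an unauthenticated request may not get a 200.

```python
import http.client
import time
import urllib.error
import urllib.request

# Bypass any HTTP(S)_PROXY settings so localhost is contacted directly.
_OPENER = urllib.request.build_opener(urllib.request.ProxyHandler({}))

def wait_for_http(url, timeout=60.0, interval=1.0):
    """Poll url until the server sends any HTTP response or timeout expires.

    Returns True if the server answered, False if it never did.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            with _OPENER.open(url, timeout=interval):
                return True
        except urllib.error.HTTPError:
            return True  # an error status still means the server is answering
        except (OSError, http.client.HTTPException):
            # Refused, reset, or timed out: the server is not ready yet.
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

For example, wait_for_http("http://localhost:8888") returns True once Jupyter is serving; if it returns False, check docker-compose logs.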
11. Test PyCharm Integration
Verify that PyCharm can run code in the container:
Create a simple test script: test.py
Add: import pandas as pd; print(pd.__version__)
Right-click in the editor → Run 'test'
Confirm in the Run tool window that the script ran in the Docker container
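A slightly richer test.py also reports where it is running. This is a sketch: the /.dockerenv check is a common heuristic (Docker creates that file inside containers), not an official API, and the helper names are hypothetical.

```python
import importlib.util
import os
import platform
import sys

def running_in_docker(marker="/.dockerenv"):
    """Heuristic: Docker creates /.dockerenv at the root of a container."""
    return os.path.exists(marker)

def pandas_version():
    """Return pandas' version, or None if pandas is not importable here."""
    if importlib.util.find_spec("pandas") is None:
        return None
    import pandas as pd
    return pd.__version__

def describe_runtime():
    """Collect the details that show which interpreter ran the script."""
    return {
        "interpreter": sys.executable,
        "platform": platform.platform(),
        "in_docker": running_in_docker(),
        "pandas": pandas_version(),
    }

if __name__ == "__main__":
    for key, value in describe_runtime().items():
        print(f"{key}: {value}")
```

Run through the Docker Compose interpreter, in_docker should print True and platform should report Linux even on a Windows or macOS host.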
12. Configure Jupyter in PyCharm
Set up Jupyter notebook support:
File → Settings → Languages & Frameworks → Jupyter
Set the Jupyter server URL to http://localhost:8888
If prompted for a token, copy it from the container logs: docker-compose logs datascience
Test by creating a new notebook
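Jupyter prints its access URL, including the token, when it starts (a line containing something like /lab?token=...). A minimal sketch that pulls the token out of the service logs; it assumes the datascience service name from docker-compose.yml and the docker-compose CLI on PATH, and the helper names are hypothetical.

```python
import re
import subprocess

# Matches the token query parameter in URLs such as /lab?token=abc123
TOKEN_RE = re.compile(r"[?&]token=([0-9A-Za-z]+)")

def extract_token(log_text):
    """Return the last token in the log text (the most recent start), or None."""
    matches = TOKEN_RE.findall(log_text)
    return matches[-1] if matches else None

def token_from_compose(service="datascience"):
    """Fetch the service's logs with docker-compose and extract the token."""
    result = subprocess.run(
        ["docker-compose", "logs", service],
        capture_output=True, text=True, check=True,
    )
    return extract_token(result.stdout + result.stderr)
```

With the container running, token_from_compose() returns the token to paste into PyCharm, or you can use a server URL of the form http://localhost:8888/?token=<token>.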
Final Checklist
Docker container builds and runs successfully
PyCharm uses Docker interpreter
Jupyter Lab accessible via browser
Python code executes in PyCharm using Docker
File changes sync between host and container
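To check the last item concretely, write a marker file on the host and read it back through the container's /app mount. A sketch assuming the datascience service is running, the marker is written in the project root, and the docker-compose CLI is on PATH; the marker naming and helper names are hypothetical.

```python
import subprocess
import uuid
from pathlib import Path

def write_marker(project_root="."):
    """Create a uniquely named marker file on the host; return (path, content)."""
    path = Path(project_root) / f".sync-check-{uuid.uuid4().hex[:8]}"
    content = uuid.uuid4().hex
    path.write_text(content)
    return path, content

def visible_in_container(path, content, service="datascience"):
    """Read the marker via the /app bind mount inside the running container."""
    result = subprocess.run(
        ["docker-compose", "exec", "-T", service, "cat", f"/app/{path.name}"],
        capture_output=True, text=True,
    )
    return result.returncode == 0 and result.stdout == content
```

Usage from the project root: path, content = write_marker(), then print(visible_in_container(path, content)), and finally path.unlink() to remove the marker.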
Remember: always stop containers when not in use (docker-compose down)