📄 pdfsp

pdfsp is a Python package that extracts tables from PDF files and saves them to Excel. It also provides a simple Streamlit app for interactive viewing of the extracted data.

🚀 Features

Extracts tabular data from PDFs using pdfplumber
Converts tables into pandas DataFrames
Saves output as .xlsx Excel files using openpyxl
Ensures column names are unique to prevent issues
Visualizes DataFrames with streamlit

📦 Installation

Make sure you're using Python 3.10 or newer, then install with:

pip install pdfsp -U

python script

# pdf.py
from pdfsp import extract_tables, Options

# Define extraction options
source_folder = "."
output_folder = "output"
combine_tables = True

options = Options(
    source_folder=source_folder,
    output_folder=output_folder,
    combine=combine_tables
)

# Run the table extraction
extract_tables(options)

From console / Terminal / Command Line

# Extract all tables from all PDF files in the current folder and save them to the current folder
pdfsp . .

# Extract and COMBINE large tables (spanning multiple pages) into single files, saved to the current folder
pdfsp . . --combine

# Extract and COMBINE tables, skipping the first row of each table (e.g., header rows)
pdfsp . . --combine --skiprows=1

# Extract all tables from PDF files in 'someFolder' and save them to 'SomeOutFolder'
pdfsp someFolder SomeOutFolder

# Extract all tables from 'some.pdf' and save them to the current folder
pdfsp some.pdf .

# Extract all tables from 'some.pdf' and save them to 'toThisFolder'
pdfsp some.pdf toThisFolder

=== 📊 Extraction Summary Report ===
✅ Successful Files: 3
   - pdfs/report1.pdf → 🗂️ 5 tables extracted
   - pdfs/summary2.pdf → 🗂️ 3 tables extracted
   - pdfs/report2.pdf → 🗂️ 7 tables extracted

❌ Failed Files: 1
   - pdfs/corrupted.pdf

⚠️ Some files failed to process. See details above.

Name		Name	Last commit message	Last commit date
Latest commit History 59 Commits
.github/workflows		.github/workflows
docs		docs
pdfsp		pdfsp
.gitignore		.gitignore
.readthedocs.yaml		.readthedocs.yaml
LICENSE		LICENSE
MANIFEST.in		MANIFEST.in
README.md		README.md
example.pdf		example.pdf
main.py		main.py
pyproject.toml		pyproject.toml
requirements.txt		requirements.txt
tox.ini		tox.ini
uv.lock		uv.lock
uv_setup.py		uv_setup.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

📄 pdfsp

🚀 Features

📦 Installation

python script

From console / Terminal / Command Line

About

Uh oh!

Releases

Packages

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

📄 pdfsp

🚀 Features

📦 Installation

python script

From console / Terminal / Command Line

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Packages