
Overview

spatula is a modern Python library for writing maintainable web scrapers.

Please note: the official repository has moved to Codeberg; GitHub will only be used as a mirror.

Source: https://codeberg.org/jpt/spatula/

Documentation: https://jpt.sh/projects/spatula/

Issues: https://codeberg.org/jpt/spatula/issues

Features

  • Page-oriented design: Encourages writing understandable & maintainable scrapers.
  • Not Just HTML: Provides built-in handlers for common data formats including CSV, JSON, XML, PDF, and Excel. Or write your own.
  • Fast HTML parsing: Uses lxml.html for fast, consistent, and reliable parsing of HTML.
  • Flexible Data Model Support: Compatible with dataclasses, attrs, pydantic, or bring your own data model classes for storing & validating your scraped data.
  • CLI Tools: Offers several CLI utilities that help streamline the development & testing cycle.
  • Fully Typed: Makes full use of Python 3 type annotations.
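
To give a rough feel for the page-oriented idea combined with a dataclass data model, here is a minimal plain-Python sketch. It deliberately does not use spatula's own classes: `Employee`, `EmployeeListPage`, `process_page`, and the sample JSON are hypothetical names chosen for illustration, and the page content is passed in as a string rather than fetched. See the documentation linked above for the actual API.

```python
import json
from dataclasses import dataclass
from typing import List


@dataclass
class Employee:
    """Data model for one scraped record (field names are hypothetical)."""

    first: str
    last: str

    def __post_init__(self) -> None:
        # Minimal validation: reject records with an empty name field.
        if not self.first or not self.last:
            raise ValueError("first and last name are required")


class EmployeeListPage:
    """One class per page; it knows how to turn that page's content into data.

    Plain-Python illustration only, not spatula's own base class.
    """

    def __init__(self, raw: str) -> None:
        # Raw response body, supplied directly so the sketch needs no network.
        self.raw = raw

    def process_page(self) -> List[Employee]:
        # Parse the JSON body and build one validated record per row.
        rows = json.loads(self.raw)
        return [Employee(first=r["first"], last=r["last"]) for r in rows]


SAMPLE = '[{"first": "Ada", "last": "Lovelace"}, {"first": "Alan", "last": "Turing"}]'
employees = EmployeeListPage(SAMPLE).process_page()
print(employees)
```

Keeping the parsing logic for each page in its own class, and the validation in the data model, means a change to one page's layout only touches the class that handles that page.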