jpt/spatula




Overview

spatula is a modern Python library for writing maintainable web scrapers.

Please note: the official repository has moved to Codeberg; GitHub will be used only as a mirror.

Features

Page-oriented design: Encourages writing understandable & maintainable scrapers.
Not Just HTML: Provides built-in handlers for common data formats including CSV, JSON, XML, PDF, and Excel. Or write your own.
Fast HTML parsing: Uses lxml.html for fast, consistent, and reliable parsing of HTML.
Flexible Data Model Support: Compatible with dataclasses, attrs, and pydantic, or bring your own data model classes for storing & validating your scraped data.
CLI Tools: Offers several CLI utilities that help streamline the development & testing cycle.
Fully Typed: Makes full use of Python 3 type annotations.
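The page-oriented design and flexible data-model support listed above can be illustrated with a minimal, library-free sketch. The `Page`, `EmployeeListJson`, and `EmployeeListCsv` classes and the `process_page` method are hypothetical stand-ins, not spatula's actual API. They only show the idea: one class per kind of page, each parsing its own format (here JSON and CSV via the standard library) and emitting records as an ordinary dataclass.

```python
import csv
import io
import json
from dataclasses import dataclass
from typing import Iterator


@dataclass
class Employee:
    """Plain dataclass used as the data model for each scraped record."""
    name: str
    title: str


class Page:
    """Hypothetical base class (NOT spatula's API).

    Each subclass handles one kind of document and turns its raw
    content into data-model instances.
    """

    def __init__(self, raw: str) -> None:
        self.raw = raw

    def process_page(self) -> Iterator[Employee]:
        raise NotImplementedError


class EmployeeListJson(Page):
    """Parses a JSON listing shaped like {"employees": [{...}, ...]}."""

    def process_page(self) -> Iterator[Employee]:
        data = json.loads(self.raw)
        for row in data["employees"]:
            yield Employee(name=row["name"], title=row["title"])


class EmployeeListCsv(Page):
    """Parses a CSV listing whose header row is name,title."""

    def process_page(self) -> Iterator[Employee]:
        for row in csv.DictReader(io.StringIO(self.raw)):
            yield Employee(name=row["name"], title=row["title"])


if __name__ == "__main__":
    json_raw = '{"employees": [{"name": "Ada", "title": "Engineer"}]}'
    csv_raw = "name,title\nGrace,Admiral\n"
    # Both page types yield the same Employee model, whatever the input format.
    for page in (EmployeeListJson(json_raw), EmployeeListCsv(csv_raw)):
        for employee in page.process_page():
            print(employee)
```

Because both page types yield the same `Employee` class, downstream code in this sketch does not care which format the data came from; using an attrs or pydantic model instead would mainly change the `Employee` definition.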