Xtract

A Python library for extracting data from X (formerly Twitter) posts.

Version

Current version: 1.2.3

Features

Download post content including text, media, and metadata
Extract user information
Support for quoted posts
Save data as JSON
Generate Markdown summaries of posts
Command-line interface

Installation

# Basic installation
pip install xtract

# Install with development dependencies
pip install "xtract[dev]"

Or install from source:

# Clone the repository
git clone https://github.com/yourusername/xtract.git
cd xtract

# Basic installation
pip install -e .

# Install with development dependencies
pip install -e ".[dev]"

Dependencies

Core dependencies:
- requests>=2.31.0
- urllib3>=2.0.7
- certifi>=2023.7.22
- charset-normalizer>=3.3.2
- idna>=3.4

Development dependencies:
- pytest>=7.4.0
- pytest-cov>=4.1.0
- black>=23.7.0
- isort>=5.12.0
- mypy>=1.5.1
- flake8>=6.1.0

Usage

Downloading a post

You can download a post by providing either the tweet ID or the full URL:

from xtract import download_x_post

# Using tweet ID
post = download_x_post("1895573480835539451")

# Using URL
post = download_x_post("https://x.com/xuser/status/1895573480835539451")

Both calls retrieve the same post. Several URL forms are accepted:

https://x.com/username/status/ID
https://twitter.com/username/status/ID
URLs with query parameters
URLs with additional path segments
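As an illustration of how these URL forms can all reduce to the same ID, here is a minimal sketch. The helper name extract_post_id and the exact host check are assumptions made for this example; they are not taken from xtract's source, and the library's own parsing may differ.

```python
import re
from urllib.parse import urlparse

# Hypothetical helper for illustration only; not part of xtract's documented API.
_STATUS_RE = re.compile(r"/status/(\d+)")
_ALLOWED_HOSTS = {"x.com", "twitter.com"}


def extract_post_id(id_or_url: str) -> str:
    """Return the numeric post ID from a bare ID or an x.com/twitter.com URL."""
    value = id_or_url.strip()

    # A bare numeric ID is returned unchanged.
    if value.isascii() and value.isdigit():
        return value

    parsed = urlparse(value)
    host = (parsed.hostname or "").lower()
    if host.startswith("www."):
        host = host[len("www."):]
    if host not in _ALLOWED_HOSTS:
        raise ValueError(f"Not an X/Twitter URL: {id_or_url!r}")

    # Only the path is searched, so query parameters are ignored and
    # extra path segments after the ID (e.g. /photo/1) do not matter.
    match = _STATUS_RE.search(parsed.path)
    if match is None:
        raise ValueError(f"No status ID found in: {id_or_url!r}")
    return match.group(1)
```

Under these assumptions, a URL with a query string or a trailing /photo/1 segment yields the same ID as the plain status URL.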

Converting Posts to Markdown

You can convert a post to Markdown format:

from xtract import download_x_post, post_to_markdown, save_post_as_markdown

# Download a post
post = download_x_post("1895573480835539451")

# Convert to Markdown string
markdown_content = post_to_markdown(post)
print(markdown_content)

# Skip metadata section if desired
markdown_without_metadata = post_to_markdown(post, include_metadata=False)

# Control stats and metadata separately
markdown_custom = post_to_markdown(post, include_stats=True, include_metadata=True)

# Save as Markdown file
markdown_path = save_post_as_markdown(post, output_dir="output")
print(f"Markdown saved to: {markdown_path}")

The Markdown output includes:

YAML frontmatter with metadata (tweet ID, author, statistics, etc.)
Post text and creation date
Author information
Post statistics (views, likes, retweets, etc.)
Links to images and videos
Quoted tweet content (if present, without metadata)

Example output format:

---
tweet_id: 1895573480835539451
author: xuser
display_name: X User
date: 2024-03-28 12:34:56
is_verified: True
image_count: 1
video_count: 0
views: 1234567
likes: 12345
retweets: 1234
replies: 123
quotes: 12
has_quoted_tweet: false
url: https://x.com/xuser/status/1895573480835539451
---

# Post by @xuser ✓
**X User** (@xuser) • 2024-03-28 12:34:56

This is the post text content...
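To read the metadata back from a saved Markdown file, the frontmatter can be split from the body. The sketch below uses only the standard library and assumes the frontmatter consists of flat "key: value" lines as in the example above; it is not a full YAML parser, and split_frontmatter is a hypothetical name, not an xtract function.

```python
from typing import Dict, Tuple


def split_frontmatter(markdown: str) -> Tuple[Dict[str, str], str]:
    """Split a Markdown string into (frontmatter dict, body text).

    Assumes flat "key: value" lines between two "---" delimiters.
    All values are returned as strings.
    """
    lines = markdown.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, markdown

    # Find the closing delimiter; give up if there is none.
    end = None
    for i in range(1, len(lines)):
        if lines[i].strip() == "---":
            end = i
            break
    if end is None:
        return {}, markdown

    meta: Dict[str, str] = {}
    for line in lines[1:end]:
        # Split on the first colon only, so URLs and timestamps stay intact.
        key, sep, value = line.partition(":")
        if sep:
            meta[key.strip()] = value.strip()

    body = "\n".join(lines[end + 1:]).lstrip("\n")
    return meta, body
```

Values such as views or is_verified come back as strings here; convert them explicitly if you need numbers or booleans.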

Advanced Features

Token Expiration Handling

The library detects token expiration and retries the request automatically:

from xtract import download_x_post

# Default behavior automatically retries on token expiration
post = download_x_post("1895573480835539451")

# Control maximum number of retries for token expiration
post = download_x_post("1895573480835539451", max_retries=3)

# Disable retries
post = download_x_post("1895573480835539451", max_retries=0)
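The general pattern behind a max_retries parameter can be sketched as follows. This is not xtract's implementation: TokenExpiredError, call_with_token_retry, and the refresh_token callback are hypothetical names chosen for the example, and the library may detect expiration and refresh tokens differently.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TokenExpiredError(Exception):
    """Hypothetical stand-in for whatever error signals an expired token."""


def call_with_token_retry(
    fetch: Callable[[], T],
    refresh_token: Callable[[], None],
    max_retries: int = 3,
    delay: float = 0.0,
) -> T:
    """Call fetch(), refreshing the token and retrying on expiration.

    With max_retries=0 the first expiration error propagates immediately.
    """
    attempt = 0
    while True:
        try:
            return fetch()
        except TokenExpiredError:
            if attempt >= max_retries:
                raise  # Retry budget exhausted; let the caller handle it.
            attempt += 1
            refresh_token()
            if delay:
                time.sleep(delay)
```

In this sketch, max_retries counts retries after the first attempt, so max_retries=3 allows up to four calls in total.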

Quoted Tweet Support

Xtract includes quoted tweets in the downloaded data:

from xtract import download_x_post

# Download a post with a quoted tweet
post = download_x_post("1895573480835539451")
post_dict = post.to_dict()

# Check if post contains a quoted tweet
if 'quoted_tweet' in post_dict:
    quoted_tweet = post_dict['quoted_tweet']
    print(f"Quoted tweet ID: {quoted_tweet['tweet_id']}")
    print(f"Quoted tweet author: {quoted_tweet['username']}")
    print(f"Quoted tweet text: {quoted_tweet['text']}")
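If some fields may be absent from the dictionary, a defensive variant of the lookup above can help. The sketch below relies only on the keys shown in the example (quoted_tweet, tweet_id, username, text); describe_quoted_tweet is a hypothetical helper, and whether any of these keys can actually be missing is an assumption.

```python
from typing import Any, Dict, Optional


def describe_quoted_tweet(post_dict: Dict[str, Any]) -> Optional[str]:
    """Return a one-line summary of the quoted tweet, or None if there is none."""
    quoted = post_dict.get("quoted_tweet")
    if not quoted:
        return None

    # Fall back to placeholders instead of raising KeyError on missing fields.
    username = quoted.get("username", "unknown")
    tweet_id = quoted.get("tweet_id", "unknown")
    text = quoted.get("text", "")
    return f"@{username} ({tweet_id}): {text}"
```

It can be called on the result of post.to_dict() from the example above.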

Command Line Usage

# Basic usage with tweet ID
python -m xtract 1895573480835539451

# Using URL
python -m xtract https://x.com/xuser/status/1895573480835539451

# Save to custom directory
python -m xtract 1895573480835539451 --output-dir my_downloads

# Generate Markdown summary
python -m xtract 1895573480835539451 --markdown

# Save raw API response
python -m xtract 1895573480835539451 --save-raw

# Pretty-print JSON output to console
python -m xtract 1895573480835539451 --pretty
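To download several posts in one go, the CLI can be driven from a small script. The sketch below uses only the flags listed above; the function names are hypothetical, and it assumes (without confirmation from the source) that the CLI exits with a non-zero status when a download fails.

```python
import subprocess
import sys
from typing import Dict, Iterable, List, Optional


def build_xtract_command(
    post: str,
    output_dir: Optional[str] = None,
    markdown: bool = False,
    save_raw: bool = False,
    pretty: bool = False,
) -> List[str]:
    """Build the argument list for one `python -m xtract` invocation."""
    # sys.executable keeps the call inside the current interpreter/venv.
    cmd = [sys.executable, "-m", "xtract", post]
    if output_dir:
        cmd += ["--output-dir", output_dir]
    if markdown:
        cmd.append("--markdown")
    if save_raw:
        cmd.append("--save-raw")
    if pretty:
        cmd.append("--pretty")
    return cmd


def download_many(posts: Iterable[str], **options) -> Dict[str, int]:
    """Run the CLI once per post and return each post's exit code."""
    results: Dict[str, int] = {}
    for post in posts:
        completed = subprocess.run(build_xtract_command(post, **options), check=False)
        results[post] = completed.returncode
    return results
```

For example, download_many(["1895573480835539451"], output_dir="my_downloads", markdown=True) runs the same command as the "Save to custom directory" example with --markdown added.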

Project Structure

.
├── install.sh              # Quick installation script
├── pyproject.toml          # Project configuration
├── README.md               # Project documentation
├── setup.py                # Package installation configuration
├── test_xtract.py          # Simple test script
└── xtract/                 # Main package
    ├── __init__.py         # Package exports
    ├── cli.py              # Command-line interface
    ├── api/                # API interaction
    │   ├── __init__.py
    │   ├── client.py       # API client functions
    │   └── errors.py       # Custom exceptions
    ├── config/             # Configuration
    │   ├── __init__.py
    │   └── constants.py    # Constants and defaults
    ├── models/             # Data models
    │   ├── __init__.py
    │   ├── post.py         # Post and PostData classes
    │   └── user.py         # UserDetails class
    └── utils/              # Utilities
        ├── __init__.py
        ├── file.py         # File handling
        ├── markdown.py     # Markdown generation
        └── media.py        # Media processing

Running Tests

To verify the xtract library is working:

# Run the test script directly
python test_xtract.py

# Or use the install script which creates a venv and runs the test
./install.sh

The test script will:

Fetch a sample X post using the xtract library
Display the post details if successful
Save the post data to the x_post_downloads directory

License

MIT

Credits

Created by Eric Wu

Testing

This project uses pytest for testing. To run the tests:

# Run all tests
python -m pytest

# Run tests with coverage
python -m pytest --cov=xtract

# Generate HTML coverage report
python -m pytest --cov=xtract --cov-report=html

After generating the HTML coverage report, open htmlcov/index.html in your browser to view the results.

Development

To set up the development environment:

Clone the repository
Install the package with development dependencies: pip install -e ".[dev]"
Alternatively, run the provided install.sh script for a quick setup
