# rss2text

A simple yet powerful RSS feed processor that extracts the oldest unread article from one or more RSS feeds and formats it for social media posting. It maintains a cache of processed articles to avoid duplicates and supports optional hashtag processing.
## Features
- Process multiple RSS feeds
- Cache-based tracking of processed articles
- Optional hashtag support:
  - Extract categories from RSS feed entries as hashtags
  - Add custom hashtags via command line
- Optional description output:
  - Plain-text description for simple posts
  - HTML-preserving description for fediverse posting
  - Markdown-converted description for snac-friendly posting
- Markdown-formatted output suitable for social media
- Customizable cache directory location
- Optional command execution for new articles
## Installation

### Requirements

- Python 3
- `feedparser` library

Install the required library using pip:

```
pip install feedparser
```
### Basic Setup

- Clone this repository
- Make the script executable:

  ```
  chmod +x rss2text.py
  ```

- Optionally, move it to your `PATH` (e.g., `/usr/local/bin/rss2text`)
## Usage

### Basic Usage

Process a feed without hashtags:

```
./rss2text.py https://example.com/feed.xml
```

This outputs only the title and link of the oldest unread article:

```
**Article Title**
https://example.com/article-url
```
### Include the Article Description

Use the `-d` or `--include-description` flag:

```
./rss2text.py -d https://example.com/feed.xml
```

The output will include the article description between the title and link:

```
**Article Title**
Description
https://example.com/article-url
```
### Include the Article Description as HTML

Use the `-D` or `--include-description-html` flag:

```
./rss2text.py -D https://example.com/feed.xml
```

The output will preserve the feed's HTML markup between the title and link:

```
**Article Title**
<p>Description with <strong>formatting</strong></p>
https://example.com/article-url
```
### Include the Article Description as Markdown

Use the `-m` or `--include-description-markdown` flag:

```
./rss2text.py -m https://example.com/feed.xml
```

The output will convert common HTML formatting into Markdown suitable for snac:

```
**Article Title**
Description with **formatting** and a link (https://example.com/more).
- First point
- Second point
https://example.com/article-url
```
### Include Feed Categories as Hashtags

Use the `-i` or `--include-tags` flag:

```
./rss2text.py -i https://example.com/feed.xml
```

The output will include hashtags from the feed's categories:

```
**Article Title**
https://example.com/article-url
#Category1 #Category2 #Category3
```
### Add Custom Hashtags

Use the `-t` or `--tags` option with comma-separated tags:

```
./rss2text.py -i --tags "Tech,Programming" https://example.com/feed.xml
```

The output will include both the feed's categories and your custom tags:

```
**Article Title**
https://example.com/article-url
#Category1 #Category2 #Tech #Programming
```
### Custom Cache Directory

By default, the script stores its cache in a `cache` directory relative to where it runs. You can specify a different location:

```
./rss2text.py --cache-dir /tmp/cache https://example.com/feed.xml
```
### Multiple Feeds

Process multiple feeds in order (the script stops after finding the first unread article):

```
./rss2text.py -i --tags "Tech" feed1.xml feed2.xml feed3.xml
```
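The stop-on-first behavior can be sketched in Python. The dict entries below are simplified stand-ins for feedparser entries; this is an illustration, not the script's actual code:

```python
def first_unread(feeds, seen_ids):
    """Return the oldest unread entry from the first feed that has one.

    `feeds` is a list of (url, entries) pairs; each entry is a dict with
    an "id" and a sortable "published" value.
    """
    for url, entries in feeds:
        unread = [e for e in entries if e["id"] not in seen_ids]
        if unread:
            # Oldest first, so a backlog of articles is posted in order.
            return min(unread, key=lambda e: e["published"])
    return None  # no new articles in any feed -> exit status 1
```

Later feeds are only consulted when every earlier feed has been fully posted, which keeps one scheduled run from flooding your timeline.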
### Run Command on New Article

Use the `-x` or `--command` option to specify a command to run when a new article is found. The formatted output is passed to the command on stdin:

```
./rss2text.py -x "your_command_here" https://example.com/feed.xml
```

For example, to use rss2text with a custom script:

```
./rss2text.py -x "my_script.sh" https://example.com/feed.xml
```

The command is executed only if a new article is found; if no feed has anything unread, nothing is run.
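In Python, this kind of hand-off is commonly done with `subprocess.run`, feeding the formatted post to the child process's stdin. A minimal sketch (the exact invocation in `rss2text.py` may differ):

```python
import subprocess

def pipe_to_command(command: str, post_text: str) -> int:
    """Run `command` through the shell and feed `post_text` to its
    stdin, mirroring what the -x/--command option is described to do."""
    result = subprocess.run(command, shell=True, input=post_text, text=True)
    return result.returncode
```

Because the command string goes through the shell, pipelines and redirections (e.g. `"tee /tmp/post.txt"`) work as they would at a prompt.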
## Full Command Reference

```
usage: rss2text.py [-h] [--cache-dir CACHE_DIR] [--tags TAGS] [--include-tags]
                   [--include-description | --include-description-html | --include-description-markdown]
                   [--command COMMAND]
                   feed_urls [feed_urls ...]

Extract title, link, and optional description/tags from RSS feeds for social media posting

positional arguments:
  feed_urls             One or more RSS feed URLs to process

options:
  -h, --help            Show this help message and exit
  --cache-dir CACHE_DIR, -c CACHE_DIR
                        Directory to store the cache (default: "cache")
  --tags TAGS, -t TAGS  Additional hashtags to include in the output (comma-separated)
  --include-tags, -i    Include hashtags in the output
  --include-description, -d
                        Include the article description as plain text
  --include-description-html, -D
                        Include the article description with HTML markup preserved
  --include-description-markdown, -m
                        Include the article description as Markdown suitable for snac
  --command COMMAND, -x COMMAND
                        Command to run if a new article is found; output will be passed as stdin
```
## How It Works

- The script maintains a cache file for each feed URL in the specified cache directory
- When processing a feed, it:
  - Checks for articles not present in the cache
  - Sorts unread articles by date
  - Takes the oldest unread article
  - Adds it to the cache
  - Formats the output
- If hashtags are enabled, it:
  - Extracts categories from the feed entry
  - Adds any custom hashtags specified on the command line
  - Formats all hashtags (removes spaces and special characters)
  - Sorts and deduplicates the hashtags
- If plain-text description output is enabled, it:
  - Extracts the entry summary/description
  - Strips HTML markup and decodes HTML entities
  - Inserts the cleaned text between the title and link
- If HTML description output is enabled, it:
  - Extracts the entry summary/description
  - Preserves the feed's HTML markup
  - Inserts the HTML between the title and link
- If Markdown description output is enabled, it:
  - Extracts the entry summary/description
  - Converts common HTML markup into lightweight Markdown
  - Emits links as plain URLs for broad snac compatibility
  - Inserts the Markdown between the title and link
- The script exits with status code 0 after finding and processing the first unread article, or status code 1 if no new articles are found
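The per-feed cache can be pictured as a small file of already-seen entry IDs. The sketch below assumes one JSON file per feed with the filename derived from the URL; the real script may use a different format or naming scheme:

```python
import json
from pathlib import Path

def cache_path(cache_dir: str, feed_url: str) -> Path:
    # Derive a filesystem-safe name from the URL (hypothetical scheme).
    safe = "".join(c if c.isalnum() else "_" for c in feed_url)
    return Path(cache_dir) / f"{safe}.json"

def load_seen(cache_dir: str, feed_url: str) -> set:
    """Load the set of processed entry IDs for one feed."""
    path = cache_path(cache_dir, feed_url)
    return set(json.loads(path.read_text())) if path.exists() else set()

def mark_seen(cache_dir: str, feed_url: str, entry_id: str) -> None:
    """Record an entry as processed so it is never posted twice."""
    seen = load_seen(cache_dir, feed_url)
    seen.add(entry_id)
    Path(cache_dir).mkdir(parents=True, exist_ok=True)
    cache_path(cache_dir, feed_url).write_text(json.dumps(sorted(seen)))
```

Keeping one file per feed means feeds can be added or removed from the command line without invalidating each other's state.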
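The hashtag steps (normalize, merge, deduplicate, sort) can be sketched as follows; the exact character-stripping rule here is an assumption, not the script's verified behavior:

```python
import re

def format_hashtags(categories, extra_tags: str = "") -> str:
    """Merge feed categories with comma-separated extra tags, strip
    spaces and special characters, deduplicate, and sort."""
    raw = list(categories) + [t for t in extra_tags.split(",") if t.strip()]
    # Keep only alphanumeric characters so each tag is a valid hashtag.
    cleaned = {re.sub(r"[^0-9A-Za-z]", "", t) for t in raw}
    return " ".join("#" + t for t in sorted(c for c in cleaned if c))
```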
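Plain-text description mode strips tags and decodes entities. A minimal sketch using a regex tag stripper (the script's actual cleaning may be more thorough):

```python
import html
import re

def description_to_text(summary_html: str) -> str:
    """Remove HTML tags and decode entities, as in -d output."""
    without_tags = re.sub(r"<[^>]+>", " ", summary_html)
    # Decode entities such as &amp; and collapse leftover whitespace.
    return " ".join(html.unescape(without_tags).split())
```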
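Markdown mode can be approximated with a handful of substitutions; the rules below (which tags are handled, how links are flattened to plain URLs) are illustrative guesses at the conversion described above:

```python
import re

def description_to_markdown(summary_html: str) -> str:
    """Convert a few common HTML constructs into snac-friendly Markdown."""
    text = summary_html
    text = re.sub(r"</?(strong|b)>", "**", text)   # bold
    text = re.sub(r"</?(em|i)>", "*", text)        # emphasis
    # Links become "label (url)" so the URL survives as plain text.
    text = re.sub(r'<a [^>]*href="([^"]+)"[^>]*>(.*?)</a>', r"\2 (\1)", text)
    text = re.sub(r"<li>\s*", "- ", text)          # list items
    text = re.sub(r"</li>|</?ul>|</?p>", "\n", text)
    return re.sub(r"\n{2,}", "\n", text).strip()
```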
## Use Cases
- Automated social media posting
- RSS feed monitoring
- Content aggregation
- Personal news tracking
### Automatic Blog Post Publishing with snac

You can use rss2text together with [snac](https://codeberg.org/grunfink/snac2) to automatically publish new blog posts from an RSS feed on your snac instance. Here's an example command, run as the snac user:

```
/usr/local/bin/rss2text -m -x "/usr/local/bin/snac note /home/snac/snacposts/ testuser -" --cache-dir /home/snac/rss2text_cache/ https://example.com/feed.xml
```

This command does the following:

- `rss2text` processes the specified RSS feed (in this case, `https://example.com/feed.xml`), extracting the oldest unread article and formatting it for posting.
- The output is piped to `snac`, which publishes it as a new note on your snac instance under the specified account.

When run from cron or a similar scheduler, this setup enables automated updates from an RSS feed to your ActivityPub instance, making it easy to share new content with your followers.
## License
This project is open source, under the BSD 3-Clause License. See the LICENSE file.
## Contributing
Contributions are welcome! Please feel free to submit pull requests or create issues for bugs and feature requests.