This application extracts text content from websites and converts it into EPUB and PDF ebook formats. It was designed specifically to handle the frameset-based structure of wellsofgrace.com.
Features:
- Extracts text from frameset-based websites
- Handles multiple chapters automatically
- Converts content to EPUB format
- Converts content to PDF format with Chinese font support
- Supports Chinese text
- Retries failed requests automatically
- Cleans up content for better readability
- Generates a table of contents for PDF output
- Applies consistent page formatting and styling
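The README does not define what "cleans up content" involves. One plausible sketch, assuming cleanup means normalizing whitespace and dropping blank lines; `clean_text` is a hypothetical name, not a function from `ebook_extractor.py`:

```python
import re


def clean_text(raw: str) -> str:
    """Hypothetical cleanup step: normalize whitespace and drop blank lines."""
    # Turn non-breaking spaces and full-width (ideographic) spaces into plain spaces
    text = raw.replace("\u00a0", " ").replace("\u3000", " ")
    lines = []
    for line in text.splitlines():
        # Collapse runs of spaces/tabs inside the line, then trim both ends
        line = re.sub(r"[ \t]+", " ", line).strip()
        if line:  # skip lines that were empty or whitespace-only
            lines.append(line)
    # One paragraph per line, separated by single newlines
    return "\n".join(lines)
```

For example, `clean_text("  第一章\u3000\n\n\t hello   world \n")` returns `"第一章\nhello world"`. Replacing the full-width space is a design choice: Chinese text often uses it for paragraph indentation, so a real implementation might keep it instead.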
Install the required dependencies:
```bash
pip3 install -r requirements.txt
```
To extract from the default URL and create an EPUB:
```bash
python3 extract_ebook.py
```
To use a custom URL and output file:
```bash
python3 extract_ebook.py "https://example.com/book/index.htm" "my_book.epub"
```
To extract from the default URL and create a PDF:
```bash
python3 pdf_converter.py
```
To use a custom URL and output file:
```bash
python3 pdf_converter.py "https://example.com/book/index.htm" "my_book.pdf"
```
To extract from the default URL and create both formats:
```bash
python3 convert_to_both.py
```
To use a custom URL and output name (creates both `.epub` and `.pdf` files):
```bash
python3 convert_to_both.py "https://example.com/book/index.htm" "my_book"
```
Project files:
- `ebook_extractor.py`: main extraction class containing all core functionality
- `extract_ebook.py`: command-line interface for EPUB creation
- `pdf_converter.py`: command-line interface for PDF creation
- `convert_to_both.py`: command-line interface for both formats
- `requirements.txt`: Python dependencies
- `debug_fetch.py`: debug script for testing website fetching
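The scripts take an optional URL and an optional output name as positional arguments. A sketch of how `convert_to_both.py` could map one base name to both output files using `argparse`; the default output name `"book"` and the helpers `parse_args` and `output_paths` are hypothetical, since the README does not state them:

```python
import argparse
from pathlib import Path

# Default source URL, as documented in this README
DEFAULT_URL = "https://wellsofgrace.com/books/spiritual/rskndam/index.htm"


def parse_args(argv=None):
    parser = argparse.ArgumentParser(description="Convert a website into EPUB and PDF ebooks")
    # Both positionals are optional; omitted ones fall back to the defaults
    parser.add_argument("url", nargs="?", default=DEFAULT_URL, help="book index page")
    parser.add_argument("name", nargs="?", default="book", help="output base name (hypothetical default)")
    return parser.parse_args(argv)


def output_paths(name: str) -> tuple[str, str]:
    """Derive the .epub and .pdf file names from one base name."""
    base = Path(name)
    # with_suffix replaces any extension the user typed, e.g. "my_book.epub"
    return str(base.with_suffix(".epub")), str(base.with_suffix(".pdf"))
```

With the arguments shown above, `output_paths(parse_args([...]).name)` yields `("my_book.epub", "my_book.pdf")`.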
The extraction process works in four steps:
- Frameset Detection: Detects whether the website uses a frameset and extracts the source URL of each frame
- Contents Frame: Identifies the navigation (contents) frame and extracts the chapter links from it
- Chapter Processing: Fetches each chapter and extracts its text content
- EPUB Generation: Assembles all chapters into a properly formatted EPUB file
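A minimal sketch of the first two steps, using only the standard library's `html.parser` rather than whatever parser `ebook_extractor.py` actually uses. It assumes frames are declared with `<frame>`/`<iframe>` `src` attributes, that chapter pages are the `.htm`/`.html` links in the contents frame, and that relative links resolve against the page URL. `LinkCollector`, `find_frames`, and `chapter_links` are hypothetical names:

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collects <frame>/<iframe> src values and <a href> links with their text."""

    def __init__(self):
        super().__init__()
        self.frames = []   # src attribute of each frame
        self.links = []    # (href, link text) pairs
        self._href = None  # href of the <a> currently open, if any
        self._text = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag in ("frame", "iframe") and attrs.get("src"):
            self.frames.append(attrs["src"])
        elif tag == "a" and attrs.get("href"):
            self._href, self._text = attrs["href"], []

    def handle_data(self, data):
        if self._href is not None:  # only keep text inside a link
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def find_frames(html, base_url):
    """Step 1: return absolute URLs of all frames in a frameset page."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return [urljoin(base_url, src) for src in parser.frames]


def chapter_links(contents_html, base_url):
    """Step 2: return (title, url) for each distinct chapter page, in order."""
    parser = LinkCollector()
    parser.feed(contents_html)
    parser.close()
    seen, chapters = set(), []
    for href, title in parser.links:
        # Drop in-page anchors so "ch01.htm#p2" counts as "ch01.htm"
        url = urljoin(base_url, href.split("#")[0])
        if url.endswith((".htm", ".html")) and url not in seen:
            seen.add(url)
            chapters.append((title, url))
    return chapters
```

In use, `find_frames` runs on the index page, the contents frame is fetched, and its HTML and URL go to `chapter_links`; the file names `contents.htm` and `ch01.htm` in any example are illustrative only.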
The application is preconfigured to extract content from:
https://wellsofgrace.com/books/spiritual/rskndam/index.htm
By default, the scripts produce an ebook of the book "《认识苦难的奥秘》" (Understanding the Mystery of Suffering).
The application creates ebook files in two formats:
EPUB files can be read with any standard ebook reader, such as:
- Apple Books
- Adobe Digital Editions
- Calibre
- FBReader
- And many others
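These readers can open the files because an EPUB is a ZIP container with a fixed layout: an uncompressed `mimetype` entry first, `META-INF/container.xml` pointing at a package document, and (in EPUB 3) a navigation document. The README does not say which library `ebook_extractor.py` uses, so this sketch uses only the standard library's `zipfile` to show that minimum structure. `write_epub`, `xhtml_page`, and the `chapNNN.xhtml` naming are hypothetical, and the output has not been run through a validator such as epubcheck:

```python
import uuid
import zipfile
from datetime import datetime, timezone
from html import escape

CONTAINER_XML = """<?xml version="1.0" encoding="UTF-8"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles>
    <rootfile full-path="OEBPS/content.opf" media-type="application/oebps-package+xml"/>
  </rootfiles>
</container>"""


def xhtml_page(title, body, lang):
    """Wrap body markup in a minimal XHTML document."""
    return ('<?xml version="1.0" encoding="UTF-8"?>\n'
            '<html xmlns="http://www.w3.org/1999/xhtml" '
            f'xmlns:epub="http://www.idpf.org/2007/ops" xml:lang="{lang}" lang="{lang}">'
            f"<head><title>{escape(title)}</title></head><body>{body}</body></html>")


def write_epub(target, title, chapters, lang="zh"):
    """Write [(chapter_title, plain_text), ...] as a minimal EPUB 3.

    target may be a file path or a writable binary file object.
    """
    manifest, spine, toc = [], [], []
    with zipfile.ZipFile(target, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        # "mimetype" must be the first entry and must be stored uncompressed
        zf.writestr("mimetype", "application/epub+zip", compress_type=zipfile.ZIP_STORED)
        zf.writestr("META-INF/container.xml", CONTAINER_XML)
        for i, (ch_title, text) in enumerate(chapters, start=1):
            href = f"chap{i:03d}.xhtml"
            # One <p> per non-empty line of the chapter text
            paras = "".join(f"<p>{escape(line)}</p>" for line in text.splitlines() if line.strip())
            body = f"<h1>{escape(ch_title)}</h1>{paras}"
            zf.writestr(f"OEBPS/{href}", xhtml_page(ch_title, body, lang))
            manifest.append(f'<item id="c{i}" href="{href}" media-type="application/xhtml+xml"/>')
            spine.append(f'<itemref idref="c{i}"/>')
            toc.append(f'<li><a href="{href}">{escape(ch_title)}</a></li>')
        # EPUB 3 requires a navigation document flagged with properties="nav"
        toc_html = "".join(toc)
        nav_body = f'<nav epub:type="toc"><ol>{toc_html}</ol></nav>'
        zf.writestr("OEBPS/nav.xhtml", xhtml_page(title, nav_body, lang))
        modified = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
        manifest_xml = "".join(manifest)
        spine_xml = "".join(spine)
        opf = f"""<?xml version="1.0" encoding="UTF-8"?>
<package xmlns="http://www.idpf.org/2007/opf" version="3.0" unique-identifier="bookid">
  <metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
    <dc:identifier id="bookid">urn:uuid:{uuid.uuid4()}</dc:identifier>
    <dc:title>{escape(title)}</dc:title>
    <dc:language>{lang}</dc:language>
    <meta property="dcterms:modified">{modified}</meta>
  </metadata>
  <manifest>
    <item id="nav" href="nav.xhtml" media-type="application/xhtml+xml" properties="nav"/>
    {manifest_xml}
  </manifest>
  <spine>{spine_xml}</spine>
</package>"""
        zf.writestr("OEBPS/content.opf", opf)
```

Escaping every chapter title and line with `html.escape` keeps characters such as `<` and `&` in the source text from breaking the XHTML.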
PDF files can be read with any PDF reader, such as:
- Adobe Acrobat Reader
- Preview (macOS)
- Built-in browser PDF viewers
- Mobile PDF readers
- And many others
The PDF output includes:
- Chinese font support
- A table of contents
- Chapter navigation
- Consistent formatting
- Justified text alignment
The application handles the following error cases:
- Network timeouts and retries
- Empty content detection
- Alternative content extraction methods
- Graceful degradation for missing chapters
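The README does not show how retries are implemented. A minimal sketch of retries with exponential backoff, empty content detection, and skipping chapters that still fail, assuming the fetch function raises `OSError` subclasses on network failure (true of `TimeoutError`, urllib's `URLError`, and requests' `RequestException`) and that a page counts as empty when its text is blank. `with_retries`, `fetch_all`, `EmptyContentError`, the attempt count, and the delays are all hypothetical:

```python
import time


class EmptyContentError(Exception):
    """Hypothetical: raised when a fetched page contains no usable text."""


def with_retries(fetch, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call fetch() up to `attempts` times, backing off exponentially between tries."""
    last_error = None
    for attempt in range(attempts):
        try:
            text = fetch()
            # Empty content detection: treat blank pages as failures worth retrying
            if not text or not text.strip():
                raise EmptyContentError("page returned no text")
            return text
        except (OSError, EmptyContentError) as exc:
            # OSError also covers timeouts, URLError and requests' RequestException
            last_error = exc
            if attempt < attempts - 1:
                # Exponential backoff: base_delay, 2*base_delay, 4*base_delay, ...
                sleep(base_delay * 2 ** attempt)
    raise last_error


def fetch_all(chapters, fetch_one, sleep=time.sleep):
    """Fetch each (title, url); skip chapters that still fail (graceful degradation)."""
    results, skipped = [], []
    for title, url in chapters:
        try:
            results.append((title, with_retries(lambda: fetch_one(url), sleep=sleep)))
        except (OSError, EmptyContentError):
            skipped.append(url)
    return results, skipped
```

Passing `sleep` as a parameter lets tests or dry runs skip the real delays; returning the skipped URLs lets the caller report missing chapters instead of aborting the whole book.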