JonathanJing/Web2Ebook

Website to EPUB and PDF Converter

This application extracts text content from websites and converts it into both EPUB and PDF ebook formats. It was specifically designed to handle the frameset-based structure of wellsofgrace.com.

Features

  • Extracts text from frameset-based websites
  • Handles multiple chapters automatically
  • Converts content to EPUB format
  • Converts content to PDF format with Chinese font support
  • Supports Chinese text
  • Retries failed requests automatically
  • Cleans up content for better readability
  • Generates a table of contents for PDF output
  • Applies consistent page formatting and styling
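
As an illustration of the text extraction and clean-up features listed above, here is a minimal sketch that uses only the Python standard library. The names `TextExtractor` and `clean_text` are hypothetical and are not taken from `ebook_extractor.py`; the actual implementation may work differently.

```python
import re
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Hypothetical helper: collects visible text, skipping scripts and styles."""

    SKIP = {"script", "style"}
    BLOCK = {"p", "br", "div", "h1", "h2", "h3", "li", "tr"}

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1
        elif tag in self.BLOCK:
            # Block-level tags become line breaks in the extracted text.
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1
        elif tag in self.BLOCK:
            self.parts.append("\n")

    def handle_data(self, data):
        if not self._skip_depth:
            self.parts.append(data)


def clean_text(html):
    """Return the visible text of an HTML page, one non-empty line per block."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    text = "".join(parser.parts)
    # Collapse runs of whitespace inside each line and drop empty lines.
    lines = [re.sub(r"\s+", " ", line).strip() for line in text.splitlines()]
    return "\n".join(line for line in lines if line)
```

Calling `clean_text(page_html)` on a fetched chapter page returns plain text that can then be passed to an EPUB or PDF writer.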

Requirements

Install the required dependencies:

pip3 install -r requirements.txt

Usage

EPUB Only

Extract from the default URL and create EPUB:

python3 extract_ebook.py

Custom URL and output file:

python3 extract_ebook.py "https://example.com/book/index.htm" "my_book.epub"

PDF Only

Extract from the default URL and create PDF:

python3 pdf_converter.py

Custom URL and output file:

python3 pdf_converter.py "https://example.com/book/index.htm" "my_book.pdf"

Both EPUB and PDF

Extract from the default URL and create both formats:

python3 convert_to_both.py

Custom URL and output name (will create .epub and .pdf files):

python3 convert_to_both.py "https://example.com/book/index.htm" "my_book"

Files

  • ebook_extractor.py - Main extraction class with all functionality
  • extract_ebook.py - Command-line interface for EPUB creation
  • pdf_converter.py - Command-line interface for PDF creation
  • convert_to_both.py - Command-line interface for both formats
  • requirements.txt - Python dependencies
  • debug_fetch.py - Debug script for testing website fetching

How It Works

  1. Frameset Detection: Detects if the website uses framesets and extracts frame sources
  2. Contents Frame: Identifies navigation frames and extracts chapter links
  3. Chapter Processing: Fetches each chapter and extracts text content
  4. EPUB Generation: Creates a properly formatted EPUB file with all chapters
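
Steps 1 and 2 above can be sketched with the standard library alone. This is a simplified illustration, not the code in `ebook_extractor.py`: the class and function names (`FrameSourceParser`, `LinkParser`, `frame_urls`, `chapter_urls`) are assumptions, and treating every link in the contents frame as a chapter is a simplification.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class FrameSourceParser(HTMLParser):
    """Hypothetical helper: collects the src of every <frame> tag."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag == "frame":
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


class LinkParser(HTMLParser):
    """Hypothetical helper: collects the href of every <a> tag."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def frame_urls(html, base_url):
    """Return absolute URLs of all frames; an empty list means no frameset."""
    parser = FrameSourceParser()
    parser.feed(html)
    return [urljoin(base_url, src) for src in parser.sources]


def chapter_urls(html, base_url):
    """Return absolute URLs of all links found in a contents frame."""
    parser = LinkParser()
    parser.feed(html)
    return [urljoin(base_url, href) for href in parser.links]
```

Relative frame and chapter paths are resolved against the page they were found on with `urljoin`, so `toc.htm` inside `https://example.com/book/index.htm` becomes `https://example.com/book/toc.htm`.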

Default Target

The application is preconfigured to extract content from: https://wellsofgrace.com/books/spiritual/rskndam/index.htm

This creates an EPUB of the book "《认识苦难的奥秘》" (Understanding the Mystery of Suffering).

Output

The application creates ebook files in two formats:

EPUB Format

Can be read with any standard ebook reader such as:

  • Apple Books
  • Adobe Digital Editions
  • Calibre
  • FBReader
  • And many others

PDF Format

Can be read with any PDF reader such as:

  • Adobe Acrobat Reader
  • Preview (macOS)
  • Built-in browser PDF viewers
  • Mobile PDF readers
  • And many others

Features of PDF Output

  • Proper Chinese font support
  • Table of contents
  • Chapter navigation
  • Professional formatting
  • Justified text alignment

Error Handling

The application handles the following failure cases:

  • Network timeouts and retries
  • Empty content detection
  • Alternative content extraction methods
  • Graceful degradation for missing chapters
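
The retry and graceful-degradation behavior listed above could look roughly like the following sketch. The function name `fetch_with_retries`, its parameters, and the linear back-off are assumptions made for illustration; the repository's own retry logic may differ.

```python
import time
import urllib.error
import urllib.request


def fetch_with_retries(url, attempts=3, delay=1.0, timeout=10,
                       opener=urllib.request.urlopen):
    """Fetch a URL, retrying on network errors and empty responses.

    Returns the response body as bytes, or None if every attempt fails,
    so the caller can skip a missing chapter instead of aborting.
    The opener parameter is a hypothetical hook for tests.
    """
    for attempt in range(1, attempts + 1):
        try:
            with opener(url, timeout=timeout) as response:
                body = response.read()
            if body.strip():
                return body
            # An empty body is treated like a failed attempt.
        except OSError:
            # URLError and socket timeouts are both subclasses of OSError.
            pass
        if attempt < attempts:
            # Wait a little longer after each failure (linear back-off).
            time.sleep(delay * attempt)
    return None
```

A caller can then write `body = fetch_with_retries(chapter_url)` and continue with the next chapter when `body` is `None`.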
