jugad2 - Vasudev Ram on software innovation: HTML-to-PDF


Saturday, March 7, 2015

PDFCrowd and its HTML to PDF API (for Python and other languages)

PDFcrowd is a web service that I came across recently. It converts HTML content to PDF. You can use it either via the PDFcrowd site, by entering the content or the URL of an HTML page to convert, or via the PDFcrowd API, which supports multiple programming languages, including Python. I tried multiple approaches, and all worked fairly well.

A slightly modified version of a simple PDFcrowd API example from their site is shown below.

# Demo program to show how to use the PDFcrowd API
# to convert HTML content to PDF.
# Author: Vasudev Ram - www.dancingbison.com

import pdfcrowd

try:
    # create an API client instance
    # Dummy credentials used; to actually run the program, enter your own.
    client = pdfcrowd.Client("user_name", "api_key")
    client.setAuthor('author_name')
    # Dummy credentials used; to actually run the program, enter your own.
    client.setUserPassword('user_password')

    # Convert a web page and store the generated PDF in a file.
    pdf = client.convertURI('http://www.dancingbison.com')
    with open('dancingbison.pdf', 'wb') as output_file:
        output_file.write(pdf)
    
    # Convert a web page and store the generated PDF in a file.
    pdf = client.convertURI('http://jugad2.blogspot.in/p/about-vasudev-ram.html')
    with open('jugad2-about-vasudevram.pdf', 'wb') as output_file:
        output_file.write(pdf)

    # Convert an HTML string and save the result to a file.
    html = "My Small HTML File"
    with open('html.pdf', 'wb') as output_file:
        client.convertHtml(html, output_file)

except pdfcrowd.Error as why:
    # "except ... as" and print() with one argument work on Python 2.6+ and 3.
    print('Failed: %s' % why)

I used three calls to the API. For the first two calls, the inputs were: 1) my web site, 2) the About page of my blog. The third call converted a short HTML string.

Screenshots of the results of those two calls are below. You can see that they correspond closely to the originals.

Screenshot of generated PDF of dancingbison.com site

Screenshot of generated PDF of About Vasudev Ram page on jugad2.blogspot.com blog
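The first two calls in the program repeat the same convert-and-save steps, so they could be folded into a small helper. Here is a minimal sketch of that idea. It assumes only what the program above shows, namely that client.convertURI(url) returns the PDF content as bytes; the helper name convert_pages and the URL-to-filename mapping are hypothetical, not part of the PDFcrowd API.

```python
def convert_pages(client, pages):
    """Convert each URL in pages to a PDF file.

    client: any object with a convertURI(url) method that returns the
            PDF content as bytes (as pdfcrowd.Client is used above).
    pages:  dict mapping each URL to its output file name
            (a hypothetical layout chosen for this sketch).
    Returns the list of file names written.
    """
    written = []
    for url, filename in pages.items():
        pdf = client.convertURI(url)
        with open(filename, 'wb') as output_file:
            output_file.write(pdf)
        written.append(filename)
    return written
```

With a real client, the call might look like convert_pages(client, {'http://www.dancingbison.com': 'dancingbison.pdf'}), wrapped in the same try / except pdfcrowd.Error block as above.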

- Vasudev Ram - Online Python training and programming

Dancing Bison Enterprises


Wednesday, January 28, 2015

HTML text to PDF with Beautiful Soup and xtopdf

By Vasudev Ram

Recently, I thought of extracting the text from HTML documents and writing that text to PDF. So I did it :)

Here's how:

"""
HTMLTextToPDF.py
A demo program to show how to convert the text extracted from HTML 
content, to PDF. It uses the Beautiful Soup library, v4, to 
parse the HTML, and the xtopdf library to generate the PDF output.
Beautiful Soup is at: http://www.crummy.com/software/BeautifulSoup/
xtopdf is at: https://bitbucket.org/vasudevram/xtopdf
Guide to using and installing xtopdf: http://jugad2.blogspot.in/2012/07/guide-to-installing-and-using-xtopdf.html
Author: Vasudev Ram - http://www.dancingbison.com
Copyright 2015 Vasudev Ram
"""

import sys
from bs4 import BeautifulSoup
from PDFWriter import PDFWriter

def usage():
    sys.stderr.write("Usage: python " + sys.argv[0] + " html_file pdf_file\n")
    sys.stderr.write("which will extract only the text from html_file and\n")
    sys.stderr.write("write it to pdf_file\n")

def main():

    # Create some HTML for testing conversion of its text to PDF.
    html_doc = """
    <html>
        <head>
            <title>
            Test file for HTMLTextToPDF
            </title>
        </head>
        <body>
        This is text within the body element but outside any paragraph.
        <p>
        This is a paragraph of text. Hey there, how do you do?
        The quick red fox jumped over the slow blue cow.
        </p>
        <p>
        This is another paragraph of text.
        Don't mind what it contains.
        What is mind? Not matter.
        What is matter? Never mind.
        </p>
        This is also text within the body element but not within any paragraph.
        </body>
    </html>
    """

    pw = PDFWriter("HTMLTextTo.pdf")
    pw.setFont("Courier", 10)
    pw.setHeader("Conversion of HTML text to PDF")
    pw.setFooter("Generated by xtopdf: http://slid.es/vasudevram/xtopdf")
 
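    # Use method chaining this time. Naming the parser explicitly
    # avoids bs4's "no parser was explicitly specified" warning.
    for line in BeautifulSoup(html_doc, "html.parser").get_text().split("\n"):
        pw.writeLine(line)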
    pw.savePage()
    pw.close()

if __name__ == '__main__':
    main()

The program uses the Beautiful Soup library for parsing and extracting information from HTML, and xtopdf, my Python library for PDF generation.
Run it with:

python HTMLTextToPDF.py

and the output will be in the file HTMLTextTo.pdf.
Screenshot of generated PDF, HTMLTextTo.pdf
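The program writes every line of the extracted text to the PDF, including the blank and whitespace-only lines left over from the HTML indentation. Here is a minimal sketch of one way to drop those lines first. It is written for Python 3 and, so that it runs without third-party packages, it uses the standard library's html.parser as a stand-in for Beautiful Soup. The names TextExtractor and html_text_lines are hypothetical; they are not part of bs4 or xtopdf.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect the text nodes of an HTML document, ignoring the tags."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        # Called by HTMLParser for each run of text between tags.
        self.chunks.append(data)


def html_text_lines(html):
    """Return the non-blank text lines of html, stripped of whitespace."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    text = "".join(parser.chunks)
    return [line.strip() for line in text.split("\n") if line.strip()]


sample = "<html><body>Outside.\n<p>First line.\nSecond line.</p>\n</body></html>"
lines = html_text_lines(sample)
```

Each returned line could then be passed to pw.writeLine(line), as in the program above.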

- Vasudev Ram - Python training and programming - Dancing Bison Enterprises


Sunday, November 25, 2012

pisa / xhtml2pdf, converter written in Python

By Vasudev Ram

I had come across pisa some time ago and saw it again today. It is now called xhtml2pdf. It is written in Python and can be used to convert HTML/CSS into PDF.

Here is a list of some xhtml2pdf features, from the site:

- Translates HTML and CSS input into PDF files

- Written in pure Python and therefore platform independent

- Supports document specifics like columns, headers, footers, page numbers, custom Postscript and TrueType fonts, etc.

- Support for frameworks like Django, Turbogears, CherryPy, Pylons, WSGI

- Simple integration into Python programs

- Also usable as stand alone command line tool for Windows, MacOS X and Linux (binaries not available)

Looks useful ...

xhtml2pdf: HTML/CSS to PDF converter written in Python

xhtml2pdf is free, for commercial and non commercial use, says the site.
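To give an idea of what "simple integration into Python programs" can look like, here is a minimal sketch using the pisa module that xhtml2pdf provides. I have not run it as part of this post, so treat it as a starting point; the wrapper name html_to_pdf is hypothetical, and the import is placed inside the function only so that the module loads even where xhtml2pdf is not installed.

```python
def html_to_pdf(html, pdf_path):
    """Convert an HTML string to a PDF file; return True on success.

    Hypothetical wrapper around xhtml2pdf's pisa.CreatePDF.
    """
    # Third-party package; install with: pip install xhtml2pdf
    from xhtml2pdf import pisa

    with open(pdf_path, "wb") as pdf_file:
        status = pisa.CreatePDF(html, dest=pdf_file)
    # status.err holds the number of errors found during conversion.
    return status.err == 0
```

A call such as html_to_pdf("<h1>Hello</h1><p>Some text.</p>", "hello.pdf") should write hello.pdf and return True if the conversion reported no errors.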

- Vasudev Ram - Dancing Bison Enterprises


Sunday, August 19, 2012

DocRaptor, HTML to PDF converter

By Vasudev Ram

DocRaptor is a tool for HTML to PDF conversion.
It supports HTTP POST requests using C#, Curl, jQuery, Node.js, PHP, Prototype.js, Python, Ruby, and Rails.

DocRaptor examples

DocRaptor Python examples

- Vasudev Ram - Dancing Bison Enterprises

