jugad2 - Vasudev Ram on software innovation: pdf

Managed WordPress Hosting by FlyWheel

Share |

Tuesday, December 6, 2016

[xtopdf] Wildcard text files to PDF with xtopdf and glob

By Vasudev Ram

First joker card image attribution

This is another in my series of applications that use xtopdf (source), my Python toolkit for PDF creation from other formats.

[ Here's a good overview of xtopdf, its uses, supported platforms and formats, etc. ]

I called this app WildcardTextToPDFpy.. It lets you specify a wildcard for text files, and then converts each of the text files matching the wildcard (like grades*2016.txt or monthly*sales.txt) [1], into corresponding PDF files - with the same names but with '.pdf' appended.

[1] For example, the wildcard grades*2016.txt could match grades-math-2016.txt and grades-bio-2016.txt (think a school or college), while monthly*sales.txt might match monthly-car-sales.txt and monthly-bike-sales.txt (think a vehicle dealership), so a PDF file will be generated for each text file matching the given wildcard.

The program uses the iglob function from the glob module in Python's standard library, similar to how this recent other post:

Simple directory lister with multiple wildcard arguments

used the glob function. The difference is that glob returns a list, while iglob returns a generator, so it will return the matching filenames lazily, on demand, as shown here:

$ python
>>> g1 = glob.glob('text*.txt')
>>> g1
['text1.txt', 'text2.txt', 'text3.txt']
>>>>>> g2 = glob.iglob('text*.txt')
>>> g2
<generator object iglob at 0x027C2850>
>>> next(g2)
'text1.txt'
>>> for f in g2:
...     print f
...
text2.txt
text3.txt

Here is the code for WildcardTextToPDF.py:

from __future__ import print_function

# WildcardTextToPDF.py
# Convert the text files specified by a filename wildcard,
# like '*.txt' or 'foo*bar*baz.txt', to PDF files.
# Each text file's content goes to a separate PDF file, with the 
# PDF file name being the full text file name (including the 
# '.txt' part), with '.pdf' appended.
# Requires:
# - xtopdf: https://bitbucket.org/vasudevram/xtopdf
# - ReportLab: https://www.reportlab.com/ftp/reportlab-1.21.1.tar.gz
# Author: Vasudev Ram
# Copyright 2016 Vasudev Ram
# Product store: https://gumroad.com/vasudevram
# Web site: https://vasudevram.github.io
# Blog: http://jugad2.blogspot.com

import sys
import os
import glob
from PDFWriter import PDFWriter

def usage(argv):
    sys.stderr.write("Usage: python {} txt_filename_pattern\n".format(argv[0]))
    sys.stderr.write("E.g. python {} foo*.txt\n".format(argv[0]))

def text_to_pdf(txt_filename):
    pw = PDFWriter(txt_filename + '.pdf')
    pw.setFont('Courier', 12)
    pw.setHeader('{} converted to PDF'.format(txt_filename))
    pw.setFooter('PDF conversion by xtopdf: https://google.com/search?q=xtopdf')

    with open(txt_filename) as txt_fil:
        for line in txt_fil:
            pw.writeLine(line.strip('\n'))
        pw.savePage()

def main():
    if len(sys.argv) != 2:
        usage(sys.argv)
        sys.exit(0)

    try:
        for filename in glob.glob(sys.argv[1]):
            print("Converting {} to {}".format(filename, filename + '.pdf'))
            text_to_pdf(filename)
    except Exception as e:
        print("Caught Exception: type: {}, message: {}".format(\
            e.__class__, str(e)))

if __name__ == '__main__':
    main()

And here are the relevant files before and after running the program, with the program's output in between:

$ dir text*.txt/b
text1.txt
text2.txt
text3.txt

$ python WildcardTextToPDF2.py text*.txt
Converting text1.txt to text1.txt.pdf
Converting text2.txt to text2.txt.pdf
Converting text3.txt to text3.txt.pdf

$ dir text?.txt*/od/b
text1.txt
text2.txt
text3.txt
text1.txt.pdf
text2.txt.pdf
text3.txt.pdf

Finally, here's a cropped screenshot of the third output file, text3.txt.pdf, in Foxit PDF Reader (a lightweight PDF reader that I use):

Also look up:

[xtopdf] Batch convert text files to PDF (with xtopdf and fileinput)

for another variation on the approach to converting text files to PDF.

Speaking of generators, also check out this other post about them:

Python generators are pluggable

The image at the top of the post is of the earliest Joker card by Samuel Hart c. 1863, according to Wikipedia:

Joker (playing card)

Enjoy.

- Vasudev Ram - Online Python training and consulting

Get updates on my software products / ebooks / courses.

Jump to posts: Python DLang xtopdf

FlyWheel - Managed WordPress Hosting

Share |

Wednesday, October 26, 2016

Read from CSV with D, write to PDF with Python

By Vasudev Ram

CSV => PDF

Here is another in my series of applications of xtopdf, my PDF creation toolkit for Python (xtopdf source here).

This xtopdf application is actually a pipeline (nothing Unix-specific though, will work on both *nix and Windows) - a D program reading CSV data and sending it to a Python program, which writes the data to PDF.

The D program, read_csv.d, reads CSV data from a .csv file, and writes it to standard output.

The Python program, StdinToPDF.py (which is part of the xtopdf toolkit), reads its standard input (which is redirected by the pipeline to come from the D program's standard output) and writes the data it reads, to PDF.

Here is the D program, read_csv.d:

/**************************************************
File: read_csv.d
Purpose: A program to read CSV data from a file and 
write it to standard output.
Author: Vasudev Ram
Date created: 2016-10-25
Copyright 2016 Vasudev Ram
Web site: https://vasudevram.github.io
Blog: http://jugad2.blogspot.com
Product store: https://gumroad.com/vasudevram
**************************************************/

import std.algorithm;
import std.array;
import std.csv;
import std.stdio;
import std.file;
import std.typecons;

int main()
{
    try {
        stderr.writeln("Reading CSV data from file.");
        auto file = File("input.csv", "r");
        foreach (record;
            file.byLine.joiner("\n").csvReader!(Tuple!(string, string, int)))
        {
            writefln("%s works as a %s and earns $%d per year",
                     record[0], record[1], record[2]);
        }
    } catch (CSVException csve) {
        stderr.writeln("Caught CSVException: msg = ", csve.msg, 
        " at row, col = ", csve.row, ", ", csve.col);
    } catch (FileException fe) {
        stderr.writeln("Caught FileException: msg = ", fe.msg);
    } catch (Exception e) {
        stderr.writeln("Caught Exception: msg = ", e.msg);
    }
    return 0;
}

The D program is compiled as usual with:

dmd read_csv.d

I ran it first (only the D program) with an invalid CSV file (it has an extra comma at the start on line 3, which invalidates the data by making "Driver" be in the salary column position), and got the expected error message, which includes the row and column number of the place in the CSV file where the program encountered the error - this is useful for fixing the input data:

$ type input.csv
Jack,Carpenter,40000
Tom,Blacksmith,50000
,Jill,Driver,60000
$ read_csv
Reading CSV data from file.
Jack works as a Carpenter and earns $40000 per year
Tom works as a Blacksmith and earns $50000 per year
Caught CSVException: msg = Unexpected 'D' when converting from type string to type int 
at row, col = 3, 3

Then I ran it again, in the regular way, this time with a valid CSV file, and as part of a pipeline, the other pipeline component being StdinToPDF:

$ read_csv | python StdinToPDF.py csv_output.pdf
Reading CSV data from file.

And here is a cropped view of the output as seen in Foxit PDF Reader:

- Enjoy.

- Vasudev Ram - Online Python training and consulting

Get updates on my software products / ebooks / courses.

Jump to posts: Python DLang xtopdf

FlyWheel - Managed WordPress Hosting

Share |

Thursday, September 29, 2016

Publish Peewee ORM data to PDF with xtopdf

By Vasudev Ram

Peewee => PDF

Peewee is a small, expressive ORM for Python, created by Charles Leifer.

After trying out Peewee a bit, I thought of writing another application of xtopdf (my Python toolkit for PDF creation), to publish Peewee data to PDF. I used an SQLite database underlying the Peewee ORM, but it also supports MySQL and PostgreSQL, per the docs. Here is the program, in file PeeweeToPDF.py:

# PeeweeToPDF.py
# Purpose: To show basics of publishing Peewee ORM data to PDF.
# Requires: Peewee ORM and xtopdf.
# Author: Vasudev Ram
# Copyright 2016 Vasudev Ram
# Web site: https://vasudevram.github.io
# Blog: http://jugad2.blogspot.com
# Product store: https://gumroad.com/vasudevram

from peewee import *
from PDFWriter import PDFWriter

def print_and_write(pw, s):
    print s
    pw.writeLine(s)

# Define the database.
db = SqliteDatabase('contacts.db')

# Define the model for contacts.
class Contact(Model):
    name = CharField()
    age = IntegerField()
    skills = CharField()
    title = CharField()

    class Meta:
        database = db

# Connect to the database.
db.connect() 

# Drop the Contact table if it exists.
db.drop_tables([Contact])

# Create the Contact table.
db.create_tables([Contact])

# Define some contact rows.
contacts = (
    ('Albert Einstein', 22, 'Science', 'Physicist'),
    ('Benjamin Franklin', 32, 'Many', 'Polymath'),
    ('Samuel Johnson', 42, 'Writing', 'Writer')
)

# Save the contact rows to the contacts table.
for contact in contacts:
    c = Contact(name=contact[0], age=contact[1], \
    skills=contact[2], title=contact[3])
    c.save()

sep = '-' * (20 + 5 + 10 + 15)

# Publish the contact rows to PDF.
with PDFWriter('contacts.pdf') as pw:
    pw.setFont('Courier', 12)
    pw.setHeader('Demo of publishing Peewee ORM data to PDF')
    pw.setFooter('Generated by xtopdf: slides.com/vasudevram/xtopdf')
    print_and_write(pw, sep)
    print_and_write(pw, 
        "Name".ljust(20) + "Age".center(5) + 
        "Skills".ljust(10) + "Title".ljust(15))
    print_and_write(pw, sep)

    # Loop over all rows queried from the contacts table.
    for contact in Contact.select():
        print_and_write(pw, 
            contact.name.ljust(20) + 
            str(contact.age).center(5) + 
            contact.skills.ljust(10) + 
            contact.title.ljust(15))
    print_and_write(pw, sep)

# Close the database connection.
db.close()

I could have used Python's namedtuple feature instead of tuples, but did not do it for this small program.

I ran the program with:

python PeeweeToPDF.py

Here is a screenshot of the output as seen in Foxit PDF Reader (click image to enlarge):

- Enjoy.

- Vasudev Ram - Online Python training and consulting

Get updates on my software products / ebooks / courses.

Jump to posts: Python DLang xtopdf

FlyWheel - Managed WordPress Hosting

Share |

Wednesday, September 14, 2016

Func-y D + Python pipeline to generate PDF

By Vasudev Ram

Hi, readers,

Here is a pipeline that generates some output from a D (language) program and passes it to a Python program that converts that output to PDF. The D program makes use of a bit of (simple) template programming / generics and a bit of functional programming (using the std.functional module from Phobos, D's standard library).

D => Python

I'm showing this as yet another example of the uses of xtopdf, my Python toolkit for PDF creation, as well as for the D part, which is fun (pun intended :), and because those D features are powerful.

The D program is derived, with some modifications, from a program in this post by Gary Willoughby,

More hidden treasure in the D standard library .

That program demonstrates, among other things, the pipe feature of D from the std.functional module.

First, the D program, student_grades.d:

/*
student_grades.d
Author: Vasudev Ram
Web site: https://vasudevram.github.io
Blog: http://jugad2.blogspot.com
Product store: https://gumroad.com/vasudevram
Adapts code from:
http://nomad.so/2015/08/more-hidden-treasure-in-the-d-standard-library/
*/

import std.stdio;
import std.algorithm;
import std.array;
import std.conv;
import std.functional;

// Set up the functional pipeline by composing some functions.
alias sumString = pipe!(split, map!(to!(int)), sum);

void main(string[] args)
{
    // Data to be transformed:
    // Each string has the student name followed by 
    // their grade in 5 subjects.
    auto student_grades_list = 
    [
        "A 1 2 3 4 5",
        "B 2 3 4 5 6",
        "C 3 4 5 6 7",
        "D 4 5 6 7 8",
        "E 5 6 7 8 9",
    ];

    // Transform the data for each student.
    foreach(student_grades; student_grades_list) {
        auto student = student_grades[0]; // name
        auto total = sumString(student_grades[2..$]); // grade total
        writeln("Grade total for student ", student, ": ", total);
    }
}

The initial data (which the D program transforms) is hard-coded, but that can easily be changed to read it from a text file, for instance. The program uses pipe() to compose [1] the functions split, map (to int), and sum (some of which are generic / template functions).

So, when it is run, each string (student_grades) of the input array student_grades_list is split (into an array of smaller strings, by space as the delimiter); then each string in the array (except for the first, which is the name), is mapped (converted) to integer; then all the integers are summed up to get the student's grade total; finally these names and totals are written to standard output. That becomes the input to the next stage of the pipeline, the Python program, which does the conversion to PDF.

Build the D program with:

dmd -o- student_grades.d

which gives us the executable, student_grades(.exe).

The Python part of the pipeline is StdinToPDF.py, one of the apps in the xtopdf toolkit, which is designed to be used in pipelines such as the above - basically, any Unix, Windows or Mac OS X pipeline, that generates text as its final output, can be further terminated by StdinToPDF, resulting in conversion of that text to PDF. It is a small Python app written using the core class of the xtopdf library, PDFWriter. Here is the original post about StdinToPDF:

[xtopdf] PDFWriter can create PDF from standard input

Here is the pipeline:

$ student_grades | python StdinToPDF.py sg.pdf

Below is a cropped screenshot of the generated output, sg.pdf, as seen in Foxit PDF Reader.

[1] Speaking of functional composition, you may like to check out this earlier post by me:

fmap(), "inverse" of Python map() function

It's about a different way of composing functions (for certain cases). It also has an interesting comment exchange between me and a reader, who showed both how to do it in Scala, and another way to do it in Python. Also, there is another way to compose functions in D, using a function called compose; it works like pipe, but the composed functions are written in the reverse order. It's in the same D module as pipe, std.functional.

Finally (before I pipe down - for today :), check out this HN comment by me (on another topic), in which I mention and link to multiple other ways of doing pipe-like stuff in Python:

Comment on Streem – a new programming language from Matz

- Enjoy.

- Vasudev Ram - Online Python training and consulting

Get updates on my software products / ebooks / courses.

Jump to posts: Python DLang xtopdf

My Python posts Subscribe to my blog by email

Share |

Sunday, July 24, 2016

Control break report to PDF with xtopdf

By Vasudev Ram

Hi readers,

Control break reports are very common in data processing, from the earliest days of computing until today. This is because they are a fundamental kind of report, the need for which is ubiquitous across many kinds of organizations.

Here is an example program that generates a control break report and writes it to PDF, using xtopdf, my Python toolkit for PDF creation.

The program is named ControlBreakToPDF.py. It uses xtopdf to generate the PDF output, and the groupby function from the itertools module to handle the control break logic easily.

I've written multiple control-break report generation programs before, including implementing the logic manually, and it can get a little fiddly to get everything just right, particularly when there is more than one level of nesting (i.e. no off-by-one errors, etc.); you have to check for various conditions, set flags, etc.

So it's nice to have Python's itertools.groupby functionality handle it, at least for basic cases. Note that the data needs to be sorted on the grouping key, in order for groupby to work. Here is the code for ControlBreakToPDF.py:

from __future__ import print_function

# ControlBreakToPDF.py
# A program to show how to write simple control break reports
# and send the output to PDF, using itertools.groupby and xtopdf.
# Author: Vasudev Ram
# Copyright 2016 Vasudev Ram
# http://jugad2.blogspot.com
# https://gumroad.com/vasudevram

from itertools import groupby
from PDFWriter import PDFWriter

# I hard-code the data here to make the example shorter.
# More commonly, it would be fetched at run-time from a 
# database query or CSV file or similar source.

data = \
[
    ['North', 'Desktop #1', 1000],
    ['South', 'Desktop #3', 1100],
    ['North', 'Laptop #7', 1200],
    ['South', 'Keyboard #4', 200],
    ['North', 'Mouse #2', 50],
    ['East', 'Tablet #5', 200],
    ['West', 'Hard disk #8', 500],
    ['West', 'CD-ROM #6', 150],
    ['South', 'DVD Drive', 150],
    ['East', 'Offline UPS', 250],
]

pw = PDFWriter('SalesReport.pdf')
pw.setFont('Courier', 12)
pw.setHeader('Sales by Region')
pw.setFooter('Using itertools.groupby and xtopdf')

# Convenience function to both print to screen and write to PDF.
def print_and_write(s, pw):
    print(s)
    pw.writeLine(s)

# Set column headers.
headers = ['Region', 'Item', 'Sale Value']
# Set column widths.
widths = [ 10, 15, 10 ]
# Build header string for report.
header_str = ''.join([hdr.center(widths[ind]) \
    for ind, hdr in enumerate(headers)])
print_and_write(header_str, pw)

# Function to base the sorting and grouping on.
def key_func(rec):
    return rec[0]

data.sort(key=key_func)

for region, group in groupby(data, key=key_func):
    print_and_write('', pw)
    # Write group header, i.e. region name.
    print_and_write(region.center(widths[0]), pw)
    # Write group's rows, i.e. sales data for the region.
    for row in group:
        # Build formatted row string.
        row_str = ''.join(str(col).rjust(widths[ind + 1]) \
            for ind, col in enumerate(row[1:]))
        print_and_write(' ' * widths[0] + row_str, pw)
pw.close()

Running it gives this output on the screen:

$ python ControlBreakToPDF.py
  Region        Item     Sale Value

   East
                Tablet #5       200
              Offline UPS       250

  North
               Desktop #1      1000
                Laptop #7      1200
                 Mouse #2        50

  South
               Desktop #3      1100
              Keyboard #4       200
                DVD Drive       150

   West
             Hard disk #8       500
                CD-ROM #6       150

$

And this is a screenshot of the PDF output, viewed in Foxit PDF Reader:

So the itertools.groupby function basically provides roughly the same sort of functionality that SQL's GROUP BY clause provides (of course, when included in a complete SELECT statement). The difference is that with Python's groupby, you do the grouping and related processing in your program code, on data which is in memory, while if using SQL via a client-server RDBMS from your program, the grouping and processing will happen on the database server and only the aggregate results will be sent to your program to process further. Both methods can have pros and cons, depending on the needs of the application.

In my next post about Python, I'll use this program as one vehicle to demonstrate some uses of randomness in testing, continuing the series titled "The many uses of randomness", the earlier two parts of which are here and here.

- Enjoy.

- Vasudev Ram - Online Python training and consulting

Follow me on Gumroad to be notified about my new products:

Share |

Friday, February 26, 2016

PrintFriendly.com, site to print any web page well

- Vasudev Ram - Online Python training and programming

PrintFriendly.com is a web site that claims to lets you make any web page print friendly.

From their site:

[ PrintFriendly cleans and formats web pages for perfect print experience. PrintFriendly removes Ads, Navigation and web page junk, so you save paper and ink when you print. It's free and easy to use. Perfect to use at home, the office, or whenever you need to print a web page. ]

I tried out PrintFriendly on the About page of my blog, jugad2. It seems to have worked well.

Here is a screenshot of the first part of the resulting page preview:

Here is a screenshot of roughly the middle part of that page:

And here is a screenshot of the bottom of the page, with the image of my mascot showing.

All three parts seem to have come out well.

Interestingly, the site also has a feature by which, in the print preview, if you highlight certain parts with your mouse, it shows a small popup that lets you delete the selected part - before printing, I guess.

So PrintFriendly is eco-friendly.

The site reminds me slightly of my selpg Linux utility, which lets you print selected pages of a text file to PDF (with the help of xtopdf, which is also by me.

Signup to hear about new products and services I create.

Signup to hear about new products and services I create.

Share |

Thursday, January 7, 2016

Generate PDF from a Python-controlled Unix pipeline

By Vasudev Ram

This post is about a new xtopdf app I wrote, called PopenToPDF.py.

(xtopdf is my PDF generation toolkit, written in Python. The toolkit consists of a core library and multiple applications built using it.)

This program, PopenToPDF, shows how to use xtopdf to generate PDF output from any Python-controlled Unix pipeline. It uses the subprocess Python module.

I had written a few posts earlier about the uses of StdinToPDF.py, another xtopdf app [1]

(There are many kinds of pipeline"; it is a powerful concept.)

StdinToPDF is an application of xtopdf that can be used at the end of a Unix or Windows pipeline, to publish the text output of the pipeline to PDF.

[1] Here are some of those posts about StdinToPDF:

a) PDFWriter can create PDF from standard input

b) Print selected text pages to PDF with Python, selpg and xtopdf on Linux

c) Generate Windows Task List to PDF with xtopdf

PopenToPDF has the same general goal as StdinToPDF (to allow creation of a pipeline whose final output is PDF), but works somewhat differently.

Instead of just being used passively (like StdinToPDF) as the last component in a pipeline run from the command line, PopenToPDf is a Python program that itself sets up and runs a pipeline (of all the preceding commands, excepting itself), using subprocess.Popen, and then reads the output of that pipeline, programmatically, and converts the text it reads to PDF. So it is a different approach that may allow for other possibilities for customization.

For the example, I created an input text file of 1000 lines, via a small one-off script. The file is called 1000-lines.txt.

The pipeline (created by PopenToPDF) runs "nl -ba" to add sequential line numbers to each line of the input file. (nl is a Unix command to number lines.) Then the output is passed to my selpg utility (a command-line utility in C), which is a filter that reads its input and selects only a specified range of pages to pass on to the output. (Full details of the selpg utility, including explanation of its logic, source code, and the build steps, are at the URL in the previous sentence, or at links accessible from that URL.)

(This page on sites.harvard.edu is a good resource for Linux command line utility development, and also references my IBM dW article about selpg.)

PopenToPDF sets up the above pipeline (nl -ba piped to selpg), and then reads all the lines from it, adds its own line numbers to the input, and writes it all to a PDF file.

Thus we end up with two sets of line numbers prefixed to each line (in the PDF): the original line numbers added by the nl command, which represents the position of each line extracted from the original file, and the serial numbers (starting from 0) for the subset of lines that PopenToPDF sees.

I did this so that we could verify that the pipeline is extracting the right lines that we specified, by looking at the relative and absolute line numbers in the output (screenshots below).

Here is a screenshot of the first page of the PDF output:

And here is a screenshot of the last page, page 4, of the PDF output:

You can see that the last relative line number (added by PopenToPDF, in the extreme left number column) is 215, and the first was 0 (on the first page), so the number of lines extracted by selpg is 216, which corresponds to what we asked selpg for by specifying a start page of 3 (-s3) and an end page of 5 (-e5), since there are 72 lines per page (the default) and 72 * (5 -3 + 1) = 72 * 3 = 216. You can do a similar calculation for the absolute line numbers shown, to verify that we have extracted not only the right number of pages, but also the right pages.

So this approach (using Popen) can be used to run a pipeline under control of a Python program, read the output of the pipeline, and do some further processing on it. Obviously, it is a generic approach, not limited to producing PDF. It could be used for any other purpose where you want to run a pipeline under program control and do something with the output of the pipeline, in your own Python code.

I'll end with a few points about related topics:

This program is actually an example of a category of data processing operations commonly used in organizations, which can be broadly described as starting with some data source, and passing it through a series of transformations until we have the final output we want.

Often, but not always, the input for these transformations is downloaded from some database or application (of the organization), and/or the output is uploaded to another database or application (also of the organization).

In some of these cases, the process is called ETL, meaning Extract, Transform, Load. This operation is also related to IT system integration.

In general, these tasks can consist of a combination of the use of existing components (programs) and purpose-written code in a compiled or interpreted language. The operation can also consist of a combination of manual and automated steps.

When there is enough uniformity in the data and needed processing rules across runs, using more automation leads to more time and cost savings. Some amount of variation in the data or rules can be handled by parameterization of input and output filenames, database connections, table names, use of conditional logic, etc.

Finally, in the process of writing this program and post (across a couple of sessions), I came across mentions of microservices in tech forums. Microservices have been in the news for a while. So I looked up definitions of microservices and realized that they are in some ways similar to Unix pipelines and the Unix philosophy of creating small tools that do one thing well, and then combining them to achieve bigger tasks.

If you're interested in pipes and Python and their intersection, also check out this HN comment by me, which lists multiple other Python pipe-like tools, including one (pipe_controller) by me:

Yes, pyp is interesting. So are some other roughly similar Python tools

- Enjoy.

- Vasudev Ram - Online Python training and programming

- Vasudev Ram - Online Python training and programming

Share |

Monday, November 23, 2015

Convert XLSX to PDF with Python and xtopdf

By Vasudev Ram

XLSX => PDF

This is a simple application of my xtopdf toolkit, showing how to use it to convert XLSX data, i.e. Microsoft Excel data, to PDF (Portable Document Format). It only converts text data, not the formatting, colors, fonts, etc., that may be present in the Excel file.

For the input, I will use this small Excel file, fruits2.xlsx, which I created. A screenshot of it is below (click to enlarge):

Here is the code for XLSXtoPDF.py:

# XLSXtoPDF.py

# Program to convert the data from an XLSX file to PDF.
# Uses the openpyxl library and xtopdf.

# Author: Vasudev Ram - http://jugad2.blogspot.com
# Copyright 2015 Vasudev Ram.

from openpyxl import load_workbook
from PDFWriter import PDFWriter

workbook = load_workbook('fruits2.xlsx', guess_types=True, data_only=True)
worksheet = workbook.active

pw = PDFWriter('fruits2.pdf')
pw.setFont('Courier', 12)
pw.setHeader('XLSXtoPDF.py - convert XLSX data to PDF')
pw.setFooter('Generated using openpyxl and xtopdf')

ws_range = worksheet.iter_rows('A1:H13')
for row in ws_range:
    s = ''
    for cell in row:
        if cell.value is None:
            s += ' ' * 11
        else:
            s += str(cell.value).rjust(10) + ' '
    pw.writeLine(s)
pw.savePage()
pw.close()

And here is a screenshot of the PDF output in fruits2.pdf:

There are some points worth mentioning in connection with conversion of data to and from PDF. I will discuss them in a follow-up post.

Signup to hear about new products and services I create.

Share |

Wednesday, February 25, 2015

Publish SQLite data to PDF using named tuples

- Vasudev Ram - Online Python training and programming

Some time ago I had written this post:

Publishing SQLite data to PDF is easy with xtopdf.

It showed how to get data from an SQLite (Wikipedia) database and write it to PDF, using xtopdf, my open source PDF creation library for Python.

Today I was browsing the Python standard library docs, and so thought of modifying that program to use the namedtuple data type from the collections module of Python, which is described as implementing "High-performance container datatypes". The collections module was introduced in Python 2.4.
Here is a modified version of that program, SQLiteToPDF.py, called SQLiteToPDFWithNamedTuples.py, that uses named tuples:

# SQLiteToPDFWithNamedTuples.py
# Author: Vasudev Ram - http://www.dancingbison.com
# SQLiteToPDFWithNamedTuples.py is a program to demonstrate how to read 
# SQLite database data and convert it to PDF. It uses the Python
# data structure called namedtuple from the collections module of 
# the Python standard library.

from __future__ import print_function
import sys
from collections import namedtuple
import sqlite3
from PDFWriter import PDFWriter

# Helper function to output a string to both screen and PDF.
def print_and_write(pw, strng):
    print(strng)
    pw.writeLine(strng)

try:

    # Create the stocks database.
    conn = sqlite3.connect('stocks.db')
    # Get a cursor to it.
    curs = conn.cursor()

    # Create the stocks table.
    curs.execute('''DROP TABLE IF EXISTS stocks''')
    curs.execute('''CREATE TABLE stocks
                 (date text, trans text, symbol text, qty real, price real)''')

    # Insert a few rows of data into the stocks table.
    curs.execute("INSERT INTO stocks VALUES ('2006-01-05', 'BUY', 'RHAT', 100, 25.1)")
    curs.execute("INSERT INTO stocks VALUES ('2007-02-06', 'SELL', 'ORCL', 200, 35.2)")
    curs.execute("INSERT INTO stocks VALUES ('2008-03-07', 'HOLD', 'IBM', 300, 45.3)")
    conn.commit()

    # Create a namedtuple to represent stock rows.
    StockRecord = namedtuple('StockRecord', 'date, trans, symbol, qty, price')

    # Run the query to get the stocks data.
    curs.execute("SELECT date, trans, symbol, qty, price FROM stocks")

    # Create a PDFWriter and set some of its fields.
    pw = PDFWriter("stocks.pdf")
    pw.setFont("Courier", 12)
    pw.setHeader("SQLite data to PDF with named tuples")
    pw.setFooter("Generated by xtopdf - https://bitbucket.org/vasudevram/xtopdf")

    # Write header info.
    hdr_flds = [ str(hdr_fld).rjust(10) + " " for hdr_fld in StockRecord._fields ]
    hdr_fld_str = ''.join(hdr_flds)
    print_and_write(pw, '=' * len(hdr_fld_str))
    print_and_write(pw, hdr_fld_str)
    print_and_write(pw, '-' * len(hdr_fld_str))

    # Now loop over the fetched data and write it to PDF.
    # Map the StockRecord namedtuple's _make class method
    # (that creates a new instance) to all the rows fetched.
    for stock in map(StockRecord._make, curs.fetchall()):
        row = [ str(col).rjust(10) + " " for col in (stock.date, \
        stock.trans, stock.symbol, stock.qty, stock.price) ]
        # Above line can instead be written more simply as:
        # row = [ str(col).rjust(10) + " " for col in stock ]
        row_str = ''.join(row)
        print_and_write(pw, row_str)

    print_and_write(pw, '=' * len(hdr_fld_str))

except Exception as e:
    print("ERROR: Caught exception: " + e.message)
    sys.exit(1)

finally:
    pw.close()
    conn.close()

This time I've imported print_function so that I can use print as a function instead of as a statement.

Here's a screenshot of the PDF output in Foxit PDF Reader:

Dancing Bison Enterprises

Signup to hear about new products or services from me.

Share |

Sunday, February 22, 2015

Excel to PDF with xlwings and xtopdf

Signup to hear about new products or services from me.

Excel to PDF with xlwings and xtopdf - how many x in that? :)

I came across xlwings recently via the Net.

xlwings is by Zoomer Analytics, a startup based in Zürich, Switzerland, by a team with background in financial institutions.

Excerpt from the xlwings documentation:

[ xlwings is a BSD-licensed Python library that makes it easy to call Python from Excel and vice versa:

Interact with Excel from Python using a syntax that is close to VBA yet Pythonic.

Replace your VBA macros with Python code and still pass around your workbooks as easily as before.

xlwings fully supports NumPy arrays and Pandas DataFrames. It works with Microsoft Excel on Windows and Mac. ]

I checked out the xlwings quickstart.

Then did a quick test of using xlwings with xtopdf, my toolkit for PDF creation, to create a simple Excel spreadsheet, then read back its contents, and convert that to PDF.

Here is the code:

"""
xlwingsToPDF.py
A demo program to show how to convert the text extracted from Excel 
content, using xlwings, to PDF. It uses the xlwings library, to create 
and read the Excel input, and the xtopdf library to write the PDF output.
Author: Vasudev Ram - http://www.dancingbison.com
Copyright 2015 Vasudev Ram
"""

import sys
from xlwings import Workbook, Sheet, Range, Chart
from PDFWriter import PDFWriter

# Create a connection with a new workbook.
wb = Workbook()

# Create the Excel data.
# Column 1.
Range('A1').value = 'Foo 1'
Range('A2').value = 'Foo 2'
Range('A3').value = 'Foo 3'
# Column 2.
Range('B1').value = 'Bar 1'
Range('B2').value = 'Bar 2'
Range('B3').value = 'Bar 3'

pw = PDFWriter("xlwingsTo.pdf")
pw.setFont("Courier", 10)
pw.setHeader("Testing Excel conversion to PDF with xlwings and xtopdf")
pw.setFooter("xlwings: http://xlwings.org --- xtopdf: http://slid.es/vasudevram/xtopdf")

for row in Range('A1..B3').value:
    s = ''
    for col in row:
        s += col + ' | '
    pw.writeLine(s)

pw.close()

I ran it with this command:

py xlwingsToPDF.py

and here is a screenshot of the output PDF file:

Note: The xlwings library can be installed with:

pip install xlwings

But a prerequisite for it, pywin32, did not install automatically. pywin32 is a very useful and powerful Windows API wrapper library for Python, by Mark Hammond. I've used it a few times earlier, in earlier Python versions than Python 2.7.8, which I currently am using. I usually installed it directly in those earlier versions. This time, though it was a dependency for xlwings, it did not get installed automatically, and the above Python program gave a runtime error. I had to manually install pywin32 before the program could work.

- Enjoy.

- Vasudev Ram - Dancing Bison Enterprises

Share |

Wednesday, January 28, 2015

HTML text to PDF with Beautiful Soup and xtopdf

- Vasudev Ram - Python training and programming - Dancing Bison Enterprises

Recently, I thought of getting the text from HTML documents and putting that text to PDF. So I did it :)

Here's how:

"""
HTMLTextToPDF.py
A demo program to show how to convert the text extracted from HTML 
content, to PDF. It uses the Beautiful Soup library, v4, to 
parse the HTML, and the xtopdf library to generate the PDF output.
Beautiful Soup is at: http://www.crummy.com/software/BeautifulSoup/
xtopdf is at: https://bitbucket.org/vasudevram/xtopdf
Guide to using and installing xtopdf: http://jugad2.blogspot.in/2012/07/guide-to-installing-and-using-xtopdf.html
Author: Vasudev Ram - http://www.dancingbison.com
Copyright 2015 Vasudev Ram
"""

import sys
from bs4 import BeautifulSoup
from PDFWriter import PDFWriter

def usage():
    sys.stderr.write("Usage: python " + sys.argv[0] + " html_file pdf_file\n")
    sys.stderr.write("which will extract only the text from html_file and\n")
    sys.stderr.write("write it to pdf_file\n")

def main():

    # Create some HTML for testing conversion of its text to PDF.
    html_doc = """
    <html>
        <head>
            <title>
            Test file for HTMLTextToPDF
            </title>
        </head>
        <body>
        This is text within the body element but outside any paragraph.
        <p>
        This is a paragraph of text. Hey there, how do you do?
        The quick red fox jumped over the slow blue cow.
        </p>
        <p>
        This is another paragraph of text.
        Don't mind what it contains.
        What is mind? Not matter.
        What is matter? Never mind.
        </p>
        This is also text within the body element but not within any paragraph.
        </body>
    </html>
    """

    pw = PDFWriter("HTMLTextTo.pdf")
    pw.setFont("Courier", 10)
    pw.setHeader("Conversion of HTML text to PDF")
    pw.setFooter("Generated by xtopdf: http://slid.es/vasudevram/xtopdf")
 
    # Use method chaining this time.
    for line in BeautifulSoup(html_doc).get_text().split("\n"):
        pw.writeLine(line)
    pw.savePage()
    pw.close()

if __name__ == '__main__':
    main()

The program uses the Beautiful Soup library for parsing and extracting information from HTML, and xtopdf, my Python library for PDF generation.
Run it with:

python HTMLTextToPDF.py

and the output will be in the file HTMLTextTo.pdf.
Screenshot below:

Read more of my posts about Python or read posts about xtopdf (latter is subset of former)

Signup to hear about my new software products or services.

Share |

Friday, January 23, 2015

PrettyTable to PDF is pretty easy with xtopdf

Signup to be informed about my new products or services.

"PrettyTable to PDF is pretty easy with xtopdf."

How's that for some alliteration? :)

PrettyTable is a Python library to help you generate nice tables with ASCII characters as the borders, plus alignment of text within columns, headings, padding, etc.

Excerpt from the site:

[ PrettyTable is a simple Python library designed to make it quick and easy to represent tabular data in visually appealing ASCII tables.
...
PrettyTable lets you control many aspects of the table, like the width of the column padding, the alignment of text within columns, which characters are used to draw the table border, whether you even want a border, and much more. You can control which subsets of the columns and rows are printed, and you can sort the rows by the value of a particular column.

PrettyTable can also generate HTML code with the data in a <table> structure. ]

I came across PrettyTable via this blog post:

11 Python Libraries You Might Not Know.

Then I thought of using it with my PDF creation toolkit, to generate such ASCII tables, but as PDF. Here's a program, PrettyTableToPDF.py, that shows how to do that:

"""
PrettyTableToPDF.py
A demo program to show how to convert the output generated 
by the PrettyTable library, to PDF, using the xtopdf toolkit 
for PDF creation from other formats.
Author: Vasudev Ram - http://www.dancingbison.com
xtopdf is at: http://slides.com/vasudevram/xtopdf

Copyright 2015 Vasudev Ram
"""

from prettytable import PrettyTable
from PDFWriter import PDFWriter

pt = PrettyTable(["City name", "Area", "Population", "Annual Rainfall"])
pt.align["City name"] = "l" # Left align city names
pt.padding_width = 1 # One space between column edges and contents (default)
pt.add_row(["Adelaide",1295, 1158259, 600.5])
pt.add_row(["Brisbane",5905, 1857594, 1146.4])
pt.add_row(["Darwin", 112, 120900, 1714.7])
pt.add_row(["Hobart", 1357, 205556, 619.5])
pt.add_row(["Sydney", 2058, 4336374, 1214.8])
pt.add_row(["Melbourne", 1566, 3806092, 646.9])
pt.add_row(["Perth", 5386, 1554769, 869.4])
lines = pt.get_string()

pw = PDFWriter('Australia-Rainfall.pdf')
pw.setFont('Courier', 12)
pw.setHeader('Demo of PrettyTable to PDF')
pw.setFooter('Demo of PrettyTable to PDF')
for line in lines.split('\n'):
    pw.writeLine(line)
pw.close()

You can run the program with:

$ python PrettyTableToPDF.py

And here is a screenshot of the output PDF in Foxit PDF Reader:

- Enjoy.

--- Posts about Python --- Posts about xtopdf ---

- Vasudev Ram - Dancing Bison Enterprises - Python programming and training

Share |

Thursday, January 15, 2015

Publish databases to PDF with PyDAL and xtopdf

- Vasudev Ram - Python programming and training

Some days ago, I had blogged about pyDAL, a pure Python Database Abstraction Layer.

Today I thought of writing a program to publish database data to PDF, using PyDAL and xtopdf, my open source Python library for PDF creation from other file formats.

(Here is a good online overview about xtopdf, for those new to it.)

So here is the code for PyDALtoPDF.py:

"""
Author: Vasudev Ram
Copyright 2014 Vasudev Ram - www.dancingbison.com
This program is a demo of how to use the PyDAL and xtopdf Python libraries 
together to publish database data to PDF.
PyDAL is at: https://github.com/web2py/pydal/blob/master/README.md
xtopdf is at: https://bitbucket.org/vasudevram/xtopdf
and info about xtopdf is at: http://slides.com/vasudevram/xtopdf or 
at: http://slid.es/vasudevram/xtopdf
"""

# imports
from pydal import DAL, Field
from PDFWriter import PDFWriter

SEP = 60

# create the database
db = DAL('sqlite://house_depot.db')

# define the table
db.define_table('furniture', \
    Field('id'), Field('name'), Field('quantity'), Field('unit_price')
)

# insert rows into table
items = ( \
    (1, 'chair', 40, 50),
    (2, 'table', 10, 300),
    (3, 'cupboard', 20, 200),
    (4, 'bed', 30, 400)
)
for item in items:
    db.furniture.insert(id=item[0], name=item[1], quantity=item[2], unit_price=item[3])

# define the query
query = db.furniture
# the above line shows an interesting property of PyDAL; it seems to 
# have some flexibility in how queries can be defined; in this case,
# just saying db.table_name tells it to fetch all the rows 
# from table_name; there are other variations possible; I have not 
# checked out all the options, but the ones I have seem somewhat 
# intuitive.

# run the query
rows = db(query).select()

# setup the PDFWriter
pw = PDFWriter('furniture.pdf')
pw.setFont('Courier', 10)
pw.setHeader('     House Depot Stock Report - Furniture Division     '.center(60))
pw.setFooter('Generated by xtopdf: http://google.com/search?q=xtopdf')

pw.writeLine('=' * SEP)

field_widths = (5, 10, 10, 12, 10)

# print the header row
pw.writeLine(''.join(header_field.center(field_widths[idx]) for idx, header_field in enumerate(('#', 'Name', 'Quantity', 'Unit price', 'Price'))))

pw.writeLine('-' * SEP)

# print the data rows
for row in rows:
    # methinks the writeLine argument gets a little long here ...
    # the first version of the program was taller but thinner :)
    pw.writeLine(''.join(str(data_field).center(field_widths[idx]) for idx, data_field in enumerate((row['id'], row['name'], row['quantity'], row['unit_price'], int(row['quantity']) * int(row['unit_price'])))))

pw.writeLine('=' * SEP)
pw.close()

I ran it (on Windows) with:

$ py PyDALtoPDF.py 2>NUL

Here is a screenshot of the output in Foxit PDF Reader:

- Enjoy.

--- Posts about Python --- Posts about xtopdf ---

Signup to hear about new products or services from me.

Share |

Friday, January 9, 2015

Convert TSV (Tab Separated Values) to PDF with xtopdf