Showing posts with label data-conversion. Show all posts
Showing posts with label data-conversion. Show all posts

Saturday, December 17, 2016

[xtopdf] Publish DSV data (Delimiter-Separated Values) to PDF

By Vasudev Ram



Here is another in my series of posts about xtopdf, my Python toolkit for PDF creation from other data/file formats. This one is for conversion of DSV data to PDF.

DSV stands for Delimiter-Separated Values, and is a superset or generalization of formats like CSV (Comma-Separated Values) and TSV (Tab-Separated Values). These formats are commonly used in general computing (business, scientific, etc.) and also in data science.

This program, DSVToPDF.py, builds upon the code in the read_dsv.py program in this recent post:

Processing DSV data (Delimiter-Separated Values) with Python

It reads DSV content either from filenames given as command-line arguments, or from standard input, and converts each such input into a PDF file with the same name as the DSV file, but with added extension ".pdf". In the case of standard input, since there is no DSV filename to use as the base, the PDF file is named dsv_output.pdf.

Here is the code for DSVToPDF.py:
from __future__ import print_function

"""
DSVToPDF.py
Author: Vasudev Ram
Web site: https://vasudevram.github.io
Blog: https://jugad2.blogspot.com
Product store: https://gumroad.com/vasudevram
Twitter: https://mobile.twitter.com/vasudevram
Purpose: Show how to publish DSV data (Delimiter-Separated Values) 
to PDF, using the xtopdf toolkit.
Requires:
 - ReportLab: https://www.reportlab.com/ftp/reportlab-1.21.1.tar.gz
 - xtopdf: https://bitbucket.org/vasudevram/xtopdf
First install ReportLab, then install xtopdf, using instructions here:
http://jugad2.blogspot.in/2012/07/guide-to-installing-and-using-xtopdf.html
The DSV data can be read from either files or standard input.
The delimiter character is configurable by the user and can
be specified as either a character or its ASCII code.
References:
DSV format: https://en.wikipedia.org/wiki/Delimiter-separated_values 
TAOUP (The Art Of Unix Programming): Data File Metaformats:
http://www.catb.org/esr/writings/taoup/html/ch05s02.html
ASCII table: http://www.asciitable.com/
"""

import sys
import string
from PDFWriter import PDFWriter

def err_write(message):
    sys.stderr.write(message)

def error_exit(message):
    err_write(message)
    sys.exit(1)

def usage(argv, verbose=False):
    usage1 = \
        "{}: publish DSV (Delimiter-Separated-Values) data to PDF.\n".format(argv[0])
    usage2 = "Usage: python" + \
        " {} [ -c delim_char | -n delim_code ] [ dsv_file ] ...\n".format(argv[0])
    usage3 = [
        "where one of either the -c or -n option must be given,\n",  
        "delim_char is a single ASCII delimiter character, and\n", 
        "delim_code is a delimiter character's ASCII code.\n", 
        "Text lines will be read from specified DSV file(s) or\n", 
        "from standard input, split on the specified delimiter\n", 
        "specified by either the -c or -n option, processed, and\n", 
        "written, formatted, to PDF files with the name dsv_file.pdf.\n", 
    ]
    usage4 = "Use the -h or --help option for a more detailed help message.\n"
    err_write(usage1)
    err_write(usage2)
    if verbose:
        '''
        for line in usage3:
            err_write(line)
        '''
        err_write(''.join(usage3))
    if not verbose:
        err_write(usage4)

def str_to_int(s):
    try:
        return int(s)
    except ValueError as ve:
        error_exit(repr(ve))

def valid_delimiter(delim_code):
    return not invalid_delimiter(delim_code)

def invalid_delimiter(delim_code):
    # Non-ASCII codes not allowed, i.e. codes outside
    # the range 0 to 255.
    if delim_code < 0 or delim_code > 255:
        return True
    # Also, don't allow some specific ASCII codes;
    # add more, if it turns out they are needed.
    if delim_code in (10, 13):
        return True
    return False

def dsv_to_pdf(dsv_fil, delim_char, pdf_filename):
    with PDFWriter(pdf_filename) as pw:
        pw.setFont("Courier", 12)
        pw.setHeader(pdf_filename[:-4] + " => " + pdf_filename)
        pw.setFooter("Generated by xtopdf: https://google.com/search?q=xtopdf")
        for idx, lin in enumerate(dsv_fil):
            fields = lin.split(delim_char)
            assert len(fields) > 0
            # Knock off the newline at the end of the last field,
            # since it is the line terminator, not part of the field.
            if fields[-1][-1] == '\n':
                fields[-1] = fields[-1][:-1]
            # Treat a blank line as a line with one field,
            # an empty string (that is what split returns).
            pw.writeLine(' - '.join(fields))

def main():
    # Get and check validity of arguments.
    sa = sys.argv
    lsa = len(sa)
    if lsa == 1:
        usage(sa)
        sys.exit(0)
    elif lsa == 2:
        # Allow the help option with any letter case.
        if sa[1].lower() in ("-h", "--help"):
            usage(sa, verbose=True)
            sys.exit(0)
        else:
            usage(sa)
            sys.exit(0)

    # If we reach here, lsa is >= 3.
    # Check for valid mandatory options (sic).
    if not sa[1] in ("-c", "-n"):
        usage(sa, verbose=True)
        sys.exit(0)

    # If -c option given ...
    if sa[1] == "-c":
        # If next token is not a single character ...
        if len(sa[2]) != 1:
            error_exit(
            "{}: Error: -c option needs a single character after it.".format(sa[0]))
        if not sa[2] in string.printable:
            error_exit(
            "{}: Error: -c option needs a printable ASCII character after it.".format(\
            sa[0]))
        delim_char = sa[2]
    # else if -n option given ...
    elif sa[1] == "-n":
        delim_code = str_to_int(sa[2])
        if invalid_delimiter(delim_code):
            error_exit(
            "{}: Error: invalid delimiter code {} given for -n option.".format(\
            sa[0], delim_code))
        delim_char = chr(delim_code)
    else:
        # Checking for what should not happen ... a bit of defensive programming here.
        error_exit("{}: Program error: neither -c nor -n option given.".format(sa[0]))

    try:
        # If no filenames given, do sys.stdin to PDF ...
        if lsa == 3:
            print("Converting content of standard input to PDF.")
            dsv_fil = sys.stdin
            dsv_to_pdf(dsv_fil, delim_char, "dsv_output.pdf")
            dsv_fil.close()
            print("Output is in dsv_output.pdf")
        # else (filenames given), convert them to PDFs ...
        else:
            for dsv_filename in sa[3:]:
                pdf_filename = dsv_filename + ".pdf"
                print("Converting file {} to PDF.", dsv_filename)
                dsv_fil = open(dsv_filename, 'r')
                dsv_to_pdf(dsv_fil, delim_char, pdf_filename)
                dsv_fil.close()
                print("Output is in {}".format(pdf_filename))
    except IOError as ioe:
        error_exit("{}: Error: {}".format(sa[0], repr(ioe)))
        
if __name__ == '__main__':
    main()
Note the commented and uncommented lines in the "if verbose:" clause in the usage() function. The latter is shorter, and, methinks, a tad more Pythonic.
I did a sample run with the data shown in the image at the top of this post, which uses the pipe character (|) as the delimiter between fields (124 is the ASCII code for the pipe character):
$ python DSVToPDF.py -n 124 file4.dsv
And here is a cropped screenshot of the PDF output as viewed in Foxit PDF Reader:


- Enjoy.

- Vasudev Ram - Online Python training and consulting

Get updates on my software products / ebooks / courses.

Jump to posts: Python   DLang   xtopdf

Subscribe to my blog by email

My ActiveState recipes

Managed WordPress Hosting by FlyWheel



Tuesday, December 6, 2016

[xtopdf] Wildcard text files to PDF with xtopdf and glob

By Vasudev Ram



First joker card image attribution

This is another in my series of applications that use xtopdf (source), my Python toolkit for PDF creation from other formats.

[ Here's a good overview of xtopdf, its uses, supported platforms and formats, etc. ]

I called this app WildcardTextToPDFpy.. It lets you specify a wildcard for text files, and then converts each of the text files matching the wildcard (like grades*2016.txt or monthly*sales.txt) [1], into corresponding PDF files - with the same names but with '.pdf' appended.

[1] For example, the wildcard grades*2016.txt could match grades-math-2016.txt and grades-bio-2016.txt (think a school or college), while monthly*sales.txt might match monthly-car-sales.txt and monthly-bike-sales.txt (think a vehicle dealership), so a PDF file will be generated for each text file matching the given wildcard.

The program uses the iglob function from the glob module in Python's standard library, similar to how this recent other post:

Simple directory lister with multiple wildcard arguments

used the glob function. The difference is that glob returns a list, while iglob returns a generator, so it will return the matching filenames lazily, on demand, as shown here:
$ python
>>> g1 = glob.glob('text*.txt')
>>> g1
['text1.txt', 'text2.txt', 'text3.txt']
>>>>>> g2 = glob.iglob('text*.txt')
>>> g2
<generator object iglob at 0x027C2850>
>>> next(g2)
'text1.txt'
>>> for f in g2:
...     print f
...
text2.txt
text3.txt

Here is the code for WildcardTextToPDF.py:
from __future__ import print_function

# WildcardTextToPDF.py
# Convert the text files specified by a filename wildcard,
# like '*.txt' or 'foo*bar*baz.txt', to PDF files.
# Each text file's content goes to a separate PDF file, with the 
# PDF file name being the full text file name (including the 
# '.txt' part), with '.pdf' appended.
# Requires:
# - xtopdf: https://bitbucket.org/vasudevram/xtopdf
# - ReportLab: https://www.reportlab.com/ftp/reportlab-1.21.1.tar.gz
# Author: Vasudev Ram
# Copyright 2016 Vasudev Ram
# Product store: https://gumroad.com/vasudevram
# Web site: https://vasudevram.github.io
# Blog: http://jugad2.blogspot.com

import sys
import os
import glob
from PDFWriter import PDFWriter

def usage(argv):
    sys.stderr.write("Usage: python {} txt_filename_pattern\n".format(argv[0]))
    sys.stderr.write("E.g. python {} foo*.txt\n".format(argv[0]))

def text_to_pdf(txt_filename):
    pw = PDFWriter(txt_filename + '.pdf')
    pw.setFont('Courier', 12)
    pw.setHeader('{} converted to PDF'.format(txt_filename))
    pw.setFooter('PDF conversion by xtopdf: https://google.com/search?q=xtopdf')

    with open(txt_filename) as txt_fil:
        for line in txt_fil:
            pw.writeLine(line.strip('\n'))
        pw.savePage()

def main():
    if len(sys.argv) != 2:
        usage(sys.argv)
        sys.exit(0)

    try:
        for filename in glob.glob(sys.argv[1]):
            print("Converting {} to {}".format(filename, filename + '.pdf'))
            text_to_pdf(filename)
    except Exception as e:
        print("Caught Exception: type: {}, message: {}".format(\
            e.__class__, str(e)))

if __name__ == '__main__':
    main()
And here are the relevant files before and after running the program, with the program's output in between:
$ dir text*.txt/b
text1.txt
text2.txt
text3.txt

$ python WildcardTextToPDF2.py text*.txt
Converting text1.txt to text1.txt.pdf
Converting text2.txt to text2.txt.pdf
Converting text3.txt to text3.txt.pdf

$ dir text?.txt*/od/b
text1.txt
text2.txt
text3.txt
text1.txt.pdf
text2.txt.pdf
text3.txt.pdf
Finally, here's a cropped screenshot of the third output file, text3.txt.pdf, in Foxit PDF Reader (a lightweight PDF reader that I use):


Also look up:

[xtopdf] Batch convert text files to PDF (with xtopdf and fileinput)

for another variation on the approach to converting text files to PDF.

Speaking of generators, also check out this other post about them:

Python generators are pluggable

The image at the top of the post is of the earliest Joker card by Samuel Hart c. 1863, according to Wikipedia:

Joker (playing card)

Enjoy.

- Vasudev Ram - Online Python training and consulting

Get updates on my software products / ebooks / courses.

Jump to posts: Python   DLang   xtopdf

Subscribe to my blog by email

My ActiveState recipes

FlyWheel - Managed WordPress Hosting



Friday, November 25, 2016

Processing DSV data (Delimiter-Separated Values) with Python

By Vasudev Ram



I needed to process some DSV files recently. Here is a Python program I wrote for it, with a few changes over the original. E.g. I do not show the processing of the data here; I only read and print it. Also, I support two different command-line options (and ways) to specify the delimiter character.

DSV (Delimiter-separated values) is a common tabular text data format, with one record per line, and some number of fields per record, where the fields are separated or delimited by some specific character. Some common delimiter characters used in DSV files are tab (which makes them TSV files, Tab-Separated Values, common on Unix), comma (CSV files, Comma-Separated Values, a common spreadsheet and database import-export format), the pipe character (|), the colon (:) and others.

They - DSV files - are described in this section, Data File Metaformats, under Chapter 5: Textuality, of Eric Raymond (ESR)'s book, The Art of Unix Programming, which is a recommended read for anyone interested in Unix (one of the longest-lived operating systems [1]) and in software design and development.

[1] And, speaking a bit loosely, nowaday Unix is also the most widely used OS in the world, by a fair margin, due to its use (as a variant) in Android and iOS based mobile devices, both of which are Unix-based, not to mention Apple MacOS and Linux computers, which also are. Android devices alone number in the billions.

The program, read_dsv.py, is a command-line utility, written in Python, that allows you to specify the delimiter character in one of two ways:

- with a "-c delim_char" option, in which delim_char is an ASCII character,
- with a "-n delim_code" option, in which delim_code is an ASCII code.

It then reads either the files specified as command-line arguments after the -n or -c option, or if no files are given, it reads its standard input.

Here is the code for read_dsv.py:
from __future__ import print_function

"""
read_dsv.py
Author: Vasudev Ram
Web site: https://vasudevram.github.io
Blog: https://jugad2.blogspot.com
Product store: https://gumroad.com/vasudevram
Purpose: Shows how to read DSV data, i.e. 

https://en.wikipedia.org/wiki/Delimiter-separated_values 

from either files or standard input, split the fields of each
line on the delimiter, and process the fields in some way.
The delimiter character is configurable by the user and can
be specified as either a character or its ASCII code.

Reference:
TAOUP (The Art Of Unix Programming): Data File Metaformats:
http://www.catb.org/esr/writings/taoup/html/ch05s02.html
ASCII table: http://www.asciitable.com/
"""

import sys
import string

def err_write(message):
    sys.stderr.write(message)

def error_exit(message):
    err_write(message)
    sys.exit(1)

def usage(argv, verbose=False):
    usage1 = \
        "{}: read and process DSV (Delimiter-Separated-Values) data.\n".format(argv[0])
    usage2 = "Usage: python" + \
        " {} [ -c delim_char | -n delim_code ] [ dsv_file ] ...\n".format(argv[0])
    usage3 = [
        "where one of either the -c or -n option must be given,\n",  
        "delim_char is a single ASCII delimiter character, and\n", 
        "delim_code is a delimiter character's ASCII code.\n", 
        "Text lines will be read from specified DSV file(s) or\n", 
        "from standard input, split on the specified delimiter\n", 
        "specified by either the -c or -n option, processed, and\n", 
        "written to standard output.\n", 
    ]
    err_write(usage1)
    err_write(usage2)
    if verbose:
        for line in usage3:
            err_write(line)

def str_to_int(s):
    try:
        return int(s)
    except ValueError as ve:
        error_exit(repr(ve))

def valid_delimiter(delim_code):
    return not invalid_delimiter(delim_code)

def invalid_delimiter(delim_code):
    # Non-ASCII codes not allowed, i.e. codes outside
    # the range 0 to 255.
    if delim_code < 0 or delim_code > 255:
        return True
    # Also, don't allow some specific ASCII codes;
    # add more, if it turns out they are needed.
    if delim_code in (10, 13):
        return True
    return False

def read_dsv(dsv_fil, delim_char):
    for idx, lin in enumerate(dsv_fil):
        fields = lin.split(delim_char)
        assert len(fields) > 0
        # Knock off the newline at the end of the last field,
        # since it is the line terminator, not part of the field.
        if fields[-1][-1] == '\n':
            fields[-1] = fields[-1][:-1]
        # Treat a blank line as a line with one field,
        # an empty string (that is what split returns).
        print("Line", idx, "fields:")
        for idx2, field in enumerate(fields):
            print(str(idx2) + ":", "|" + field + "|")

def main():
    # Get and check validity of arguments.
    sa = sys.argv
    lsa = len(sa)
    if lsa == 1:
        usage(sa)
        sys.exit(0)
    if lsa == 2:
        # Allow the help option with any letter case.
        if sa[1].lower() in ("-h", "--help"):
            usage(sa, verbose=True)
            sys.exit(0)
        else:
            usage(sa)
            sys.exit(0)

    # If we reach here, lsa is >= 3.
    # Check for valid mandatory options (sic).
    if not sa[1] in ("-c", "-n"):
        usage(sa, verbose=True)
        sys.exit(0)

    # If -c option given ...
    if sa[1] == "-c":
        # If next token is not a single character ...
        if len(sa[2]) != 1:
            error_exit(
            "{}: Error: -c option needs a single character after it.".format(sa[0]))
        if not sa[2] in string.printable:
            error_exit(
            "{}: Error: -c option needs a printable ASCII character after it.".format(\
            sa[0]))
        delim_char = sa[2]
    # else if -n option given ...
    elif sa[1] == "-n":
        delim_code = str_to_int(sa[2])
        if invalid_delimiter(delim_code):
            error_exit(
            "{}: Error: invalid delimiter code {} given for -n option.".format(\
            sa[0], delim_code))
        delim_char = chr(delim_code)
    else:
        # Checking for what should not happen ... a bit of defensive programming here.
        error_exit("{}: Program error: neither -c nor -n option given.".format(sa[0]))

    try:
        # If no filenames given, read sys.stdin ...
        if lsa == 3:
            print("processing sys.stdin")
            dsv_fil = sys.stdin
            read_dsv(dsv_fil, delim_char)
            dsv_fil.close()
        # else (filenames given), read them ...
        else:
            for dsv_filename in sa[3:]:
                print("processing file:", dsv_filename)
                dsv_fil = open(dsv_filename, 'r')
                read_dsv(dsv_fil, delim_char)
                dsv_fil.close()
    except IOError as ioe:
        error_exit("{}: Error: {}".format(sa[0], repr(ioe)))
        
if __name__ == '__main__':
    main()

Here are test runs of the program (both valid and invalid), and the results of each one:

Run it without any arguments. Gives a brief usage message.
$ python read_dsv.py
read_dsv.py: read and process DSV (Delimiter-Separated-Values) data.
Usage: python read_dsv.py [ -c delim_char | -n delim_code ] [ dsv_file ] ...
Run it with a -h option (for help). Gives the verbose usage message.
$ python read_dsv.py -h
read_dsv.py: read and process DSV (Delimiter-Separated-Values) data.
Usage: python read_dsv.py [ -c delim_char | -n delim_code ] [ dsv_file ] ...
where one of either the -c or -n option must be given,
delim_char is a single ASCII delimiter character, and
delim_code is a delimiter character's ASCII code.
Text lines will be read from specified DSV file(s) or
from standard input, split on the specified delimiter
specified by either the -c or -n option, processed, and
written to standard output.
Run it with a -v option (invalid run). Gives the brief usage message.
$ python read_dsv.py -v
read_dsv.py: read and process DSV (Delimiter-Separated-Values) data.
Usage: python read_dsv.py [ -c delim_char | -n delim_code ] [ dsv_file ] ...
Run it with a -c option but no ASCII character argument (invalid run). Gives the brief usage message.
$ python read_dsv.py -c
read_dsv.py: read and process DSV (Delimiter-Separated-Values) data.
Usage: python read_dsv.py [ -c delim_char | -n delim_code ] [ dsv_file ] ...
Run it with a -c option followed by the pipe character (invalid run). The OS (here, Windows) gives an error message because the pipe character cannot be used to end a pipeline.
$ python read_dsv.py -c |
The syntax of the command is incorrect.
Run it with the -c option and the pipe character as the delimiter character, but protected (by double quotes) from interpretation by the OS shell (CMD).
$ python read_dsv.py -c "|" file1.dsv
processing file: file1.dsv
Line 0 fields:
0: |1|
1: |2|
2: |3|
3: |4|
4: |5|
5: |6|
6: |7|
Line 1 fields:
0: |field1|
1: |fld2|
2: | fld3 with spaces around it |
3: |    fld4 with leading spaces|
4: |fld5 with trailing spaces     |
5: |next field is empty|
6: ||
7: |last field|
Line 2 fields:
0: ||
1: |1|
2: |22|
3: |333|
4: |4444|
5: |55555|
6: |666666|
7: |7777777|
8: |88888888|
Line 3 fields:
0: ||
1: ||
2: ||
3: ||
4: ||
5: |                      |
6: ||
7: ||
8: ||
9: ||
10: ||
Run it with the -n option followed by 124, the ASCII code for the pipe character as the delimiter.
$ python read_dsv.py -n 124 file1.dsv
[Gives exact same output as the run above, as it should, because both use the same delimiter and read the same input file.]
Copy file1.dsv to file3.dsv. Change all the pipe characters (delimiters) to colons:
Run it with the -n option followed by 58, the ASCII code for the colon character.
$ python read_dsv.py -n 58 file3.dsv
[Gives exact same output as the run above, as it should, because other than the delimiters (pipe versus colon), the input is the same.]

I added support for the -n option to the program because it makes it more flexible, since you can specify any ASCII character as the delimiter (that makes sense), by giving its ASCII code.

And of course, to find out the values of the ASCII codes for these delimiter characters, I used the char_to_ascii_code.py program from my recent post:

Trapping KeyboardInterrupt and EOFError for program cleanup

You may have noticed that I mentioned delimiter characters and DSV files in that post too. The char_to_ascii_code.py utility shown in that post was created to find the ASCII code for any character (without having to look it up on the web each time).

- Enjoy.

- Vasudev Ram - Online Python training and consulting

- Black Flyday at Flywheel Wordpress Managed Hosting - get 3 months free on the annual plan.

Get updates on my software products / ebooks / courses.

Jump to posts: Python   DLang   xtopdf

Subscribe to my blog by email

My ActiveState recipes



Wednesday, October 26, 2016

Read from CSV with D, write to PDF with Python

By Vasudev Ram


CSV => PDF

Here is another in my series of applications of xtopdf, my PDF creation toolkit for Python (xtopdf source here).

This xtopdf application is actually a pipeline (nothing Unix-specific though, will work on both *nix and Windows) - a D program reading CSV data and sending it to a Python program, which writes the data to PDF.

The D program, read_csv.d, reads CSV data from a .csv file, and writes it to standard output.

The Python program, StdinToPDF.py (which is part of the xtopdf toolkit), reads its standard input (which is redirected by the pipeline to come from the D program's standard output) and writes the data it reads, to PDF.

Here is the D program, read_csv.d:
/**************************************************
File: read_csv.d
Purpose: A program to read CSV data from a file and 
write it to standard output.
Author: Vasudev Ram
Date created: 2016-10-25
Copyright 2016 Vasudev Ram
Web site: https://vasudevram.github.io
Blog: http://jugad2.blogspot.com
Product store: https://gumroad.com/vasudevram
**************************************************/

import std.algorithm;
import std.array;
import std.csv;
import std.stdio;
import std.file;
import std.typecons;

int main()
{
    try {
        stderr.writeln("Reading CSV data from file.");
        auto file = File("input.csv", "r");
        foreach (record;
            file.byLine.joiner("\n").csvReader!(Tuple!(string, string, int)))
        {
            writefln("%s works as a %s and earns $%d per year",
                     record[0], record[1], record[2]);
        }
    } catch (CSVException csve) {
        stderr.writeln("Caught CSVException: msg = ", csve.msg, 
        " at row, col = ", csve.row, ", ", csve.col);
    } catch (FileException fe) {
        stderr.writeln("Caught FileException: msg = ", fe.msg);
    } catch (Exception e) {
        stderr.writeln("Caught Exception: msg = ", e.msg);
    }
    return 0;
}
The D program is compiled as usual with:
dmd read_csv.d
I ran it first (only the D program) with an invalid CSV file (it has an extra comma at the start on line 3, which invalidates the data by making "Driver" be in the salary column position), and got the expected error message, which includes the row and column number of the place in the CSV file where the program encountered the error - this is useful for fixing the input data:
$ type input.csv
Jack,Carpenter,40000
Tom,Blacksmith,50000
,Jill,Driver,60000
$ read_csv
Reading CSV data from file.
Jack works as a Carpenter and earns $40000 per year
Tom works as a Blacksmith and earns $50000 per year
Caught CSVException: msg = Unexpected 'D' when converting from type string to type int 
at row, col = 3, 3
Then I ran it again, in the regular way, this time with a valid CSV file, and as part of a pipeline, the other pipeline component being StdinToPDF:
$ read_csv | python StdinToPDF.py csv_output.pdf
Reading CSV data from file.
And here is a cropped view of the output as seen in Foxit PDF Reader:


- Enjoy.

- Vasudev Ram - Online Python training and consulting

Get updates on my software products / ebooks / courses.

Jump to posts: Python   DLang   xtopdf

Subscribe to my blog by email

My ActiveState recipes

FlyWheel - Managed WordPress Hosting



Monday, November 23, 2015

Convert XLSX to PDF with Python and xtopdf

By Vasudev Ram


XLSX => PDF

This is a simple application of my xtopdf toolkit, showing how to use it to convert XLSX data, i.e. Microsoft Excel data, to PDF (Portable Document Format). It only converts text data, not the formatting, colors, fonts, etc., that may be present in the Excel file.

For the input, I will use this small Excel file, fruits2.xlsx, which I created. A screenshot of it is below (click to enlarge):


Here is the code for XLSXtoPDF.py:
# XLSXtoPDF.py

# Program to convert the data from an XLSX file to PDF.
# Uses the openpyxl library and xtopdf.

# Author: Vasudev Ram - http://jugad2.blogspot.com
# Copyright 2015 Vasudev Ram.

from openpyxl import load_workbook
from PDFWriter import PDFWriter

workbook = load_workbook('fruits2.xlsx', guess_types=True, data_only=True)
worksheet = workbook.active

pw = PDFWriter('fruits2.pdf')
pw.setFont('Courier', 12)
pw.setHeader('XLSXtoPDF.py - convert XLSX data to PDF')
pw.setFooter('Generated using openpyxl and xtopdf')

ws_range = worksheet.iter_rows('A1:H13')
for row in ws_range:
    s = ''
    for cell in row:
        if cell.value is None:
            s += ' ' * 11
        else:
            s += str(cell.value).rjust(10) + ' '
    pw.writeLine(s)
pw.savePage()
pw.close()
And here is a screenshot of the PDF output in fruits2.pdf:

There are some points worth mentioning in connection with conversion of data to and from PDF. I will discuss them in a follow-up post.

- Vasudev Ram - Online Python training and programming

Signup to hear about new products and services I create.

Posts about Python  Posts about xtopdf

My ActiveState recipes

Thursday, March 27, 2014

Database to JSON in Python

By Vasudev Ram






I had been doing some work involving JSON recently; while doing that, I got the idea of writing some code to convert database data to JSON. Here's a simple Python program I wrote for that. It can be improved in many ways (*), and there may be many other ways of implementing it, but this program shows the basic approach. The program is simple, but can be useful, since JSON is a useful data interchange format.

See this StackOverflow post for some approaches.

Also, I used an SQLite database in this example, for convenience, since the sqlite3 module comes with the Python standard library, so it's easier for any reader to run this program without having to download and install some other database and its Python driver. But the program can easily be adapted (by someone with basic knowledge of SQL) to other databases that support some form of access via Python. Note: the program makes use of a SQLite-specific feature, so some changes may be required for other databases. For comparison purposes, I print out the data fetched from the database both as a Python object and a JSON string.

(Also see a related post: JSONLint.com, an online JSON validator.)

Here is the program, DBtoJSON.py:
# DBtoJSON.py
# Author: Vasudev Ram - http://www.dancingbison.com
# Copyright 2014 Vasudev Ram
# DBtoJSON.py is a program to DEMOnstrate how to read 
# SQLite database data and convert it to JSON.

import sys
import sqlite3
import json

try:

    conn = sqlite3.connect('example.db')

    # This enables column access by name: row['column_name']
    conn.row_factory = sqlite3.Row

    curs = conn.cursor()

    # Create table.
    curs.execute('''DROP TABLE IF EXISTS stocks''')
    curs.execute('''CREATE TABLE stocks
                 (date text, trans text, symbol text, qty real, price real)''')

    # Insert a few rows of data.
    curs.execute("INSERT INTO stocks VALUES ('2006-01-05','BUY','RHAT',100,35.0)")
    curs.execute("INSERT INTO stocks VALUES ('2007-02-06','SELL','ORCL',200,25.1)")
    curs.execute("INSERT INTO stocks VALUES ('2008-03-06','HOLD','IBM',200,45.2)")

    # Commit the inserted rows.
    conn.commit()

    # Now fetch back the inserted data and write it to JSON.
    curs.execute("SELECT * FROM stocks")
    recs = curs.fetchall()

    print "DB data as a list with a dict per DB record:"
    rows = [ dict(rec) for rec in recs ]
    print rows

    print

    print "DB data as a single JSON string:"
    rows_json = json.dumps(rows)
    print rows_json

except Exception, e:
    print "ERROR: Caught exception: " + repr(e)
    raise e
    sys.exit(1)

# EOF
The program is self-contained; you don't even need to set up a database and a table and populate it beforehand; the code does that. You just run:
python DBtoJSON.py
And here is its output:
DB data as a list with a dict per DB record:
[{'date': u'2006-01-05', 'symbol': u'RHAT', 'trans': u'BUY', 'price': 35.0, 'qty
': 100.0}, {'date': u'2007-02-06', 'symbol': u'ORCL', 'trans': u'SELL', 'price':
 25.1, 'qty': 200.0}, {'date': u'2008-03-06', 'symbol': u'IBM', 'trans': u'HOLD'
, 'price': 45.2, 'qty': 200.0}]

DB data as a single JSON string:
[{"date": "2006-01-05", "symbol": "RHAT", "trans": "BUY", "price": 35.0, "qty":
100.0}, {"date": "2007-02-06", "symbol": "ORCL", "trans": "SELL", "price": 25.1,
 "qty": 200.0}, {"date": "2008-03-06", "symbol": "IBM", "trans": "HOLD", "price"
: 45.2, "qty": 200.0}]

(*) And remember, this was a demo :-)
Read other Python posts on my blog.

- Vasudev Ram - Dancing Bison Enterprises

Contact Page

Thursday, August 23, 2012

Fulltext, Python library to convert documents and media to text - for full-text indexing for search


Fulltext is a simple Python library for converting document and media files to text. It's main purpose is for use with full-text indexing systems.

See: https://github.com/btimby/fulltext

and

http://pypi.python.org/pypi/fulltext/0.1-1 (Site giving an error at present)

For example, to easily extract text from a PDF file:

> python
> import fulltext
> fulltext.get('resume.pdf')
'Experience: ...'

Excerpt from the github site for fulltext:

[ Fulltext is a library that makes converting various file formats to plain text simple. Mostly it is a wrapper around shell tools. It will execute the shell program, scrape it's results and then post-process the results to pack as much text into as little space as possible.

Supported formats:
The following formats are supported using the command line apps listed.

application/pdf: pdftotext
application/msword: antiword
application/vnd.openxmlformats-officedocument.wordprocessingml.document:
docx2txt
application/vnd.ms-excel: convertxls2csv
application/rtf: unrtf
application/vnd.oasis.opendocument.text: odt2txt
application/vnd.oasis.opendocument.spreadsheet: odt2txt
application/zip: funzip
application/x-tar, gzip: tar & gunzip
application/x-tar, bzip2: tar & bunzip2
application/rar: unrar
text/html: html2text
text/xml: html2text
image/jpeg: exiftool
video/mpeg: exiftool
audio/mpeg: exiftool
application/octet-stream: strings ]

Inspired by nature.
- dancingbison.com | @vasudevram | jugad2.blogspot.com