jugad2 - Vasudev Ram on software innovation: pipelines

Showing posts with label pipelines. Show all posts

Wednesday, September 14, 2016

Func-y D + Python pipeline to generate PDF

Hi, readers,

Here is a pipeline that generates some output from a D (language) program and passes it to a Python program that converts that output to PDF. The D program makes use of a bit of (simple) template programming / generics and a bit of functional programming (using the std.functional module from Phobos, D's standard library).

D => Python

I'm showing this as yet another example of the uses of xtopdf, my Python toolkit for PDF creation, as well as for the D part, which is fun (pun intended :), and because those D features are powerful.

The D program is derived, with some modifications, from a program in this post by Gary Willoughby,

More hidden treasure in the D standard library .

That program demonstrates, among other things, the pipe feature of D from the std.functional module.

First, the D program, student_grades.d:

/*
student_grades.d
Author: Vasudev Ram
Web site: https://vasudevram.github.io
Blog: http://jugad2.blogspot.com
Product store: https://gumroad.com/vasudevram
Adapts code from:
http://nomad.so/2015/08/more-hidden-treasure-in-the-d-standard-library/
*/

import std.stdio;
import std.algorithm;
import std.array;
import std.conv;
import std.functional;

// Set up the functional pipeline by composing some functions.
alias sumString = pipe!(split, map!(to!(int)), sum);

void main(string[] args)
{
    // Data to be transformed:
    // Each string has the student name followed by 
    // their grade in 5 subjects.
    auto student_grades_list = 
    [
        "A 1 2 3 4 5",
        "B 2 3 4 5 6",
        "C 3 4 5 6 7",
        "D 4 5 6 7 8",
        "E 5 6 7 8 9",
    ];

    // Transform the data for each student.
    foreach(student_grades; student_grades_list) {
        auto student = student_grades[0]; // name
        auto total = sumString(student_grades[2..$]); // grade total
        writeln("Grade total for student ", student, ": ", total);
    }
}

The initial data (which the D program transforms) is hard-coded, but that can easily be changed to read it from a text file, for instance. The program uses pipe() to compose [1] the functions split, map (to int), and sum (some of which are generic / template functions).

So, when it is run, each string (student_grades) of the input array student_grades_list is split (into an array of smaller strings, by space as the delimiter); then each string in the array (except for the first, which is the name), is mapped (converted) to integer; then all the integers are summed up to get the student's grade total; finally these names and totals are written to standard output. That becomes the input to the next stage of the pipeline, the Python program, which does the conversion to PDF.

Build the D program with:

dmd -o- student_grades.d

which gives us the executable, student_grades(.exe).

The Python part of the pipeline is StdinToPDF.py, one of the apps in the xtopdf toolkit, which is designed to be used in pipelines such as the above - basically, any Unix, Windows or Mac OS X pipeline, that generates text as its final output, can be further terminated by StdinToPDF, resulting in conversion of that text to PDF. It is a small Python app written using the core class of the xtopdf library, PDFWriter. Here is the original post about StdinToPDF:

[xtopdf] PDFWriter can create PDF from standard input

Here is the pipeline:

$ student_grades | python StdinToPDF.py sg.pdf

Below is a cropped screenshot of the generated output, sg.pdf, as seen in Foxit PDF Reader.

[1] Speaking of functional composition, you may like to check out this earlier post by me:

fmap(), "inverse" of Python map() function

It's about a different way of composing functions (for certain cases). It also has an interesting comment exchange between me and a reader, who showed both how to do it in Scala, and another way to do it in Python. Also, there is another way to compose functions in D, using a function called compose; it works like pipe, but the composed functions are written in the reverse order. It's in the same D module as pipe, std.functional.

Finally (before I pipe down - for today :), check out this HN comment by me (on another topic), in which I mention and link to multiple other ways of doing pipe-like stuff in Python:

Comment on Streem – a new programming language from Matz

- Enjoy.

- Vasudev Ram - Online Python training and consulting

Get updates on my software products / ebooks / courses.

Jump to posts: Python DLang xtopdf

Subscribe to my blog by email

My ActiveState recipes

Share |

Friday, October 24, 2014

Print selected text pages to PDF with Python, selpg and xtopdf on Linux

By Vasudev Ram

In a recent blog post, titled My IBM developerWorks article, I talked about a tutorial that I had written for IBM developerWorks a while ago. The tutorial showed some of the recommended techniques and practices to follow when writing a Linux command-line utility that is intended for production use, and how to write it in such a way that it can easily cooperate with existing UNIX command-line tools, when used in a UNIX command pipeline.

This ability of properly written command-line tools to cooperate with each other when used in a pipeline, is, as I said in that IBM article, one of the keys to the power of Linux (and UNIX) as a development environment. (See the classic book The UNIX Programming Environment, for much more on this topic.)

The utility I wrote and discussed (in that IBM article), called selpg (for SELect PaGes), allows the user to select a specified range of pages from a text file. At the end of the aforementioned blog post, I had said that I would show some practical uses of the selpg utility later. I describe one such use case below, involving a combination of selpg and my xtopdf toolkit), which is a Python library for PDF creation.

(The xtopdf toolkit contains a PDF creation library, and also includes some sample applications that show how to use the library to create PDF output in various ways, and from various input sources, which is why I tend to call xtopdf a toolkit instead of just a library.

I had written one such application of xtopdf a while ago, called StdinToPDF(.py) (for standard input to PDF). I blogged about it at the time, here:

[xtopdf] PDFWriter can create PDF from standard input. (PDFWriter is a module of xtopdf, which provides the core PDF creation functionality.)

The selpg utility can be used with StdinToPDF, in a pipeline, to select a range of pages (by starting and ending page numbers) from a (possibly large) text file, and write only those selected pages to a PDF file. Here is an example of how to do that:

First, build the selpg utility from source, for your Linux OS. selpg is only meant to work on Linux, since it uses some Linux C standard library functions, such as from stdio.h, and popen(); but you can try to run it on Windows (at your own risk), since Windows does have (had?) a POSIX subsystem, from Windows NT onward. I have used it in the past. (Update: I checked - according to this section of the Wikipedia article about POSIX, Windows may have had POSIX support only from Windows NT up to Windows 2000.) Anyway, to build selpg on Linux, follow the steps below (the $ sign is the shell prompt and not to be typed):

1. Download the source code from the sources section of the selpg project repository on Bitbucket.

Download all of these files: makefile, mk, selpg.c and showsyserr.c .

2. Make the (shell script) file mk executable, with the command:

$ chmod u+x mk

3. Then run the file mk, with:

$ ./mk

That will run the makefile that builds the selpg executable using the C compiler on your Linux box. The C compiler (invoked as cc or gcc) is installed on most mainstream Linux distributions. If it is not, you will need to install it from the repository for your Linux distribution. Sometimes only a minimal version of a C compiler is installed, which is only enough to (re)compile the kernel after making kernel parameter changes, such as for performance tuning. Consult your local Linux expert for help if such is the case.

3. Now make the file selpg executable, with the command:

$ chmod u+x selpg

4. (Optional) You can check the usage of selpg by reading the IBM tutorial article and/or running selpg without any command-line arguments:

$ ./selpg

which will show a usage message.

6. (Optional) You can run selpg a few times with some text file(s) as input, and different values for the -s and -e command-line options, to get a feel for how it works.

Now download xtopdf (which includes StdinToPDF) from here:

xtopdf on Bitbucket.

To install it, follow the steps given in this post:

Guide to installing and using xtopdf, including creating simple PDF e-books

That post was written a while ago, when xtopdf was hosted on SourceForge. So you need to make one change to the instructions given in that guide: instead of downloading xtopdf from SourceForge, as stated in Step 5 of the guide, get it from the xtopdf Bitbucket link I gave above.

(To make xtopdf work, you also have to install ReportLab, which xtopdf depends uses internally; the steps for that are given in my xtopdf installation guide linked above, or you can also look at the instructions in the ReportLab distribution. It is easy, just a couple of steps - download, unzip, configure a setting or two.)

Once you have both selpg and xtopdf installed, you can use selpg and StdinToPDF together. Here is an example run, to select only pages 2 through 4 from an input text file:

I wrote a simple Python program, gen_selpg_test_file,py, to create a text file that can be used to test the selpg and StdinToPDf programs together.

Here is an excerpt of the core logic of gen_selpg_test_file.py, omitting argument and error handling for brevity (I have those in the actual code):

# Generate the test file with the given filename and number of lines of text.
    try:
        out_fil = open(out_filename, "w")
    except IOError as ioe:
        sys.stderr.write("Error: Could not open output file {}.\n".format(out_filename))
        sys.exit(1)
    for line_num in range(1, num_lines + 1):
        line = "Line #" + str(line_num).zfill(10) + "\n"
        out_fil.write(line)
    out_fil.close()

I ran it like this:

$ python gen_selpg_test_file.py selpg_test_file_1000.txt 1000

to generate a text file with 1000 lines, in the file selpg_test_file_1000.txt .

Then I could run the pipeline using selpg and StdinToPDF, as described above:

$ ./selpg -s2 -e4 selpg_test_file_1000.txt | python StdinToPDF.py p2-p4.pdf

This command extracts only the specifed pages (2 to 4) from the input file, and pipes them to StdinToPDF, which converts those pages only, to PDF, in the filename specified at the end of the command.

After doing the above, you can open the file p2_p4.pdf in your favorite PDF reader (Evince is one PDF reader for Linux), to confirm that it contains all (and only) the lines from page 2 to 4 of the input file selpg_test_file_1000.txt (considering 72 lines per page, which is the default that selpg uses).

Read the IBM article to see how that default can be changed - to either another number of lines per page, e.g. 66 or 80 or whatever, or to specify form feeds (ASCII code 12) as the page delimiter. Form feeds are often used as a page delimiter in text file reports generated by programs, when the reports are destined for a printer, since the form feed character causes the printer to advance the print head to the top of the next page/form (that's how the character got its name).

Though this post seemed long, note that a lot it was either background information or instructions on how to build selpg and install xtopdf. Those are both one time jobs. Once those are done, you can select the needed pages from any text file and print them to PDF with a single command-line, as shown in the last command above.

This is useful when you printed the entire file earlier, and some pages didn't print properly because the printer jammed. Just use selpg with xtopdf to print only the needed pages again.

The image above is from the Wikipedia article on Printing, and titled:

Jikji, "Selected Teachings of Buddhist Sages and Son Masters" from Korea, the earliest known book printed with movable metal type, 1377. Bibliothèque Nationale de France, Paris

- Enjoy.

- Vasudev Ram - Dancing Bison Enterprises

Click here to get email about new products from Vasudev Ram.

Contact Page

Share |

Wednesday, May 29, 2013

Unix-like pipes in Go, from labix.org

pipe - Unix-like pipelines for Go

Looks interesting.

This pipe Go package, by Gustavo Niemeyer of labix.org, allows you to programmatically set up and run a pipeline of processes in Go, and to get the pipe's standard output and standard error, either combined, or separately.

The page has some examples of creating pipelines using the package.

The approach he uses is more like the original Unix pipes concept:

http://en.wikipedia.org/wiki/Pipeline_(Unix)

in that it creates pipelines out of multiple processes, whereas the approach I took with my Python pipe_controller module was different: instead of multiple processes, I used multiple Python functions in a single process:

http://www.google.com/?q=jugad2+pipe_controller

(See the first few hits from the above search for pipe_controller.)

I recently came across some interesting work by Gustavo and others, involving porting Canonical's Juju devops tool from Python to Go.

Relevant links for the Juju port:

https://plus.google.com/app/basic/stream/z123eroqbu20sh5hi04cebzbpyzxznr4ru00k

https://groups.google.com/forum/m/#!topic/golang-nuts/jLnMsUbYwrQ

http://m.youtube.com/watch?v=kKQLhGZVN4A&desktop_uri=%2Fwatch%3Fv%3DkKQLhGZVN4A

https://lists.ubuntu.com/archives/juju-dev/2012-November/000281.html

Interestingly, IIRC, based on some stories that I read a while ago, the word juju is related to magic in some West African language ...

http://en.m.wikipedia.org/wiki/Juju

And that may be why Juju scripts are called charms:

https://juju.ubuntu.com/

P.S. I wonder what Devops_Borat would have to say about Juju (and about Python and Go too :)

https://mobile.twitter.com/devops_borat

- Vasudev Ram
dancingbison.com

jugad2 - Vasudev Ram on software innovation

Pages