Monday, June 10, 2013
Brian Kernighan uses Python
By Vasudev Ram
This is from the site usesthis.com, a.k.a. The Setup :-), which has many interviews with well-known computer (and maybe other) people, about their hardware and software setups (and nowadays device setups too, of course). I've read quite a few of those interviews in the past, and they can be interesting.
Brian Kernighan uses Python. Cool ... [1] [2]
Brian Kernighan page (at Bell Labs)
Brian Kernighan page (at Princeton University)
[1] For those who don't know of Kernighan, he was, at Bell Labs, one of the top contributors to Unix and C (from early on, and for a long time), and is the co-author of the classic computer books "The C Programming Language" (with Dennis Ritchie), also abbreviated as K&R, and "The Unix Programming Environment" (with Rob Pike), abbreviated as K&P. Rob Pike has been working on the Go language for the last few years.
[2] I saw the usesthis.com article about Kernighan via this Reddit thread, which also has some interesting comments.
And I almost forgot to mention, Kernighan also co-invented Awk :-)
I was fortunate to get to read those two books (K&R and K&P) early on in my career. In fact, I should probably say "those four books" or "those six books" :-) because I read them at least twice or thrice. They had a big influence on me, and benefited me a lot in my work over the years, possibly more than any other computer books I've read, before or since then.
- Vasudev Ram - Dancing Bison Enterprises
Sunday, February 17, 2013
PAWK, a Python tool like AWK
http://pypi.python.org/pypi/pawk/0.3
pawk gives you some of the features of the AWK programming language (which is a powerful Unix tool), but in Python.
The pawk link above has some examples of awk commands and the equivalent pawk ones.
In some cases, the pawk commands are shorter - except for the p in pawk; heh, reminds me of the anecdote about the Unix creat system call :-)
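To give a feel for what "AWK features in Python" means, here is a minimal sketch in plain Python. It does not use pawk or its command-line syntax; the helper name awk_like and the sample data are made up for illustration. It only mimics AWK's model of splitting each input line into whitespace-separated fields and running an action on every line.

```python
import io

def awk_like(lines, action):
    """Call action(fields, line_number) for every input line, AWK-style.

    Hypothetical helper for illustration; not part of pawk.
    """
    results = []
    for line_number, line in enumerate(lines, start=1):
        fields = line.split()  # like AWK's default whitespace field splitting
        result = action(fields, line_number)
        if result is not None:
            results.append(result)
    return results

# Made-up sample input standing in for a file or stdin.
sample = io.StringIO("alice 30\nbob 25\ncarol 41\n")

# Roughly equivalent to: awk '$2 > 28 { print NR, $1 }'
selected = awk_like(sample, lambda f, nr: (nr, f[0]) if int(f[1]) > 28 else None)
print(selected)
```

Note that the fields list is zero-based, so f[0] plays the role of AWK's $1.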
Tuesday, October 2, 2012
Brian Kernighan's home page w/ his articles, books and source code
Brian Kernighan is still one of my favorite computer authors, even after many years.
He is of course very well-known to a lot of software people, but I'm writing this post because there will still be people (particularly people new or recent to software) who don't know about him and his work, and also because of the useful links to his books, below (some of them with downloadable source code).
He is the co-author of the classic books "The C Programming Language" by Kernighan and Ritchie (well-known as just "K&R") and "The UNIX Programming Environment" by Kernighan and Pike. I read both of those books years ago near the beginning of my programming career, and even today I find there are very few books of that caliber - concise yet dense with information, clear writing, etc.
He was one of the early people to work on UNIX and C at Bell Labs, and has made many contributions in a number of areas.
The AWK programming language is named partly after him - the K is for Kernighan (*).
Brian Kernighan's home page at Princeton University. He is currently a professor there after working at Bell Labs for many years.
Wikipedia page on Brian Kernighan.
Brian Kernighan page at Bell Labs.
(*) And D is for Digital :), the name of a new introductory book on computers (hardware, software and communications) by Kernighan. I just saw it. It looks like a good present for the non-technical person(s) in your life.
There are some interesting articles at Kernighan's pages at Princeton University and Bell Labs (two of the links above).
Also, the pages about his books have links to downloadable source code for some of the programs in those books. See the links below:
The Practice of Programming.
The C Programming Language.
The Unix Programming Environment.
The AWK Programming Language.
Inspired by nature.
- dancingbison.com | @vasudevram | jugad2.blogspot.com
Sunday, July 15, 2012
sed and awk one-liners - two good pages
I saw these sed and awk one-liner pages via this post on Rajiv Eranki's blog (he works at Dropbox):
Scaling lessons learned at Dropbox, part 1
(The post is about scaling at Dropbox and is interesting in itself.)
The sed and awk one-liner pages:
sed one-liners page
awk one-liners page
For a one-liner that happens to use both sed and awk, check this older post of mine:
UNIX one-liner to kill a hanging Firefox process
The comments on it relating to UNIX processes may be of interest (orphan processes, etc.)
- Vasudev Ram - Dancing Bison Enterprises
The Bentley-Knuth problem and solutions
I recently saw this post about an interesting programming problem on the Web (apparently initially posed by Jon Bentley to Donald Knuth).
For lack of a better term (and also because the name is somewhat memorable), I'm calling it the Bentley-Knuth problem: More shell, less egg
The problem description, from the above post:
[
The program Bentley asked Knuth to write is one that’s become familiar to people who use languages with serious text-handling capabilities: Read a file of text, determine the n most frequently used words, and print out a sorted list of those words along with their frequencies.
]
The post is interesting in itself - read it. For fun, I decided to write solutions to the problem in Python and also in UNIX shell.
My initial Python solution is below. The code is not very Pythonic / refactored / tested, but it works, and does have some minimal error checking. See this Python sorting HOWTO page for some ways it could be improved. UNIX shell solution coming in a while.
UPDATE: Unix shell solution added below the Python one.
Note: neither my Python nor my UNIX shell solution behaves exactly like the McIlroy shell solution. His converts upper case letters to lower case and uses a strict "English dictionary"-style definition of a "word" (alphabetic characters only), whereas my two solutions define a word as "a sequence of non-blank characters", as is more common when parsing computer programs. Adding both of his tr invocations to the front of my shell pipeline would give the same result as McIlroy's.
# bentley_knuth.py
# Author: Vasudev Ram - http://www.dancingbison.com
# Version: 0.1
# The problem this program tries to solve is from the page:
# http://www.leancrew.com/all-this/2011/12/more-shell-less-egg/
# Description: The program Bentley asked Knuth to write:
# Read a file of text, determine the n most frequently
# used words, and print out a sorted list of those words
# along with their frequencies.
# Each print call takes a single formatted string, so the
# script runs under both Python 2 and Python 3.

import sys

sys_argv = sys.argv

def usage():
    sys.stderr.write("Usage: %s n file\n" % sys_argv[0])
    sys.stderr.write("where n is the number of most frequently\n")
    sys.stderr.write("used words you want to find, and \n")
    sys.stderr.write("file is the name of the file in which to look.\n")

if len(sys_argv) < 3:
    usage()
    sys.exit(1)

# Validate n: it must be a positive decimal integer.
try:
    n = int(sys_argv[1])
except ValueError:
    sys.stderr.write("%s: Error: %s is not a decimal numeric value\n" %
        (sys_argv[0], sys_argv[1]))
    sys.exit(1)

print("n = %d" % n)
if n < 1:
    sys.stderr.write("%s: Error: %s is not a positive value\n" %
        (sys_argv[0], sys_argv[1]))
    sys.exit(1)

in_filename = sys_argv[2]

print("%s: Finding %d most frequent words in file %s" %
    (sys_argv[0], n, in_filename))

try:
    fil_in = open(in_filename)
except IOError:
    sys.stderr.write("%s: ERROR: Could not open in_filename %s\n" %
        (sys_argv[0], in_filename))
    sys.exit(1)

# Count occurrences of each word; a word is any run of
# non-blank characters.
word_freq_dict = {}
for lin in fil_in:
    words_in_line = lin.split()
    for word in words_in_line:
        if word in word_freq_dict:
            word_freq_dict[word] += 1
        else:
            word_freq_dict[word] = 1
fil_in.close()

# Sort the (word, frequency) pairs by decreasing frequency.
wfl = sorted(word_freq_dict.items(),
    key=lambda item: item[1], reverse=True)

print("The %d most frequent words sorted by decreasing frequency:" % n)
len_wfl = len(wfl)
if n > len_wfl:
    print("n = %d, file has only %d unique words," % (n, len_wfl))
    print("so printing %d words" % len_wfl)
print("Word: Frequency")
m = min(n, len_wfl)
for i in range(m):
    print("%s: %d" % (wfl[i][0], wfl[i][1]))
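For comparison, here is a more compact sketch of the same counting-and-sorting step using collections.Counter from the Python standard library. This is one possible refactoring, not the version from the post above; the function name most_frequent_words and the sample lines are made up for illustration. It keeps the same definition of a word (a sequence of non-blank characters).

```python
from collections import Counter

def most_frequent_words(lines, n):
    """Return the n most common whitespace-separated words as (word, count) pairs."""
    counts = Counter()
    for line in lines:
        counts.update(line.split())  # add every word on this line to the tally
    return counts.most_common(n)     # already sorted by decreasing count

# Made-up sample input; an open file object could be passed instead of a list.
sample_lines = ["the cat and the hat", "the hat"]
for word, freq in most_frequent_words(sample_lines, 2):
    print("%s: %d" % (word, freq))
```

Counter.most_common does the sort-and-truncate work that the longer script does by hand with sorted() and a loop over the first n items.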
And here is my initial solution in UNIX shell:
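# bentley_knuth.sh
# Usage:
# ./bentley_knuth.sh n file
# where "n" is the number of most frequent words
# you want to find in "file".
# awk prints "word count" pairs; sort orders them by the count
# (field 2), numerically, in decreasing order; sed keeps the first n.
# -k2,2nr replaces the obsolete "+1" key syntax, which current
# versions of sort may reject.
awk '
{
    for (i = 1; i <= NF; i++)
        word_freq[$i]++
}
END {
    for (i in word_freq)
        print i, word_freq[i]
}
' < "$2" | sort -k2,2nr | sed "${1}q"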
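The difference in word definitions mentioned in the note above can be sketched in Python as follows. This is a rough illustration, assuming McIlroy-style normalization means "split on anything that is not an ASCII letter, then lower-case"; the function name mcilroy_style_counts and the sample text are made up for the example.

```python
import re
from collections import Counter

def mcilroy_style_counts(text, n):
    """Count words as runs of ASCII letters, ignoring case."""
    words = re.findall(r"[A-Za-z]+", text)   # like: tr -cs A-Za-z '\n'
    lowered = [w.lower() for w in words]     # like: tr A-Z a-z
    return Counter(lowered).most_common(n)

# Made-up sample text with mixed case and punctuation.
sample_text = "The cat. the CAT! A dog's cat?"
print(mcilroy_style_counts(sample_text, 2))
```

With the "sequence of non-blank characters" definition, every token in this sample ("cat.", "CAT!", "cat?" and so on) would count as a different word, each with frequency 1.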
- Vasudev Ram - Dancing Bison Enterprises
Wednesday, September 7, 2011
Some ways of doing UNIX-style pipes in Python
For a project I'm working on, I was recently thinking about how to implement something similar to UNIX-style pipes in Python; not necessarily exactly the same, but conceptually similar.
I had deliberately decided *not* to search for this on the Net, so that I could first think about it myself, and figure something out.
But coincidentally today, while browsing the Usenet group comp.lang.python, I came across this post mentioning issues with doing one-liners in Python.
One of the answers given was to check out PyP, a tool for Python that lets you do pipes (in a sense) and data munging like the powerful UNIX tools sed and awk. PyP stands for "Python Power at the Prompt", meaning the UNIX shell prompt, of course. It has an interesting and unusual approach. It is open source, hosted on Google Code, and was apparently initially created by a division of Sony Pictures called ImageWorks, "to facilitate the construction of complex image manipulation unix commands during visual effects work on Alice in Wonderland, Green Lantern, and the upcoming The Amazing Spiderman". Good performance was mentioned as one of its plus points, apart from the pipe facility itself.
So I checked PyP out a bit and it seems like a nice tool. It has a fairly intuitive syntax for at least basic operations, and is also extensible in at least a couple of ways for more advanced users.
I also did a Google query or two with appropriate keywords to find other such tools. Here are some of them, including PyP again:
Pipe module for Python by Julien Palard:
http://dev-tricks.net/pipe-infix-syntax-for-python
Piping support in the standard Python library:
Will update this post later after checking these tools out some more.
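As a rough idea of what "UNIX-style pipes in Python" can look like, here is a minimal sketch that overloads the | operator so that stages can be chained over any iterable. It is not the API of PyP or of the pipe module listed above; the class Stage and the stage functions grep, upper and head are hypothetical names chosen for this example.

```python
class Stage(object):
    """Wrap a function over an iterable so it can be chained with '|'."""

    def __init__(self, func):
        self.func = func

    def __ror__(self, iterable):
        # Python calls this for `iterable | stage` when the left operand
        # does not itself know how to handle '|' with a Stage.
        return self.func(iterable)

def grep(substring):
    """Keep only the lines containing substring."""
    return Stage(lambda lines: (ln for ln in lines if substring in ln))

def upper():
    """Convert every line to upper case."""
    return Stage(lambda lines: (ln.upper() for ln in lines))

def head(count):
    """Pass through only the first count lines."""
    def take(lines):
        for index, line in enumerate(lines):
            if index >= count:
                break
            yield line
    return Stage(take)

# Made-up sample data standing in for lines read from a file or stdin.
lines = ["alpha", "beta", "alphabet", "gamma", "alpine"]
result = list(lines | grep("alp") | upper() | head(2))
print(result)
```

Each stage returns a generator, so data flows through the chain lazily, one item at a time, somewhat like processes connected by a UNIX pipe.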
- Vasudev Ram @ Dancing Bison
