
Monday, July 25, 2011

Before Python

This morning I had a chat with the students at Google's CAPE program. Since I wrote up what I wanted to say I figured I might as well blog it here. Warning: this is pretty unedited (or else it would never be published :-). I'm posting it in my "personal" blog instead of the "Python history" blog because it mostly touches on my career before Python. Here goes.

Have you ever written a computer program? Using which language?
  • HTML
  • Javascript
  • Java
  • Python
  • C++
  • C
  • Other - which?
[It turned out the students had used a mixture of Scratch, App Inventor, and Processing. A few students had also used Python or Java.]

Have you ever invented a programming language? :-)

If you have programmed, you know some of the problems with programming languages. Have you ever thought about why programming isn't easier? Would it help if you could just talk to your computer? Have you tried speech recognition software? I have. It doesn't work very well yet. :-)

How do you think programmers will write software 10 years from now? Or 30? 50?

Do you know how programmers worked 30 years ago?

I do.

I was born in Holland in 1956. Things were different.

I didn't know what a computer was until I was 18. However, I tinkered with electronics. I built a digital clock. My dream was to build my own calculator.

Then I went to university in Amsterdam to study mathematics and they had a computer that was free for students to use! (Not unlimited though. We were allowed to use something like one second of CPU time per day. :-)

I had to learn how to use punch cards. There were machines to create them that had a keyboard. The machines were as big as a desk and made a terrible noise when you hit a key: a small hole was punched in the card with a huge force and great precision. If you made a mistake you had to start over.

I didn't get to see the actual computer for several more years. What we had in the basement of the math department was just an end point for a network that ran across the city. There were card readers and line printers and operators who controlled them. But the actual computer was elsewhere.

It was a huge, busy place, where programmers got together and discussed their problems, and I loved to hang out there. In fact, I loved it so much I nearly dropped out of university. But eventually I graduated.

Aside: Punch cards weren't invented for computers; they were invented for sorting census data and the like before WW2. [UPDATE: actually much earlier, though the IBM 80-column format I used did originate in 1928.] There were large mechanical machines for sorting stacks of cards. But punch cards are the reason that some software still limits you (or just defaults) to 80 characters per line.

My first program was a kind of "hello world" program written in Algol-60. That language was only popular in Europe, I believe. After another student gave me a few hints I learned the rest of the language straight from the official definition of the language, the "Revised Report on the Algorithmic Language Algol-60." That was not an easy report to read! The language was a bit cumbersome, but I didn't mind, I learned the basics of programming anyway: variables, expressions, functions, input/output.

Then a professor mentioned that there was a new programming language named Pascal. There was a Pascal compiler on our mainframe so I decided to learn it. I borrowed the book on Pascal from the departmental library (there was only one book, and only one copy, and I couldn't afford my own). After skimming it, I decided that the only thing I really needed was the "railroad diagrams" at the end of the book that summarized the language's syntax. I made photocopies of those and returned the book to the library.

Aside: Pascal really had only one new feature compared to Algol-60: pointers. These baffled me for the longest time. Eventually I learned assembly programming, which explained the memory model of a computer for the first time. I realized that a pointer was just an address. Then I finally understood them.

I guess this is how I got interested in programming languages. I learned the other languages of the day along the way: Fortran, Lisp, Basic, Cobol. With all this knowledge of programming, I managed to get a plum part-time job at the data center maintaining the mainframe's operating system. It was the most coveted job among programmers. It gave me access to unlimited computer time, the fastest terminals (still 80 x 24 though :-), and most important, a stimulating environment where I got to learn from other programmers. I also got access to a Unix system, learned C and shell programming, and at some point we had an Apple II (mostly remembered for hours of playing space invaders). I even got to implement a new (but very crummy) programming language!

All this time, programming was one of the most fun things in my life. I thought of ideas for new programs to write all the time. But interestingly, I wasn't very interested in using computers for practical stuff! Nor even to solve mathematical puzzles (except that I invented a clever way of programming Conway's Game of Life that came from my understanding of using logic gates to build a binary addition circuit).

What I liked most, though, was writing programs to make the lives of programmers better. One of my early creations was a text editor that was better than the system's standard text editor (which wasn't very hard :-). I also wrote an archive program that helped conserve disk space; it was so popular and useful that the data center offered it to all its customers. I liked sharing programs, and my own principles for sharing were very similar to what later would become Open Source (except I didn't care about licenses -- still don't :-).

As a term project I wrote a static analyzer for Pascal programs with another student. Looking back I think it was a horrible program, but our professor thought it was brilliant and we both got an A+. That's where I learned about parsers and such, and that you can do more with a parser than write a compiler.

I combined pleasure with a good cause when I helped out a small left-wing political party in Holland automate their membership database. This was until then maintained by hand as a collection of metal plates into which letters were stamped using an antiquated machine not unlike a steam hammer :-). In the end the project was not a great success, but my contributions (including an emulation of Unix's venerable "ed" editor program written in Cobol) caught the attention of another volunteer, whose day job was as a computer science researcher at the Mathematical Center. (Now CWI.)

This was Lambert Meertens. It so happened that he was designing his own programming language, named B (later ABC), and when I graduated he offered me a job on his team of programmers who were implementing an interpreter for the language (what we would now call a virtual machine).

The rest I have written up earlier in my Python history blog.

Friday, June 3, 2011

The depth and breadth of Python

As of late I'm noticing a trend: I'm spending more time having in-person in-depth conversations, and less time coding. While I regret the latter, I really enjoy the former. Certainly more than weekly meetings, code reviews, or bikeshedding email threads. (I'm not all that excited about blogging either, as you may have guessed; but some things just don't fit in 140 characters.)

Two conversations with visitors I particularly enjoyed this week were both with very happy Python users, and yet they couldn't be more different. This to me is a confirmation of Python's enduring depth and breadth: it is as far from being a one-trick language as you can imagine.

My first visitor was Annie Liu, a professor of computer science (with a tendency to theory :-) at Stony Brook University in New York State. During an animated conversation that lasted nearly three hours (and still she had more to say :-) she explained to me the gist of her research, which appears to be writing small Python programs that implement fundamental algorithms using set comprehensions, and then optimizing the heck out of them using an automated approach she summarized as the three I's: iterate, incrementalize, and implement. While her academic colleagues laugh at her for her choice of such a non-theoretical language as Python, her students love it, and she seems to be having the last laugh, obtaining publication-worthy results that don't require advanced LaTeX skills, nor writing in a dead language like SETL (of which she is also a great fan, and which, via ABC, had some influence on Python -- see also below).
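
As an aside, to give a flavor of what such specification-level programs look like (this is my own toy illustration, not code from Annie's research), here is graph reachability written as a fixpoint over a set comprehension; the "incrementalize" step would then turn the repeatedly evaluated comprehension into cheap incremental updates:

def reachable(edges, start):
    # edges is a set of (u, v) pairs; return every node reachable from start
    reached = {start}
    while True:
        new = {v for (u, v) in edges if u in reached and v not in reached}
        if not new:
            return reached
        reached |= new

print(reachable({(1, 2), (2, 3), (3, 1), (4, 5)}, 1))  # prints {1, 2, 3}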

Annie told me an amusing anecdote about an inscrutable security standard produced by NIST a decade ago, with a fifty-page specification written in Z. She took a 12-page portion of it and translated it into a 120-line Python program, which was much more readable than the original, and in the process she uncovered some bugs in the spec!

Another anecdote she recounted had reached me before, but somehow I had forgotten about it until she reminded me. It concerns the origins of Python's use of indentation. The anecdote takes place long before Python was created. At an IFIP working group meeting in a hotel, one night the delegates could not agree about the best delimiters to use for code blocks. On the table were the venerable BEGIN ... END, the newcomers { ... }, and some oddities like IF ... FI and indentation. In desperation someone said the decision had to be made by a non-programmer. The only person available was apparently Robert Dewar's wife, who in those days traveled with her husband to these events. Despite the late hour, she was called down from her hotel room and asked for her independent judgement. Immediately she decided that structuring by pure indentation was the winner. Now, I've probably got all the details wrong here, but apparently Lambert Meertens was present, who went on to design Python's predecessor, ABC, though at the time he called it B (the italics meant that B was not the name of the language, but the name of the variable containing the name of the language). I checked my personal archives, and the first time I heard this was from Prof. Paul Hilfinger at Berkeley, who recounted a similar story. In his version, it was just Lambert Meertens and Robert Dewar, and Robert Dewar's wife chose indentation because she wanted to go to bed. Either way it is a charming and powerful story. (UPDATE: indeed the real story was quite different.)

Of course Annie had some requests as well. I'll probably go over these in more detail on python-ideas, but here's a quick rundown (of what I could remember):
  • Quantifiers. She is really longing for the "SOME x IN xs HAS pred" notation from ABC (and its sibling "EACH x IN xs HAS pred"), which superficially resemble Python's any() and all() functions, but have the added semantics of making x available in the scope executed when the test succeeds (or fails, in the case of EACH -- then x represents a counterexample). (A rough Python emulation is sketched right after this list.)
  • Type declarations. (Though I think she would be happy with Python 3 function annotations, possibly augmented with the attribute declarations seen in e.g. Django and App Engine's model classes.)
  • Pattern matching, a la Erlang. I have been eying these myself from time to time; it is hard to find a syntax that really shines, but it seems to be a useful feature.
  • Something she calls labels or yield points. It seems somewhat similar to yield statements in generators, but not quite.
  • She has only recently begun to look at distributed algorithms (she had some Leslie Lamport anecdotes as well) and might prefer sets to be immutable after all. Though that isn't so clear; her work so far has actually benefited from mutating sets to maintain some algorithmic invariant. (The "incrementalize" of the three I's actually refers to a form of "differentiation" of expressions that produce a new set for each input.)
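
Here is roughly how I picture the quantifiers from the first bullet above in today's Python -- the some() and each() helpers are hypothetical, purely to illustrate the "binding the witness" semantics that any() and all() lack:

def some(iterable, pred):
    # like any(), but also hand back the witness that made the test succeed
    for x in iterable:
        if pred(x):
            return True, x
    return False, None

def each(iterable, pred):
    # like all(), but hand back a counterexample when the test fails
    for x in iterable:
        if not pred(x):
            return False, x
    return True, None

found, x = some(range(10), lambda x: x % 7 == 3)
if found:
    print("witness:", x)  # prints: witness: 3

ok, x = each(range(10), lambda x: x < 5)
if not ok:
    print("counterexample:", x)  # prints: counterexample: 5
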
The contrast with my visitor the next day couldn't be greater. Through a former colleague I got an introduction to Drew Houston, co-founder and CEO of the vastly successful start-up company Dropbox. Dropbox currently has 25 million users, stores petabytes of data on Amazon S3, is profitable, and is not for sale. Drew is an easygoing MIT graduate who is equally comfortable discussing custom memory allocators, the world of venture capitalism, and how to keep engineers happy; he likes hard problems and winning.

Python plays an important role in Dropbox's success: the Dropbox client, which runs on Windows, Mac and Linux (!), is written in Python. This is key to the portability: everything except the UI is cross-platform. (The UI uses a Python-ObjC bridge on Mac, and wxPython on the other platforms.) Performance has never been a problem -- with the understanding that a small number of critical pieces were written in C, including a custom memory allocator used for a certain type of objects whose pattern of allocation involves allocating 100,000s of them and then releasing all but a few. Before you jump in to open up the Dropbox distro and learn all about how it works, beware that the source code is not included and the bytecode is obfuscated. Drew's no fool. And he laughs at the poor competitors who are using Java.

Next Monday I'm having lunch with another high-tech entrepreneur, a Y Combinator start-up founder using (and contributing to) App Engine. Maybe I should just cancel all weekly meetings and sign off from all mailing lists and focus on two things: meeting Python users and coding. That's the life!

Monday, April 27, 2009

Final Words on Tail Calls

A lot of people remarked that in my post on Tail Recursion Elimination I confused tail self-recursion with other tail calls, which proper Tail Call Optimization (TCO) also eliminates. I now feel more educated: tail calls are not just about loops. I started my blog post when someone pointed out several recent posts by Pythonistas playing around with implementing tail self-recursion through decorators or bytecode hacks. In the eyes of the TCO proponents those were all amateurs, and perhaps that's so.

The one issue on which TCO advocates seem to agree with me is that TCO is a feature, not an optimization. (Even though in some compiled languages it really is provided by a compiler optimization.) We can argue over whether it is a desirable feature. Personally, I think it is a fine feature for some languages, but I don't think it fits Python: The elimination of stack traces for some calls but not others would certainly confuse many users, who have not been raised with tail call religion but might have learned about call semantics by tracing through a few calls in a debugger.

The main issue here is that I expect that in many cases tail calls are not of a recursive nature (neither direct nor indirect), so the elimination of stack frames doesn't do anything for the algorithmic complexity of the code, but it does make debugging harder. For example, if you have a function ending in something like this:
if x > y:
    return some_call(z)
else:
    return 42
and you end up in the debugger inside some_call() whereas you expected to have taken the other branch, with TCO as a feature your debugger can't tell you the values of x and y, because the stack frame has been eliminated.

(I'm sure at this point someone will bring up that the debugger should be smarter. Sure. I'm expecting your patch for CPython any minute now.)

The most interesting use case brought up for TCO is the implementation of algorithms involving state machines. The proponents of TCO claim that the only alternative to TCO is a loop with lots of state, which they consider ugly. Now, apart from the observation that since TCO essentially is a GOTO, you write spaghetti code using TCO just as easily, Ian Bicking gave a solution that is as simple as it is elegant. (I saw it in a comment to someone's blog that I can't find back right now; I'll add a link if someone adds it in a comment here.) Instead of this tail call:
return foo(args)
you write this:
return foo, (args,)
which doesn't call foo() but just returns it and an argument tuple, and embed everything in a "driver" loop like this:
func, args = ...initial func/args pair...
while True:
    func, args = func(*args)
If you need an exit condition you can use an exception, or you could invent some other protocol to signal the end of the loop (like returning None).
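
To make that concrete, here is a minimal self-contained sketch of the pattern (the state functions, the finish()/None convention, and the run() helper are my own illustration, not Ian's code): a two-state machine deciding parity without ever growing the call stack.

def is_even(n):
    if n == 0:
        return finish, (True,)
    return is_odd, (n - 1,)    # a "tail call" expressed as data

def is_odd(n):
    if n == 0:
        return finish, (False,)
    return is_even, (n - 1,)

def finish(answer):
    return None, (answer,)     # returning None for func stops the driver

def run(func, *args):
    # the driver loop: keep calling until a state signals it is done
    while func is not None:
        func, args = func(*args)
    return args[0]

print(run(is_even, 100001))    # prints False, with no RecursionError in sight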

And here it ends. One other thing I learned is that some in the academic world scornfully refer to Python as "the Basic of the future". Personally, I rather see that as a badge of honor, and it gives me an opportunity to plug a book of interviews with language designers to which I contributed, side by side with the creators of Basic, C++, Perl, Java, and other academically scorned languages -- as well as those of ML and Haskell, I hasten to add. (Apparently the creators of Scheme were too busy arguing whether to say "tail call optimization" or "proper tail recursion." :-)

Friday, November 21, 2008

Scala?

I received a free copy of the Scala eBook by Martin Odersky, Lex Spoon, and Bill Venners. I tried to read it on my CalTrain commute, but ended up only skimming -- at 700+ pages it's a hefty tome. I'm also not in the primary target audience, as its focus seems to bounce between trying to entice Java programmers to try a better language, and an introduction to programming that happens to use Scala. I ended up going to the Scala website where I downloaded various other documents, such as a tutorial, a more technical overview paper, and the language reference.

Sad to say, I'm quite disappointed (in the language -- can't really say much about the quality of the eBook except that it's as big as the language). One of the first times I heard Scala mentioned it was a talk by Steve Yegge at Google I/O last May. Steve characterized Scala as "static typing's revenge" (I'm paraphrasing -- too lazy to watch the video to find that section) and went on to make fun of the incredible complexity of Scala's type system, which contains such esoteric concepts as type erasure, variance annotations, existential types, polymorphic method types, and many more (just scan the ToC of the language reference).

I have to agree with Steve -- if this is what it takes to have compile-time type-safety in a language, I'll take dynamic typing any day. There's got to be a better way -- perhaps Haskell? Haskell is a pure functional language with a fast implementation that seems to have solved the I/O problem of functional languages well -- while its monads are truly deep and take real work to create, their use seems quite straightforward.

Perhaps my biggest disappointment in Scala is that they have so many rules to make it possible to write code that looks straightforward, while being anything but -- these rules all seem to have innumerable exceptions, making it hard to know when writing simple code will work and when it won't. For example, the Scala book explains that you can use multiple 'if' qualifiers in its for-statement, but points out that you have to insert a semicolon in a position where you'd expect it to be inferred by the parser. To make matters even more mysterious, if you were to use curly braces instead of parentheses, you can omit the semicolon. All this is presented without any explanation. I found the explanation by combining two seemingly unrelated rules in the reference manual: the for-statement (technically called a for-comprehension) allows either braces or parentheses with exactly the same semantics, but the rules for inferring semicolons are different inside braces than they are inside parentheses. Obvious, eh! :-)