What is a Functional Language?

March 16, 2011

I mentioned in a previous post that I am developing a new course on functional programming for first-year students.  The immediate question, which none of you asked, is “what do you mean by functional programming?”.  Good question!  Now, if only I had a crisp answer, I’d be in good shape.

Socially speaking, the first problem I run into is a fundamental misunderstanding about the term “functional programming”.  Most people think that it stands for an ideological position that is defined in opposition to what everyone else thinks and does.  So in the mind of a typical colleague, when I speak of functional programming, she hears “deviant religious doctrine” and “knows” instinctively to disregard it.  Yet what I mean by functional programming subsumes what everyone else is doing as a narrow special case.  For as readers of this blog surely know, there is no better imperative language than a functional language!  In fact I often say that Haskell is the world’s best imperative programming language; I’m only half (if that much) joking.

So if functional programming subsumes imperative programming, then what the hell am I talking about when I advocate for functional programming at the introductory level?  And why, then, do we also have an imperative programming course?  Very good questions.  I’m not sure I have a definitive answer, but this being a blog, I can air my opinions.

Were it up to me, we would have one introductory course, probably two semesters long, on programming, which would, of course, emphasize functional as well as imperative techniques, all in one language.  Personally, I think this would be the right way to go, and would prepare the students for all future courses in our curriculum, and for work out there in the “real world.”  Not everyone believes me (and sometimes I am not sure I believe myself), so we must compromise.  While it surely sounds all nice and ecumenical to teach both imperative and functional programming, the reality is that we’re teaching old and new programming methods.  It’s not so much the imperative part that counts, but its obsolescence.  It is deemed important (and, to be honest, I can see the merit in the argument) that students learn the “old ways” because that’s how most of the world works.  So we have to use a language that is ill-defined (admits programs that cannot be given any meaning in terms of the language itself), that forces sequential, rather than parallel, thinking, that demands manual allocation of memory, and that introduces all sorts of complications, such as null pointers, that need not exist at all.  And then we have to teach about things called “tools” that help manage the mess we’ve created—tools that do things like recover from Boolean blindness or check for null pointer errors.  To be sure, these tools are totally amazing, but then so are hot metal typesetting machines and steam locomotives.

So what, then, is a functional language?  I can do no better than list some criteria that are important for what I do; the languages that satisfy these criteria are commonly called functional languages, so that’s what I’ll call them too.

  1. It should have a well-defined semantics expressed in terms of the programs you write, not in terms of how it is “compiled” onto some hypothetical hardware and software platform.  I want to be able to reason about the code I’m writing, not about the code the compiler writes.  And I want to define the meaning of my program in terms of the constructs I’m using, not in terms of some supposed underlying mechanism that implements it.  This is the essence of abstraction.
  2. It should support both computation by evaluation and computation by execution.  Evaluation is a smooth generalization of high school- and university-level mathematics, and is defined in terms of standard mathematical concepts such as tuples, functions, numbers, and so forth.  Variables are given meaning by substitution, and evaluation consists of simplifying expressions.  Execution is the action of a program on another agent or data structure conceived of as existing independently of the program itself.  The data lives “over there” and the program “over here” diddles that data.  Assignables (my word for what in imperative languages are wrongly called variables) are given meaning by get and put operations (fetch and store), not by substitution.  Execution consists of performing these operations.
  3. It should be natural to consider both persistent and ephemeral data structures.  This is essentially a restatement of (2).  Persistent data structures are values that can be manipulated like any other value, regardless of their apparent “size” or “complexity”.  We can as easily pass functions as arguments and return them as results as we can do the same with integers.  We can think of trees or streams as values in the same manner.  Ephemeral data structures are what you learn about in textbooks.  They are external to the program, and manipulated destructively by execution.  These are useful, but not nearly as central as the current texts would have you believe (for lack of consideration of any other kind).
  4. It should have a natural parallel cost model based on the dependencies among computations so that one may talk about parallel algorithms and their complexity as naturally as one currently discusses sequential algorithms.  Imperative-only programming languages fail this criterion miserably, because there are too many implicit dependencies among the components of a computation, with the result that sequentiality is forced on you by the artificial constraints of imperative thinking.
  5. It should have a rich type structure that permits introduction of new abstract types and which supports modular program development.  By giving short shrift to expressions and evaluation, imperative languages have an impoverished notion of type, and no support for modularity.  Worse, object-oriented programming, a species of imperative programming, is fundamentally antimodular because of the absurd emphasis on inheritance and the reliance on class-based organizations in which functionality metastasizes throughout a program.
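
Criteria (2) and (3) can be made concrete in ML itself.  Here is a small sketch (the names are mine, not from any library) contrasting a persistent structure, built by evaluation, with an ephemeral one, updated by execution:

datatype tree = Leaf | Node of tree * int * tree

(* Persistent: insert builds a new tree; the old tree remains intact. *)
fun insert (Leaf, x) = Node (Leaf, x, Leaf)
  | insert (Node (l, y, r), x) =
      if x < y then Node (insert (l, x), y, r)
      else if y < x then Node (l, y, insert (r, x))
      else Node (l, y, r)

(* Ephemeral: insert destructively updates a cell; the old contents are lost. *)
val cell : int list ref = ref []
fun insertEph x = cell := x :: !cell

A persistent tree built by insert can be passed around, shared, and retained like any other value; the ephemeral cell has only one state, the current one.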

What languages satisfy these criteria best?  Principally ML (both Standard ML and Objective Caml, the “objective” part notwithstanding), and Haskell. (Unfortunately, Haskell loses on the cost model, about which more in another post.)

This is why we teach ML.


Boolean Blindness

March 15, 2011

I hate Booleans!  But isn’t everything “just bits”?  Aren’t computers built from gates?  And aren’t gates just the logical connectives?  How can a Computer Scientist possibly hate Booleans?

Fritz Henglein recently pointed out to me that the world of theoretical Computer Science is divided into two camps, which I’ll call the logicians (who are concerned with languages and semantics) and the combinatorialists (who are concerned with algorithms and complexity).  The logicians love to prove that programs work properly, the combinatorialists love to count how many steps a program takes to run.  Yet, curiously, the logicians hate the Booleans, and love abstractions such as trees or streams, whereas the combinatorialists love bits, and hate abstraction!  (Or so it often seems; allow me my rhetorical devices.)

My point is this.  Booleans are just one, singularly (or, perhaps, binarily) boring, type of data.  A Boolean, b, is either true, or false; that’s it.  There is no information carried by a Boolean beyond its value, and that’s the rub.  As Conor McBride puts it, to make use of a Boolean you have to know its provenance so that you can know what it means.

Booleans are almost invariably (in the CS literature and textbooks) confused with propositions, which express an assertion, or make a claim.  For a proposition, p, to be true means that it has a proof; there is a communicable, rational argument for why p is the case.  For a proposition, p, to be false means that it has a refutation; there is a counterexample that shows why p is not the case.

The language here is delicate.  The judgement that a proposition, p, is true is not the same as saying that p is equal to true, nor is the judgement that p is false the same as saying that p is equal to false!  In fact, it makes no sense to even ask the question, is p equal to true, for the former is a proposition (assertion), whereas the latter is a Boolean (data value); they are of different type.

In classical mathematics, where computability is not a concern, it is possible to conflate the notions of Booleans and propositions, so that there are only two propositions in the world, true and false.  The role of a proof is to show that a given proposition is (equal to) one of these two cases.  This is well and good, provided that you’re not interested in computing whether something is true or false.  As soon as you are, you must own up to a distinction between a general proposition, whose truth cannot be decided, and a Boolean, which can be computed one way or the other.  But then, as was mentioned earlier, we must link it up with a proposition expressing what the Boolean means, which is to specify its provenance.  And as soon as we do that, we realize that Booleans have no special status, and are usually a poor choice of data structure anyway because they lead to a condition that Dan Licata calls Boolean blindness.

A good example is the equality test, e=e’, of type bool.  As previously discussed, this seemingly innocent function presents serious complications, particularly in abstract languages with a rich type structure.  It also illustrates the standard confusion between a proposition and a Boolean.  As the type suggests, the equality test operation is a function that returns either true or false, according to whether its arguments are equal or not (in a sense determined by their type).  Although the notation invites confusion, the expression e=e’ is not a proposition stating that e and e’ are equal!  It is, rather, a piece of data of one of two forms, either true or false.  The equality test is a function that happens to have the property that if it returns true, then its arguments are equal (an associated equality proposition is true), and if it returns false, then its arguments are not equal (an associated equality proposition is false), and, moreover, is ensured to return either true or false for any inputs.  This may seem like a hair-splitting distinction, but it is not!  For consider, first of all, that the equality test may have been written with a different syntax, maybe e==e’, or (equal? e e’), or any number of other notations.  This makes clear, I hope, that the connection between the function and the associated proposition must be explicitly made by some other means; the function and the proposition are different things.  More interestingly, the same functionality can be expressed different ways.  For example, for partially ordered types we may write e<=e’ andalso e'<=e, or similar, satisfying the same specification as the innocent little “equals sign” function.  The connection is looser still here, and still looser in numerous other variations that you can spin for yourself.
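
To drive the point home, here is a sketch in ML (the function names are mine, purely for illustration): two operationally different pieces of code, each satisfying the specification of the equality test on integers.

fun eq (x : int, y : int) : bool = x = y
fun eq' (x : int, y : int) : bool = x <= y andalso y <= x

Neither function is the proposition that x and y are equal; each merely computes a bit whose connection to that proposition must be established separately.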

What harm is there in making this confusion?  Perhaps the greatest harm is that it confuses a fundamental distinction, and this has all sorts of bad consequences.  For example, a student may reasonably ask, why is equality not defined for functions?  After all, in the proof I did the other day, I gave a careful argument by structural induction that two functions are equal.  And then I turn around and claim that equality is not defined for functions!  What on earth am I talking about?  Well, if you draw the appropriate distinction, there is no issue.  Of course there is a well-defined concept of two functions being mathematically equal; it, in general, requires proof.  But there is no well-defined computable function that returns true iff two functions (on integers, say) are equal, and false otherwise.  Thus, = : (int->int) x (int->int) -> prop expresses equality, but = : (int->int)->(int->int)->bool cannot satisfy the specification just given.  Or, to put it another way, you cannot test equality of propositions!  (And rightly so.)

Another harm is the condition of Boolean blindness alluded to earlier.  Suppose that I evaluate the expression e=e’ to test whether e and e’ are equal.  I have in my hand a bit.  The bit itself has no intrinsic meaning; I must associate a provenance with that bit in order to give it meaning.  “This bit being true means that e and e’ are equal, whereas this other bit being false means that some other two expressions are not equal.”  Keeping track of this information (or attempting to recover it using any number of program analysis techniques) is notoriously difficult.  The only thing you can do with a bit is to branch on it, and pretty soon you’re lost in a thicket of if-then-else’s, and you lose track of what’s what.  Evolve the program a little, and you’re soon out to sea, and find yourself in need of SAT solvers to figure out what the hell is going on.

But this is yet another example of an iatrogenic disorder!  The problem is computing the bit in the first place.  Having done so, you have blinded yourself by reducing the information you have at hand to a bit, and then trying to recover that information later by remembering the provenance of that bit.  An illustrative example was considered in my article on equality:

fun plus x y = if x=Z then y else S(plus (pred x) y)

Here we’ve crushed the information we have about x down to one bit, then branched on it, then were forced to recover the information we lost to justify the call to pred, which typically cannot recover the fact that its argument is non-zero and must check for it to be sure.  What a mess!  Instead, we should write

fun plus x y = case x of Z => y | S(x') => S(plus x' y)

No Boolean necessary, and the code is improved as well!  In particular, we obtain the predecessor en passant, and have no need to keep track of the provenance of any bit.

[Update: removed last paragraph.]


Don’t Mention Equality!

March 15, 2011

One thing I’ve learned in teaching CS is that what I choose not to say is as important as what I choose to say about a particular topic.  It requires a lot of care in some cases to retrain myself to avoid a certain line of thought that experience shows leads only to mistakes and misconceptions by the students.  A classic example is the vexed question of equality, which creates no end of difficulties for students, and invites all sorts of misunderstandings.  My advice: don’t mention equality!

Let me explain.  First off, in rudimentary languages, such as C, there is neither the motivation nor the possibility to avoid equality.  All data is first-order, and all expression types admit equality in constant time.  Equality of integers presents no problems, but it is often neglected that there are (according to the IEEE standard) two different meanings of equality of floats, one of which is not reflexive, the other of which is not transitive; name your poison.  Pointers must be comparable not only for equality, but also comparable to “NULL”, whatever that may mean.  (This is Tony Hoare’s self-described “billion dollar mistake”, the great mother of all iatrogenic disorders in programming languages.)  Although it’s a bit of a mess, no one thinks that equality is an issue in such languages.  Relatively speaking, it’s not.

But in abstract languages, such as ML, equality presents more fundamental problems.  Because, contrary to what everyone naively thinks, not all types even admit an equality test.  One obstacle is decidability: if you have functions, say, in your language, you can’t expect to compare them for equality.  Another is feasibility: if you have aggregates, such as trees or streams, in your language, you wouldn’t want to compare them for equality, even if you could in principle.  And most importantly, it doesn’t make sense: an equality test violates representation independence for abstract types, so you have no business relying on it in the first place.  Two equal (in the abstract sense) sets may well have distinct representations, or they may not.  The point of abstraction is to ensure that no client code can rely on whether they do or not.
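
A small sketch makes the point about representation independence (the representation choice here is mine, for illustration): if finite sets of integers are represented as unordered lists, then two representations of the same abstract set need not be equal as data.

type intset = int list  (* representation: unordered list *)
val s1 : intset = [1, 2]
val s2 : intset = [2, 1]
val b = (s1 = s2)  (* false, yet s1 and s2 represent the same set *)

Hiding the representation behind a signature that omits any equality operation is precisely what prevents client code from observing this difference.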

So, if you want to bring up equality, you are buying into a huge mess.  But there is an alternative: just say no to equality.  More accurately, don’t say anything at all, don’t even bring it up.  And avoiding it has another benefit, namely that you can avoid Booleans (the step-mother of all iatrogenic disorders in programming languages) entirely, and you will be better off.  (I’ll leave that discussion for another post.)  Now, to head off pointless argument, let me say that eventually you’ll want to introduce equality, because you do sometimes need it, but only at a much, much later stage, when students have gotten safely past the real obstacles to learning to write good, clean code.  It’s no use rolling logs at them as well; let them spend their time on serious stuff instead.

Many uses of equality are better replaced by pattern matching.  If you have an inductively defined type, such as

datatype nat = Z | S of nat

then any function whose domain is nat should be defined by cases, not by comparison:

fun plus x y = case x of Z => y | S(x') => S(plus x' y)

Many students will want to write this instead:

fun plus x y = if x=Z then y else S (plus (pred x) y)

where pred has been defined previously, even if you never bring up equality yourself!  (It’s amazing what kids “learn” on the street.)  This has to be stamped out!  Not only does it require an additional function to obtain the predecessor, but the poor student has deprived himself of one of the most valuable tools in his repertoire, the exhaustiveness check for pattern matching.  (More subtly, and worse, the definition of pred must itself check for Z, even though we “know” that x cannot be Z at the call site.  I’ll return to this when discussing the confusion between booleans and propositions.)
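
To see the subtlety, consider what the student’s pred must look like (this particular definition, and the choice of the Basis exception Domain, are mine):

fun pred x = case x of Z => raise Domain | S x' => x'

The Z branch is dead code at the call site in plus, but nothing in the program records that fact, so pred must check anyway.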

But what if you’re programming with machine integers?   For example, how do we define the factorial function on the type int in ML?  A basic approach is to use the concept of an ordered type that is provided in the Standard ML library.  It comes down to this:

exception Undefined
fun fact x = case Int.compare (x, 0) of LESS => raise Undefined | EQUAL => 1 | GREATER => x * fact (x-1)

You still need a predecessor here, but we can get rid of that with a little more effort:

datatype TRI = Neg of int | Zero | Pos of int
fun tri x = case Int.compare (x, 0) of LESS => Neg (x+1) | EQUAL => Zero | GREATER => Pos (x-1)
fun fact x = case tri x of Neg _ => raise Undefined | Zero => 1 | Pos x' => x * fact x'

When considered in isolation, this looks a bit heavy, but in practice it’s all a matter of designing the right interfaces to begin with, and using them to your advantage.

Applying the same principles to the option type makes clear why “null pointer analysis” is such a stupid idea.  Suppose that the variable x is of type t option for some type t.  Rather than write

if x=NONE then raise NullPtrException else …

which does not propagate the non-NONE status of x into the else branch, instead write the clearer case analysis

case x of NONE => raise NullPtrException | SOME x’ => …

wherein, in the second clause, x’ is of type t, not t option, and hence cannot under any circumstances be NONE!


Teaching FP to Freshmen

March 15, 2011

This semester Dan Licata and I are co-teaching a new course on functional programming for first-year prospective CS majors.  This course is part of the new introductory CS curriculum at CMU, which includes a new course on imperative programming created by Frank Pfenning, and a planned new course on data structures and algorithms, which will be introduced by Guy Blelloch this fall.

The functional and imperative programming classes are independent of one another, and both are required for the new data structures class.  Both courses share an emphasis on verification techniques (principally, state invariants for imperative programming, and structural induction for functional programming).  The new data structures course emphasizes parallel algorithms as the general case, and places equal emphasis on persistent, as well as ephemeral, data structures.  This is achieved by using an expressive functional language (Standard ML) that includes both evaluation- and mutation-based computations, and that supports modularity and data abstraction.

Object-oriented programming is eliminated from the introductory curriculum, in favor of a more fundamental view of data structures using imperative programming, and a new course on parallel data structures using functional programming.  A new course on object-oriented design methodology will be offered at the sophomore level for those students who wish to study this topic.

Update: The revision of the undergraduate curriculum to which I allude here is described in the report “Introductory Computer Science Education at Carnegie Mellon: A Dean’s Perspective” by Randal E. Bryant, Klaus Sutner, and Mark J. Stehlik.  This report summarizes the recommendations of an ad hoc committee of faculty reporting to the faculty and dean of the School of Computer Science.

Update: Word-smithing.

Update: Clarification of data structures courses, and placement of OOP in the revised curriculum.


Getting Started

March 15, 2011

I plan to use this blog to publish thoughts and stimulate discussion about teaching introductory computer science.

Update: More generally, I plan to write about various topics in research and education in Computer Science.

