Polarity in Type Theory

August 25, 2012

There has recently arisen some misguided claims about a supposed opposition between functional and object-oriented programming.  The claims amount to a belated recognition of a fundamental structure in type theory first elucidated by Jean-Marc Andreoli, and developed in depth by Jean-Yves Girard in the context of logic, and by Paul Blain-Levy and Noam Zeilberger in the context of programming languages.  In keeping with the general principle of computational trinitarianism, the concept of polarization has meaning in proof theory, category theory, and type theory, a sure sign of its fundamental importance.

Polarization is not an issue of language design, it is an issue of type structure.  The main idea is that types may be classified as being positive or negative, with the positive being characterized by their structure and the negative being characterized by their behavior.  In a sufficiently rich type system one may consider, and make effective use of, both positive and negative types.  There is nothing remarkable or revolutionary about this, and, truly, there is nothing really new about it, other than the terminology.  But through the efforts of the above-mentioned researchers, and others, we have learned quite a lot about the importance of polarization in logic, languages, and semantics.  I find it particularly remarkable that Andreoli’s work on proof search turned out to also be of deep significance for programming languages.  This connection was developed and extended by Zeilberger, on whose dissertation I am basing this post.

The simplest and most direct way to illustrate the ideas is to consider the product type, which corresponds to conjunction in logic.  There are two possible ways that one can formulate the rules for the product type that from the point of view of inhabitation are completely equivalent, but from the point of view of computation are quite distinct.  Let us first state them as rules of logic, then equip these rules with proof terms so that we may study their operational behavior.  For the time being I will refer to these as Method 1 and Method 2, but after we examine them more carefully, we will find more descriptive names for them.

Method 1 of defining conjunction is perhaps the most familiar.  It consists of this introduction rule

\displaystyle\frac{\Gamma\vdash A\;\textsf{true}\quad\Gamma\vdash B\;\textsf{true}}{\Gamma\vdash A\wedge B\;\textsf{true}}

and the following two elimination rules

\displaystyle\frac{\Gamma\vdash A\wedge B\;\textsf{true}}{\Gamma\vdash A\;\textsf{true}}\qquad\frac{\Gamma\vdash A\wedge B\;\textsf{true}}{\Gamma\vdash B\;\textsf{true}}.

Method 2 of defining conjunction is only slightly different.  It consists of the same introduction

\displaystyle \frac{\Gamma\vdash A\;\textsf{true}\quad\Gamma\vdash B\;\textsf{true}}{\Gamma\vdash A\wedge B\;\textsf{true}}

and one elimination rule

\displaystyle\frac{\Gamma\vdash A\wedge B\;\textsf{true} \quad \Gamma,A\;\textsf{true},B\;\textsf{true}\vdash C\;\textsf{true}}{\Gamma\vdash C\;\textsf{true}}.

From a logical point of view the two formulations are interchangeable in that the rules of the one are admissible with respect to the rules of the other, given the usual structural properties of entailment, specifically reflexivity and transitivity.  However, one can discern a difference in “attitude” in the two formulations that will turn out to be a manifestation of the concept of polarity.

Method 1 is a formulation of the idea that a proof of a conjunction is anything that behaves conjunctively, which means that it supports the two elimination rules given in the definition.  There is no commitment to the internal structure of a proof, nor to the details of how projection operates; as long as there are projections, then we are satisfied that the connective is indeed conjunction.  We may consider that the elimination rules define the connective, and that the introduction rule is derived from that requirement.  Equivalently we may think of the proofs of conjunction as being coinductively defined to be as large as possible, as long as the projections are available.  Zeilberger calls this the pragmatist interpretation, following Count Basie’s principle, “if it sounds good, it is good.”

Method 2 is a direct formulation of the idea that the proofs of a conjunction are inductively defined to be as small as possible, as long as the introduction rule is valid.  Specifically, the single introduction rule may be understood as defining the structure of the sole form of proof of a conjunction, and the single elimination rule expresses the induction, or recursion, principle associated with that viewpoint.  Specifically, to reason from the fact that A\wedge B\;\textsf{true} to derive C\;\textsf{true}, it is enough to reason from the data that went into the proof of the conjunction to derive C\;\textsf{true}.  We may consider that the introduction rule defines the connective, and that the elimination rule is derived from that definition.  Zeilberger calls this the verificationist interpretation.

These two perspectives may be clarified by introducing proof terms, and the associated notions of reduction that give rise to a dynamics of proofs.

When reformulated with explicit proofs, the rules of Method 1 are the familiar rules for ordered pairs:

\displaystyle\frac{\Gamma\vdash M:A\quad\Gamma\vdash N:B}{\Gamma\vdash \langle M, N\rangle:A\wedge B}

\displaystyle\frac{\Gamma\vdash M:A\wedge B}{\Gamma\vdash \textsf{fst}(M):A}\qquad\frac{\Gamma\vdash M:A\wedge B}{\Gamma\vdash \textsf{snd}(M):B}.

The associated reduction rules specify that the elimination rules are post-inverse to the introduction rules:

\displaystyle\textsf{fst}(\langle M,N\rangle)\mapsto M \qquad \textsf{snd}(\langle M,N\rangle)\mapsto N.

In this formulation the proposition A\wedge B is often written A\times B, since it behaves like a Cartesian product of proofs.

When formulated with explicit proofs, Method 2 looks like this:

\displaystyle \frac{\Gamma\vdash M:A\quad\Gamma\vdash M:B}{\Gamma\vdash M\otimes N:A\wedge B}

\displaystyle\frac{\Gamma\vdash M:A\wedge B \quad \Gamma,x:A,y:B\vdash N:C}{\Gamma\vdash \textsf{split}(M;x,y.N):C}

with the reduction rule

\displaystyle\textsf{split}(M\otimes N;x,y.P)\mapsto [M,N/x,y]P.

With this formulation it is natural to write A\wedge B as A\otimes B, since it behaves like a tensor product of proofs.

Since the two formulations of “conjunction” have different internal structure, we may consider them as two different connectives.  This may, at first, seem pointless, because it is easily seen that x:A\times B\vdash M:A\otimes B for some M and that x:A\otimes B\vdash N:A\times B for some N, so that the two connectives are logically equivalent, and hence interchangeable in any proof.  But there is nevertheless a reason to draw the distinction, namely that they have different dynamics.

It is easy to see why.  From the pragmatic perspective, since the projections act independently of one another, there is no reason to insist that the components of a pair be evaluated before they are used.  Quite possibly we may only ever project the first component, so why bother with the second?  From the verificationist perspective, however, we are pattern matching against the proof of the conjunction, and are demanding both components at once, so it makes sense to evaluate both components of a pair in anticipation of future pattern matching.  (Admittedly, in a structural type theory one may immediately drop one of the variables on the floor and never use it, but then why give it a name at all?  In a substructural type theory such as linear type theory, this is not a possibility, and the interpretation is forced.)  Thus, the verficationist formulation corresponds to eager evaluation of pairing, and the pragmatist formulation to lazy evaluation of pairing.

Having distinguished the two forms of conjunction by their operational behavior, it is immediately clear that both forms are useful, and are by no means opposed to one another.  This is why, for example, the concept of a lazy language makes no sense, rather one should instead speak of lazy types, which are perfectly useful, but by no means the only types one should ever consider.  Similarly, the concept of an object-oriented language seems misguided, because it emphasizes the pragmatist conception, to the exclusion of the verificationist, by insisting that everything be an object characterized by its methods.

More broadly, it is useful to classify types into two polarities, the positive and the negative, corresponding to the verificationist and pragmatist perspectives.  Positive types are inductively defined by their introduction forms; they correspond to colimits, or direct limits, in category theory.  Negative types are coinductively defined by their elimination forms; they correspond to limits, or inverse limits, in category theory.  The concept of polarity is intimately related to the concept of focusing, which in logic sharpens the concept of a cut-free proof and elucidates the distinction between synchronous and asynchronous connectives, and which in programming languages provides an elegant account of pattern matching, continuations, and effects.

As ever, enduring principles emerge from the interplay between proof theory, category theory, and type theory.  Such concepts are found in nature, and do not depend on cults of personality or the fads of the computer industry for their existence or importance.

Update: word-smithing.


для моих русских читателей

February 28, 2012

Владислав Натаров <[email protected]> has kindly translated my blog posts to date into Russian.  I cannot vouch for the accuracy of the translation, but I am grateful for the effort, and hope that this will be of interest to my Russian readers.


Words Matter

February 1, 2012

Yesterday, during a very nice presentation by Ohad Kammar at Carnegie Mellon, the discussion got derailed, in part, because of a standard, and completely needless, terminological confusion involving the word “variable”.  I’m foolish enough to try to correct it.

The problem is that we’ve all been taught to confuse variables with variables—that is, program variables with mathematical variables.  The distinction is basic.  Since time immemorial (well, at least since al Khwarizmi) we have had the notion of a variable, properly so-called, which is given meaning by substitution.  A variable is an unknown, or indeterminate, quantity that can be replaced by any value of its type (a type being, at least since Russell, the range of significance of a variable).  Frege gave the first systematic study of the quantifiers, and Church exploited the crucial concept of a variable to give the most sharply original and broadly applicable model of computation, the \lambda-calculus.

Since the dawn of Fortran something that is not a variable has come to be called a variable.  A program variable, in the sense of Fortran and every imperative language since, is not given meaning by substitution.  Rather, it is given meaning by (at least) two operations associated with it, one to get its contents and one to put new contents into it.  (And, maybe, an operation to form a reference to it, as in C or even Algol.)  Now as many of you know, I think that the concept of a program variable in this sense is by and large a bad idea, or at any rate not nearly as important as it has been made out to be in conventional (including object-oriented) languages, but that’s an argument for another occasion.

Instead, I’m making a plea.  Let’s continue to call variables variables.  It’s a perfectly good name, and refers to what is perhaps one of the greatest achievements of the human mind, the fundamental concept of algebra, the variable.  But let’s stop calling those other things variables!  In my Practical Foundations for Programming Languages I coined (as far as I know) a word that seems perfectly serviceable, namely an assignable.  The things called variables in imperative languages should, rather, be called assignables.  The word is only a tad longer than variable, and rolls off the tongue just as easily, and has the advantage of being an accurate description of what it really is.  What’s not to like?

Why bother?  For one thing, some languages have both concepts, a necessity if you want your language to be mathematically civilized (and you do).  For another, in the increasingly important world of program verification, the specification formalisms, being mathematical in nature, make use of variables, which most definitely are not assignables!  But the real reason to make the distinction is, after all, because words matter.  Two different things deserve to have two different names, and it only confuses matters to use the same word for both.  This week’s confusion was only one example of many that I have seen over the years.

So, my suggestion: let’s call variables variables, and let’s call those other things assignables.  In the fullnesss of time (i.e., once the scourge of imperative programming has been lifted) we may not need the distinction any longer.  But until then, why not draw the distinction properly?

Update: remove forward reference.


Of Course ML Has Monads!

May 1, 2011

A popular meme in the world of PL’s is that “Haskell has monads”, with the implication that this is a distinctive feature of the language, separate from all others.  While it is true that Haskell has popularized the use of monads as a program structuring device, the idea of a monad is not so much an issue of language design (apart from the ad hoc syntactic support provided by Haskell), but rather one of library design.  After all, a monad is just one of a zillion signatures (type classes) with which to structure programs, and there is no particular reason why this one cannot be used in any language that supports even a modicum of modularity.

Examined from the point of view of ML, monads are but a particular of use of modules.  The signature of monads is given by the definition

signature MONAD = sig
  type 'a monad
  val ret : 'a -> 'a monad
  val bnd : 'a monad -> ('a -> 'b monad) -> 'b monad
end

There are many, many, many structures that satisfy this signature; I needn’t (and, in any case, can’t) rehearse them all here.  One particularly simple example should suffice to give the general idea:

structure Option : MONAD = struct
  type 'a monad = 'a option
  fun ret x = SOME x
  fun bnd (SOME x) k = k x
    | bnd NONE k = NONE
end

This is of course the option monad, which is sometimes used to model the data flow aspects of exceptions, perhaps with some elaboration of the NONE case to associate an exceptional value with a non-result.  (The control flow aspects are not properly modeled this way, however.  For that one needs, in addition, access to some sort of jump mechanism.)

Examples like this one proliferate.  A monad is represented by a structure.  Any structure that provides the facilities specified by the MONAD signature gives rise to the characteristic sequentialization mechanisms codified by it.  Monad transformers are functors that transform one monad into another, with no fuss or bother, and no ad hoc mechanisms required.  Standard modular programming techniques suffice to represent monads; moreover, the techniques involved are fully general, and are equally applicable to other signatures of interest (arrows, or quivers, or bows, or what have you).  Moreover, it is shown in my paper with Chakravarty, Dreyer, and Keller how to integrate modules into the type inference mechanism of ML so that one can get automatic functor instantiation in those limited cases where it is self-evident what is intended.  This has been implemented by Karl Crary in a prototype compiler for an extension of Standard ML, and it would be good to see this supported in more broadly available compilers for the language.

The bulk of the mania about monads is therefore accounted for by modules.  I have no doubt, however, that you are wondering about the IO monad in Haskell.  Isn’t that a fundamental feature of the language that cannot be replicated in ML?  Hardly!  It’s entirely a matter of designing the signatures of the standard basis library modules, and nothing more.  The default basis library does not attempt to segregate effects into a monad, but it is perfectly straightforward to do this yourself, by providing your own layer over the standard basis, or to reorganize the standard basis to enforce the separation.  For example, the signature of reference cells might look like this:

signature REF = sig
  type 'a ref
  val ref : 'a -> 'a ref IO.monad
  val ! : 'a ref -> 'a IO.monad
  val := : 'a ref -> 'a -> unit IO.monad
end

Here we are presuming that we have a fixed declaration

structure IO : MONAD = ...

that packages up the basic IO primitives that are already implemented in the run-time system of ML, more or less like in Haskell.  The other signatures, such as those for mutable arrays or for performing input and output, would be modified in a similar manner to push effects into the IO monad.  Et voila, you have monadic effects.

There’s really nothing to it.  In fact, the whole exercise was carried out by a Carnegie Mellon student, Phillippe Ajoux, a couple of years ago.  He also wrote a number of programs in this style just to see how it all goes: swimmingly.  He also devised syntactic extensions to the Moscow ML compiler that provide a nicer notation for programming with monads, much as in Haskell, but better aligned with ML’s conventions.  (Ideally it should be possible to provide syntactic support for any signature, not just monads, but I’m not aware of a worked-out design for the general case, involving as it would an intermixing of parsing and elaboration.)

My point is that the ML module system can be deployed by you to impose the sorts of effect segregation imposed on you by default in Haskell.  There is nothing special about Haskell that makes this possible, and nothing special about ML that inhibits it.  It’s all a mode of use of modules.

So why don’t we do this by default?  Because it’s not such a great idea.  Yes, I know it sounds wonderful at first, but then you realize that it’s pretty horrible.  Once you’re in the IO monad, you’re stuck there forever, and are reduced to Algol-style imperative programming.  You cannot easily convert between functional and monadic style without a radical restructuring of code.  And you are deprived of the useful concept of a benign effect.

The moral of the story is that of course ML “has monads”, just like Haskell.  Whether you want to use them is up to you; they are just as useful, and just as annoying, in ML as they are in Haskell.  But they are not forced on you by the language designers!

Update: This post should’ve been called “ML Has Monads, Why Not?”, or “Of Course ML Has Comonads!”, but then no one was wondering about that.

Update: I now think that the last sentence is excessive.  My main point is simply that it’s very simple to go one way or the other with effects, if you have modules to structure things; it’s all a matter of library design.  A variant of ML that enforced the separation of effects is very easily constructed; the question is whether it is useful or not.  I’ve suggested that the monadic separation is beguiling, but not clearly a great idea.  Alternatively, one can say that we’re not that far away from eliminating laziness from Haskell, at least in this respect: just re-do the standard basis library in ML, and you’re a good ways there.  Plus you have modules, and we understand how to integrate type classes with modules, so the gap is rather small.

Update: Removed inaccurate remarks about unsafePerformIO.


The Point of Laziness

April 24, 2011

As I’ve discussed previously, there are a number of good reasons why Haskell is not suitable for teaching introductory functional programming.  Chief among these is laziness, which in the context of a pure functional language has fatal side effects.  First, Haskell suffers from a paucity of types.  It is not possible in Haskell to define the type of natural numbers, nor the type of lists of natural numbers (or lists of anything else), nor any other inductive type!  (In Carollian style there are types called naturals and lists, but that’s only what they’re called, it’s not what they are.)  Second, the language has problematic cost model.  It is monumentally difficult to reason about the time, and especially space, usage of a Haskell program.  Worse, parallelism arises naturally in an eager, not a lazy, language—for example, computing every element of a finite sequence is fundamental to parallel computing, yet is not compatible with the ideology of laziness, which specifies that we should only compute those elements that are required later.

The arguments in favor of laziness never seem convincing to me.  One claim is that the equational theory of lazy programs is said to be more convenient; for example, beta reduction holds without restriction.  But this is significant only insofar as you ignore the other types in the language.  As Andrzej Filinski pointed out decades ago, whereas lazy languages have products, but not sums, eager languages have sums, but not products.  Take your pick.  Similarly, where lazy languages rely on strictness conditions, eager languages rely on totality conditions.  The costs and benefits are dual, and there seems to be no reason to insist a priori on one set of equations as being more important than the other.

Another claim is that laziness supports the definition of infinite data types, such as infinite sequences of values of some type.  But laziness is not essential, or even particularly useful, for this purpose.  For example, the type nat->nat is a natural representation of infinite sequences of natural numbers that supports many, but not all, of the operations that finite sequences (but not, for example, operations such a reverse, which make no sense in the infinite case).   More generally, there is no inherent connection between laziness and such infinitary types.  Noam Zeilberger has developed an elegant theory of eager and lazy types based on distinguishing positive from negative polarities of type constructors, the positive including the inductive and the negative including the coinductive.   Coinductive types are no more about laziness than inductive types are about pointers.

I wish to argue that laziness is important, but not for pure functional programming, but rather only in conjunction with effects.  This is the Kahn-MacQueen Principle introduced in the 1970’s by Gilles Kahn and David MacQueen in their seminal paper on recursive networks of stream transducers.  Dan Licata and I have emphasized this viewpoint in our lectures on laziness in our new course on functional programming for freshmen.

Let’s use streams as a motivating example, contrasting them with lists, with which they are confused in lazy languages.  A list is an example of a positive type, one that is defined by its membership conditions (constructors).  Defining a function on a list amounts to pattern matching, giving one case for each constructor (nil and cons), and using recursion to apply the function to the tail of the list.  A stream is an example of a negative type, one that is defined by its behavioral conditions (destructors).  Defining a stream amounts to defining how it behaves when its head and tail are computed.  The crucial thing about lists, or any positive type, is that they are colimits; we know as part of their semantics how a value of list type are constructed.  The crucial thing about streams, or any negative type, is that they are limits; we know as part of their semantics how they behave when destructed.

Since we have no access to the “inside” of a stream, we should think of it not as a static data structure, but as a dynamic process that produces, upon request, successive elements of the stream.  Internally, the stream keeps track of whatever is necessary to determine successive outputs; it has its own state that is not otherwise visible from the outside.  But if a stream is to be thought of as given by a process of generation, then it is inherently an ephemeral data structure.  Interacting with a stream changes its state; the “old” stream is lost when the “new” stream is created.  But, as we have discussed previously, ephemeral data structures are of limited utility.  The role of memoization is to transform an ephemeral process into a persistent data structure by recording the successive values produced by the process so that they can be “replayed” as necessary to permit the stream to have multiple futures.  Thus, rather than being a matter of efficiency, memoization is a matter of functionality, providing a persistent interface to an underlying ephemeral process.

To see how this works in practice, let’s review the signatures PROCESS and STREAM that Dan Licata and I developed for our class.  Here’s a snippet of the signature of processes:

signature PROCESS = sig
  type 'a process = unit -> 'a option
  val stdin : char process
  val random : real process
end

A process is a function that, when applied, generates a value of some type, or indicates that it is finished.  The process stdin represents the Unix standard input; the process random is a random number generator.  The signature of streams looks essentially like this:

signature STREAM = sig
  type 'a stream
  datatype 'a front = Nil | Cons of 'a * 'a stream
  val expose : 'a stream -> 'a front
  val memo : 'a Process.process -> 'a stream
  val fix : ('a stream -> 'a stream) -> 'a stream
end

The type ‘a front is the type of values that arise when a stream is exposed; it can either terminate, or present an element and another stream.  The memo constructor creates a persistent stream from an ephemeral process of creation for its elements.  The fix operation is used to create recursive networks of streams.  There are other operations as well, but these illustrate the essence of the abstraction.

Using these signatures as a basis, it is extremely easy to put together a package of routines for scripting.  The fundamental components are processes that generate the elements of a stream.  Combinators on streams, such as composition or mapping and reducing, are readily definable, and may be deployed to build up higher levels of abstraction.  For example, Unix utilities, such as grep, are stream transducers that take streams as inputs and produce streams as outputs.  These utilities do not perform input/output; they merely transform streams.  Moreover, since streams are persistent, there is never any issue with “buffering” or “lookahead” or “backtracking”; you just manipulate the stream like any other (persistent) data structure, and everything works automagically.  The classical Bell Labs style of intermixing I/O with processing is eliminated, leading not only to cleaner code, but also greater flexibility and re-use.  This is achieved not by the double-backflips required by the inheritance mechanisms of oopl’s, but rather by making a crisp semantic distinction between the processing of streams and the streaming of processes.  True reuse operates at the level of abstractions, not at the level of the code that gives rise to them.

Update: It seems worthwhile to point out that memoization to create a persistent from an ephemeral data structure is a premier example of a benign effect, the use of state to evince functional behavior.  But benign effects are not programmable in Haskell, because of the segregation of effects into the IO monad.

Update: Lennart Augustsson gives his reasons for liking laziness.

Update: word smithing.


Modules Matter Most

April 16, 2011

When it comes to controlling the complexity of developing and, more importantly, maintaining a large system, the only game in town is modularity.  And as even the strongest proponents of unityped languages have come to accept, modularity is all about types (static types, of course, there being no other kind).  A decomposition of a system into modules consists of an application of the structural principle of substitution (transitivity of entailment, composition of maps) that is fundamental to the very conception of a type system:

\displaystyle{{\Gamma\vdash M : A \qquad \Gamma,x:A \vdash N:B}\over{\Gamma\vdash [M/x]N:B}}

In pedestrian terms the type A is the “header file” describing M, and N is the new code implementing specification B in terms of an unspecified implementation of the functionality specified by A.  The interaction between M and N is mediated entirely by the type A; access to the source code of M is denied to the client, precisely so that the dependence between the components is weakened in anticipation of future changes to M, or to allow for flexibility in development (M need not even exist for the development of N to proceed, as long as the type A is specified).

To be most useful, it is important that the relationship between M and A be many-to-many, and not many-to-one or one-to-one.  Think of A as the API for a mapping between keys and values.  It is absolutely essential that the language admit that many different modules M be of type A, and it is absolutely essential that a given module M satisfy many distinct types A, without prior arrangement.  The type A is purely descriptive of the extensional behavior of a module of that type, and cannot be regarded as pinning down any particular implementation of that behavior.  Moreover, a given module may well support a variety of views, for example by “forgetting” certain aspects of it in particular contexts.  For example, one may neglect that an implementation of mapping supports deletion in a setting where only extension and application are required.

This is all pretty basic, but what surprises me is how few languages support it cleanly and simply.  One particularly awkward method is to rely on extra-linguistic “tools” that manipulate build parameters to switch choices of M for a given N, quickly resulting in an ad hoc language of its own just to manage the build scripts.  Another technique is to manipulate or restrict inheritance so that some degree of modularity can be simulated, much as one can bang in nails using a pair of pliers.  A common methodology, in fact, attempts to cut down inheritance to provide what we had in ML in the early 1980’s (functors), obviating the intervening decades of maladaptations of bad ideas.

More disappointingly, for me at least, is that even relatively enlightened languages, such as Haskell or F#, fail to support this basic machinery.  In Haskell you have type classes, which are unaccountably popular (perhaps because it’s the first thing many people learn).  There are two fundamental problems with type classes.  The first is that they insist that a type can implement a type class in exactly one way.  For example, according to the philosophy of type classes, the integers can be ordered in precisely one way (the usual ordering), but obviously there are many orderings (say, by divisibility) of interest.  The second is that they confound two separate issues: specifying how a type implements a type class and specifying when such a specification should be used during type inference.  As a consequence, using type classes is, in Greg Morrisett’s term, like steering the Queen Mary: you have to get this hulking mass pointed in the right direction so that the inference mechanism resolves things the way you want it to.  In F# the designers started with the right thing (Caml) and eliminated the very thing that matters the most about ML, it’s module system!  Instead the F# designers added a bunch of object-oriented concepts (for the sake of compatibility with .net and with the mindset of MS developers), and tricked up the language with features that are more readily, and flexibly, provided by the module system.

Why bring this up?  Apart from general grousing, my point is that we had little choice in what language to use in teaching our students.  Modularity matters most, and we must have a language that supports flexible modularity in the form I am describing here.  When we examined our options, which we did very carefully, the only contenders are Standard ML and O’Caml.  We could have gone with either, but were persuaded to use Standard ML, which has worked beautifully for our purposes.  The decisive factor in choosing between the two ML’s was simply that we have a prior code base in Standard ML on which to draw, and there are two implementations of Standard ML that support parallelism (MLton and Poly/ML), albeit neither optimally for our purposes.  Haskell provides better support for parallelism (by undoing its unfortunate commitment to laziness, which results in an unusable cost model for both time and, especially, space), but wasn’t suitable because of its lack of support for modularity.

As I have mentioned, our plan is to re-do the introductory curriculum in Computer Science to modernize it and to place better emphasis on principles rather than current technologies.  One aspect of this is to re-do the standard Data Structures and Algorithms course to eliminate the over-emphasis on ephemeral data structures, and to treat parallel algorithms as the general case that encompasses the increasingly irrelevant case of one processor.  (Yes, this is a technological trend, but it is more importantly a conceptual change that emerges from focusing on language, rather than machine, models of computation.)  What is a data structure?  It’s a signature, or interface, written in the language you’re programming in.  What’s an algorithm?  It’s a structure, or implementation, of that signature.  A signature, such as that for a mapping, can be implemented in various ways, with different cost trade-offs (logarithmic lookup vs. constant lookup, for example).  A given algorithm, such as a balanced tree, can implement many different data structures, such as mappings or sets.  The distinction between a peristent and an ephemeral mapping is a difference of data structure, that is, of signature.  The demands are different, the algorithms are different.  We should be able to support both forms as easily and cleanly as the other, to be able to compare them, and to explain, for example, why the ephemeral case is of limited utility.  It is not too much to ask to be able to write these examples as running code, with a minimum of fuss or bother!

We have for decades struggled with using object-oriented languages, such as Java or C++, to explain these simple ideas, and have consistently failed.  And I can tell those of you who are not plugged into academics at the moment, many of my colleagues world-wide are in the same situation, and are desperate to find a way out.  The awkward methodology, the “design patterns”, the “style guidelines”, all get in the way of teaching the principles.  And even setting that aside, you’re still doing imperative programming on ephemeral data structures.  It just does not work, because it is fundamentally the wrong thing.  Just try to teach, say, binary search tree delete; it’s a horrific mess!  You wind up with absurd “null pointer” nonsense, and a complex mess caused by the methodology, not the problem itself.  Pretty soon you have to resort to “frameworks” and “tools” just to give the students a fighting chance to get anything done at all, distancing them from the essential ideas and giving the impression that programming is painful and ugly, an enormous tragedy.

Our solution is to abandon it entirely, pushing OO techniques to a more advanced level for students who wish to learn them, and concentrating on the basic principles of modularity, parallel and sequential cost analysis, and direct verification of running code at the introductory level.  Dijkstra used to say “beauty is our business”, to which I would add that life is too short, and bright minds too precious, to waste on ugly things.  And if you take this point of view seriously, the best and the brightest are drawn to, rather than repulsed by, the field.  Pace some of my colleagues at a Major Institute of Technology, students absolutely do love to program and do love the beauty of code, provided that the code you ask them to write is, in fact, beautiful.  There is little more dreary than the corporate bureaucracy of OOP, and few things more lovely than the mathematical elegance of FP.

[Update: word-smithing, again]


Persistence of Memory

April 9, 2011

Salvador Dali’s masterpiece, The Persistence of Memory, is one of the most recognizable works of art in the world.  The critic Dawn Ades described it as “a Surrealist meditation on the collapse of our notions of a fixed cosmic order” induced by Einstein’s penetrating analysis of the concepts of time and space in the physical world.  Just as Dali’s Persistence of Memory demarcated the transition from age-old conceptions of time and space in physics, so does the computational concept of persistence of memory mark the transition from sequential time and mutable space to parallel time and immutable space.

A short while ago I described the distinction between parallelism and concurrency in terms of a cost model that assigns a parallel, as well as sequential, time complexity to a program.  Parallelism is all about efficiency, not semantics; the meaning of a program is not affected by whether it is executed on one processor or many.  Functional languages expose parallelism by limiting sequential dependencies among the parts of a computation; imperative languages introduce inessential dependencies that impede parallelism.

Another critical ingredient for parallelism is the concept of a persistent data structure, one for which its associated operations transform, rather than alter, it.  A persistent dictionary, for example, has the characteristic that inserting an element results in a new dictionary with an expanded domain; the old dictionary persists, and is still available for further transformation.  When calculating the sum of 2 and 2, resulting in 4, no one imagines that the 2’s are “used up” in the process!  Nor does one worry whether the sum of 1 and 3 is the “same” 4 or a “different” 4!  The very question is absurd (or, more precisely, trivial).  So why do we worry about whether the result of inserting 2 into dict “uses up” the old dict?  And why do we worry about whether inserting 2 into the empty dictionary twice results in the “same” dictionary or a “different” one?

Yet both academic and practical computing has all but confined itself to ephemeral data structures, which exhibit precisely such behavior.  Operations on an ephemeral data structure “use up” the data structure, making it unavailable for further use without going to some trouble to get it back.  The pathologies resulting from this abound.  Standard compiler texts, for example, devote a chapter to concepts like “block structured symbol tables”, which are, in fact, nothing more than persistent dictionaries done the hard way.  More generally, whenever a data structure has multiple futures, such as when backtracking or exploiting parallelism, ephemeral data structures get in the way.  Indeed,  the bulk of object-oriented programming, with its absurd over-emphasis on the “message passing” metaphor, stress the alteration of objects as the central organizing principle, confounding parallelism and complicating simple algorithms.

A prime virtue of functional languages is that persistence is the default case, but they can as readily support ephemeral data structures as any imperative (including object-oriented) language.  All functional languages include types of mutable cells and mutable arrays, and provide support for conventional, sequential, imperative programming with semicolons and even curly braces!  (Some do this better than others; Haskell is, in my view, the world’s best imperative programming language, and second-best functional language, but that’s a subject for another post.)  But why would you want to? Why deprive yourself of the benefits of persistence, and insist instead on an ephemeral data structure?

This question came up recently in our planning for the Functional Programming class that we are teaching this semester for freshmen at Carnegie Mellon.  All semester we have been using functional programming techniques to build clean, verifiable, modular, parallel programs.  The students routinely prove theorems about their code, structure programs using abstract types, and exploit parallelism to improve the asymptotic performance of their programs.  Recent homework assignments include the implementation of the parallel Barnes-Hut algorithm for the n-body problem in physics, and the parallel Jamboree algorithm for game-tree search in perfect information games.  Persistent data structures are the key to making this possible; just try to code Barnes-Hut in an imperative language, and you will find yourself in a morass worrying about concurrency when you should instead be envisioning a recursive tree decomposition of space, and the computation of forces using formulas from high school physics.

We tried hard to find good motivations for using an ephemeral data structure when you can just as easily (actually, much more easily) use a persistent one.  As we went through them, we realized that all of the standard arguments are questionable or false.  The usual one is some vague notion of “efficiency” in either time or space.  While I concede that one can, in principle, solve a particular, confined problem more efficiently by doing absolutely everything by hand (memory management, scheduling, arithmetic), in the overwhelming majority of cases the demands of evolution of code far outweigh the potential advantages of doing everything by hand.  Modularity matters most when it comes to building and evolving large systems; functional languages, with persistent data structures, support modularity the best.  (I’ll have more to say about this in a future post.)

Most of the arguments about efficiency, though, ignore questions of functionality.  It is senseless to compare the “efficiency” of one data structure that provides different functionality than another.  A persistent data structure does more for you than does an ephemeral one.  It allows you to have multiple futures, including those that evolve in parallel with one another.  It makes no sense to insist that some ephemeral approximation of such a data structure is “more efficient” if it does not provide those capabilities!  And making it do so is, invariably, a bitch!  Conventional ephemeral data structures are not readily parallelizable; it’s often a publishable result to get a decent degree of parallelism using imperative methods.  By contrast, even freshmen (admittedly, Carnegie Mellon freshmen) can implement a parallel game tree search or a tree-based solution to the n-body problem in a one-week homework assignment.

So what’s the deal?  Why should we care about ephemeral data structures at all?  I have no idea.  Mostly, it’s a legacy cost imposed on us by the overzealous emphasis on imperative methods and machine-based, rather than language-based, models of computation.  But this will change.  Starting this fall, introductory data structures and algorithms will be liberated from the limitations of imperative, object-oriented programming, and will instead stress persistent (as well as ephemeral) data structures, and parallel (including as a special case sequential) algorithms.  The future of computing depends on parallelism (for efficiency), distribution (for scale), and verification (for quality).  Only functional languages support all three naturally and conveniently; the old ones just don’t cut it.

Update: Here’s a chart summarizing the situation as I see it:

\displaystyle  \begin{array}{|c|c|c|}  \hline  & \text{Ephemeral} & \text{Persistent} \\  \hline  \text{Sequential} & \textit{imperative} & \textit{benign effects} \\  \hline  \text{Parallel} & \textit{hard to get right} & \textit{functional} \\  \hline  \end{array}

Conventional imperative programming works well for the ephemeral, sequential case; it is notoriously hard to use imperative methods for parallelism.  Benign effects, as exemplified by self-adjusting data structures, can be used to give rise to persistent behavior in the sequential setting, but the use of effects impedes parallelism.


Dynamic Languages are Static Languages

March 19, 2011

While reviewing some of the comments on my post about parallelism and concurrency, I noticed that the great fallacy about dynamic and static languages continues to hold people in its thrall.  So, in the same “everything you know is wrong” spirit, let me try to set this straight: a dynamic language is a straightjacketed static language that affords less rather than more expressiveness.   If you’re one of the lucky ones who already understands this, congratulations, you probably went to Carnegie Mellon!  For those who don’t, or think that I’m wrong, well let’s have at it.  I’m not going to be very technical in this post; the full technical details are available in my book, Practical Foundations for Programming Languages, which is available online.

So-called dynamic languages (“so-called” because I’m going to argue that they don’t exist as a separate class of languages) are perennially popular.  From what I can tell, it’s largely a matter of marketing.  Hey, we all want to be dynamic, don’t we?  A dynamic personality is one that makes things happen, takes risks, has fun, breaks the rules!  And who wants to be static?  It’s synonymous with boring, after all, and no one wants to be a bore (even if they are, or especially if they are).  So, hey, dynamic languages are cool, right?  Or, at any rate, cool people like dynamic languages.  Static languages are for drips and losers.

That’s one argument, and, to be honest, it’s quite a successful bit of marketing.  Like most marketing tactics, it’s designed to confuse, rather than enlighten, and to create an impression, rather than inform.

Another part of the appeal of dynamic languages appears to be that they have acquired an aura of subversion.  Dynamic languages fight against the tyranny of static languages, hooray for us!  We’re the heroic vanguard defending freedom from the tyranny of typing!  We’re real programmers, we don’t need no stinking type system!

Now I’m as susceptible to the seductions of subversion as the next guy (witness this very post), so I can appreciate wanting to strike one for the little guy, overturning the evil establishment.  But I’ve got news for you: when it comes to “dynamic typing”, it’s a futile enterprise.  Not because the establishment is so powerful.  Not because evil always wins.  But because the distinction makes no sense.  Dynamic typing is but a special case of static typing, one that limits, rather than liberates, one that shuts down opportunities, rather than opening up new vistas.  Need I say it?  Something can hardly be opposed to that of which it is but a trivial special case.  So, give it up, and get with the program!  There are ill-defined languages, and there are well-defined languages.  Well-defined languages are statically typed, and languages with rich static type systems subsume dynamic languages as a corner case of narrow, but significant, interest.

Wittgenstein said that all philosophical problems are problems of grammar.  And so it is here as well.  The root of the problem lies in the confusion between a type and a class.  We all recognize that it is often very useful to have multiple classes of values of the same type.  The prototypical example is provided by the complex numbers.  There is one type of complex numbers that represent points in two-dimensional space.  In school we encounter two classes of complex numbers, the rectangular and the polar.  That is, there are two ways to present a complex number using two different systems of coordinates.  They are, of course, interchangeable, because they represent values of the same type.  But the form of presentation differs, and some forms are more convenient than others (rectangular for addition, polar for multiplication, for example).  We could, of course, choose a “coin of the realm”, and represent all complex numbers in one way, and consider coordinates as just a matter of how a number is given.  But it can also be convenient to consider that the type of complex numbers consists of two classes, each of which consists of two real numbers, but interpreted differently according to the coordinate system we are using.

Crucially, the distinction between the two classes of complex number is dynamic in that a given computation may result in a number of either class, according to convenience or convention.  A program may test whether a complex number is in polar or rectangular form, and we can form data structures such as sets of complex numbers in which individual elements can be of either form.  But this does not conflict with the basic fact that there is but one static type of complex numbers!  In type theoretic terms what I am saying is that the type complex is defined to be the sum of two copies of the product of the reals with itself.  One copy represents the class of rectangular representations, the other represents the class of polar representations.  Being a sum type, we can “dispatch” (that is, case analyze) on the class of the value of the type complex, and decide what to do at run-time.  This is no big deal.  In fact, it’s quite simple and natural in languages such as ML or Haskell that support sum types.  (Languages that don’t are hobbled, and this is part of the confusion.)

What does this have to do with the concept of a dynamic language?  The characteristics of a dynamic language are the same, values are classified by into a variety of forms that can be distinguished at run-time.  A collection of values can be of a variety of classes, and we can sort out at run-time which is which and how to handle each form of value.  Since every value in a dynamic language is classified in this manner, what we are doing is agglomerating all of the values of the language into a single, gigantic (perhaps even extensible) type.  To borrow an apt description from Dana Scott, the so-called untyped (that is “dynamically typed”) languages are, in fact, unityped.  Rather than have a variety of types from which to choose, there is but one!

And this is precisely what is wrong with dynamically typed languages: rather than affording the freedom to ignore types, they instead impose the bondage of restricting attention to a single type!  Every single value has to be a value of that type, you have no choice!  Even if in a particular situation we are absolutely certain that a particular value is, say, an integer, we have no choice but to regard it as a value of the “one true type” that is classified, not typed, as an integer.  Conceptually, this is just rubbish, but it has serious, tangible penalties.  For one, you are depriving yourself of the ability to state and enforce the invariant that the value at a particular program point must be an integer.  For another, you are imposing a serious bit of run-time overhead to represent the class itself (a tag of some sort) and to check and remove and apply the class tag on the value each time it is used.  (See PFPL for full details of what is involved.)

Now I am fully aware that “the compiler can optimize this away”, at least in some cases, but to achieve this requires one of two things (apart from unreachable levels of ingenuity that can easily, and more profitably, be expressed by the programmer in the first place).  Either you give up on modular development, and rely on whole program analysis (including all libraries, shared code, the works), or you introduce a static type system precisely for the purpose of recording inter-modular dependencies.  The whole-program approach does not scale, and flies in the face of the very advantage that dynamic languages supposedly have, namely dynamic evolution and development of components.  The static type system approach works beautifully, and makes my point nicely.

I am also fully aware that many statically typed languages, particularly the ones widely in commercial use, do not have a sufficiently expressive type system to make it feasible to work dynamically (that is, with multiple classes of values) within a statically typed framework.  This is an argument against bad languages, not an argument against type systems.  The goal of (static, there is no other kind) type systems is precisely to provide the expressive power to capture all computational phenonema, including dynamic dispatch and dynamically extensible classification, within the rich framework of a static type discipline.  More “researchy” languages such as ML or Haskell, have no trouble handling dynamic modes of programming.

Why should you care?  Because, I argue, that a fully expressive language is one that supports the delicate interplay between static and dynamic techniques.  Languages that force only one perspective on you, namely the dynamic languages, hobble you; languages that admit both modes of reasoning enable you and liberate you from the tyranny of a single type.  Let a thousand flowers bloom!


Parallelism Is Not Concurrency

March 17, 2011

In an earlier post I mentioned that one goal of the new introductory curriculum at Carnegie Mellon is to teach parallelism as the general case of computing, rather than an esoteric, specialized subject for advanced students.  Many people are incredulous when I tell them this, because it immediately conjures in their mind the myriad complexities of concurrency: locks, monitors, deadlock, dining philosophers, ….  And if you look at the recent research literature, you will see countless papers of the form (a) parallelism is important, so (b) let’s talk about concurrency.  The situation has gotten so bad that I think that most computer scientists cannot distinguish the two concepts.  (Or maybe they willfully confuse the two so as to justify funding for work on concurrency by appealing to the importance of parallelism.)

Given this state of affairs, I cannot explain what we are doing at Carnegie Mellon (to anyone who doesn’t already know) without first rebooting you.  Trust me, I’m just going to hit Ctrl-Alt-Delete, and then click Restart.  OK, we’re back.

The first thing to understand is parallelism has nothing to do with concurrency.  Concurrency is concerned with nondeterministic composition of programs (or their components).  Parallelism is concerned with asymptotic efficiency of programs with deterministic behavior.  Concurrency is all about managing the unmanageable: events arrive for reasons beyond our control, and we must respond to them.  A user clicks a mouse, the window manager must respond, even though the display is demanding attention.  Such situations are inherently nondeterministic, but we also employ pro forma nondeterminism in a deterministic setting by pretending that components signal events in an arbitrary order, and that we must respond to them as they arise.  Nondeterministic composition is a powerful program structuring idea.  Parallelism, on the other hand, is all about dependencies among the subcomputations of a deterministic computation.  The result is not in doubt, but there are many means of achieving it, some more efficient than others.  We wish to exploit those opportunities to our advantage.

Now I can hear you object, but isn’t concurrency required to implement parallelism?  Well, yes, it is, but concurrency is also required to implement sequentiality too!  The timing signal on your processor chip is essentially a synchronization mechanism with which to coordinate the otherwise independent activity of the components of the processor.  (Yes, I know that there are asynchronous processors, but this only makes my point stronger, you still need to control the concurrency.)  The point is that concurrency is not relevant to parallelism, even if the engineers who build our parallel computing platforms must deal with concurrency.  Another way to say the same thing is that parallelism is a useful abstraction, and abstractions should never be confused with their implementations.  (Numbers are not certain sets, but they can be implemented in terms of sets.  And so on.)

All well and good, parallelism is not concurrency, but what is it?  What is an appropriate model for parallel computation?  As in the sequential case, there are two answers, a machine-based approach and a language-based approach.  I am going to criticize the machine-based formulation, and argue for the language-based formulation on two grounds: (a) it is inherently more elegant and natural, and (b) it can be taught directly to students without any fuss or bother.  Much of what I have to say is derived from the seminal and beautiful work of Guy Blelloch, which I have absorbed over years of collaboration.  To the extent I’ve gotten it right, give Guy the credit; to the extent that I’ve gotten it wrong, assign me the blame.

The canonical machine-based model is the PRAM, or parallel RAM, which generalizes the classical RAM model to allow p>0 processors that share a common memory.  There are many variations on the PRAM, according to what sorts of interactions are allowed and how conflicts are resolved among the processors.  This is not important for present purposes, so I will not attempt to categorize the variations here.  The key idea is that the number, p, of processors is fixed in advance—results are stated in terms of how efficiently an algorithm executes on a p-processor PRAM as a function of the size, n, of the input.  This is well and good as a cost model for parallel execution, but it totally sucks as a programming model!  To write code for a PRAM, you fix the number of processors, and write the code for that number of processors (often the various cases are “analogous”, but in reality you must fix p and write code for that fixed p).  Moreover, you must (officially, anyway) write low-level imperative RAM code, which introduces spurious dependencies among sub-computations by explicitly scheduling the steps of the computation.  This is horrendous!  Of course typically one writes pidgin C, and sketches how it is compiled, but the fact is you’re still writing sequential, imperative code to implement a parallel algorithm.

As I’ve argued previously, what is needed is a language-based model of computation in which we assign costs to the steps of the program we actually write, not the one it (allegedly) compiles into.  Moreover, in the parallel setting we wish to think in terms of dependencies among computations, rather than the exact order in which they are to be executed.  This allows us to factor out the properties of the target platform, such as the number, p, of processing units available, and instead write the program in an intrinsically parallel manner, and let the compiler and run-time system (that is, the semantics of the language) sort out how to schedule it onto a parallel fabric.  Here’s a useful analogy.  Just as abstract languages allow us to think in terms of data structures such as trees or lists as forms of value and not bother about how to “schedule” the data structure into a sequence of words in memory, so a parallel language should allow us to think in terms of the dependencies among the phases of a large computation and not bother about how to schedule the workload onto processors.  Storage management is to abstract values as scheduling is to deterministic parallelism.

The paradigmatic example is provided by Quicksort, a naturally parallel algorithm.  Many people think that the “big idea” of Quicksort is the in-place swapping technique that Hoare invented in the original formulation.  I argue that this is a detail; the “big idea” is that it is parallelizable.  To see this, we must work at a reasonable level of abstraction: a sort function permutes a non-empty sequence of (comparable) elements to obtain another sequence in sorted order.

fun qs xs = if |xs|=1 then xs else let val (xsl, xsg) = split xs in concat (qs xsl, qs xsg)

That is, we split the sequence into two parts (by some means) such that everything in the left part is less than everything in the right part, then sort the parts independently (in parallel!), and merge the results together.  The exact complexity of Quicksort depends on the representation of sequences, and the complexity of the operations on them.  For now, let us just gloss over this to focus on the overall structure of the computation.  If we can ensure that the sequence is split roughly in half (by any of a number of means, including randomization), then the recursion is of logarithmic depth.  At each stage we do linear work, but the recursive calls are independent of one another.  The algorithm has a natural fork-join structure in which we (a) split the sequence into two parts, (b) in parallel sort the two parts, and (c) join the results together.  (The splitting itself can be done in parallel as well, but we’ll not bother about that here.)

Analysis of Quicksort splits into two parts, the work and the depth (or span).  The work is the overall number of steps required to perform the algorithm.  On a sequence of length n, we perform O(n log n) steps to sort the sequence, as might have been expected.  This is the sequential complexity, the amount of time it would take to run on a single processor.  To account for parallelism, there is a second measure, called the depth, that measures the critical path length of the computation, the length of the longest chain of dependencies among subcomputation.  No matter how much parallelism we may have available, we can never run faster than the depth; it represents the idealized parallel complexity of the algorithm, the amount of time it would take given any number of processors.  Depending on the choice of data structures, the depth of Quicksort is O(log^2 n), or thereabouts, which makes it quite parallelizable.  (Note that, as in our earlier discussion, a step of computation is a transition step in the operational semantics of the language we are using to express the algorithm.)

This is very nice and elegant, but now comes the really beautiful part, the concept of a provable implementation of the language.  (The word “provable” is only needed here because of the shabby state of the field; having a proof that your implementation works, and how well it works, remains an oddity to be emphasized, rather than an expectation to be fulfilled by any self-respecting computer scientist.)  The idea is that the language-based model can be implemented on a PRAM (with stated interconnect properties that I will suppress here) with a provable bound on the cost that takes account of the overhead of scheduling onto a p-processor fabric.  Thus, p need not be specified in advance, but is rather determined by the run-time system and compiler, which are provided by the proof of the theorem.

Theorem A computation with work w and depth d can be implemented in a p-processor PRAM in time O(max(w/p, d)).

(Sometimes the second factor is d log p, depending on the assumptions we are making about the target PRAM.)

This is called Brent’s Principle, and it states a fairly obvious fact.  First, we can never execute a depth d program in fewer than d steps.  The depth is the critical path length; this is the amount of time we must wait to execute the algorithm, regardless of any available parallelism.  Second, to the extent that parallelism arises from the dependency (rather, indepedency) structure of the computation, we can at best do p units of work at time, keeping all processors warm, until we are done.  And that’s it!

The effectiveness of the language-based model of parallelism lies entirely in its ability to expose the dependency structure of the computation by not introducing any dependencies that are not forced on us by the nature of the computation itself.  And the key to this is functional programming, which manifests itself here in the transformational approach to computation: sorting is conceived of as a mathematical function that transforms a given sequence into another sequence.  It does not destroy the given sequence any more than adding two numbers destroys those numbers!  Since Quicksort is a mathematical function, we need not worry that execution of qs xsl interferes with (depends on) qs xsg; we can readily run them in parallel without fear of untoward consequences.  The payoff is that there are many fewer dependencies among the subcomputations, and hence many more opportunities for parallelism that can be exploited, in accord with Brent’s Principle, when scheduling the work onto a parallel fabric.

The upshot of all this is that functional programming is of paramount importance for parallelism.  That’s (one reason) why we are teaching functional programming to freshmen at Carnegie Mellon.  And it’s the main reason why we are able to.  There’s nothing to it, just program like you do math, and the parallelism will take care of itself!

Update: Simon Marlowe made similar points to mine some time ago.


What is a Functional Language?

March 16, 2011

I mentioned in a previous post that I am developing a new course on functional programming for first-year students.  The immediate question, which none of you asked, is “what do you mean by functional programming?”.  Good question!  Now, if only I had a crisp answer, I’d be in good shape.

Socially speaking, the first problem I run into is a fundamental misunderstanding about the term “functional programming”.  Most people think that it stands for an ideological position that is defined in opposition to what everyone else thinks and does.  So in the mind of a typical colleague, when I speak of functional programming, she hears “deviant religious doctrine” and “knows” instinctively to disregard it.  Yet what I mean by functional programming subsumes what everyone else is doing as a narrow special case.  For as readers of this blog surely know, there is no better imperative language than a functional language!  In fact I often say that Haskell is the world’s best imperative programming language; I’m only half (if that much) joking.

So if functional programming subsumes imperative programming, then what the hell am I talking about when I advocate for functional programming at the introductory level?  And why, then, do we also have an imperative programming course?  Very good questions.  I’m not sure I have a definitive answer, but this being a blog, I can air my opinions.

Were it up to me, we would have one introductory course, probably two semesters long, on programming, which would, of course, emphasize functional as well as imperative techniques, all in one language.  Personally, I think this would be the right way to go, and would prepare the students for all future courses in our curriculum, and for work out there in the “real world.”  Not everyone believes me (and sometimes I am not sure I believe myself), so we must compromise.  While it surely sounds all nice and ecumenical to teach both imperative and functional programming, the reality is that we’re teaching old and new programming methods.  It’s not so much the imperative part that counts, but their obsolescence.  It is deemed important (and, to be honest, I can see the merit in the argument) that students learn the “old ways” because that’s how most of the world works.  So we have to use a language that is ill-defined (admits programs that cannot be given any meaning in terms of the language itself), that forces sequential, rather than parallel thinking, that demands manual allocation of memory, and that introduces all sorts of complications, such as null pointers, that need not exist at all.  And then we have to teach about things called “tools” that help manage the mess we’ve created—tools that do things like recover from Boolean blindness or check for null pointer errors.  To be sure, these tools are totally amazing, but then so are hot metal typesetting machines and steam locomotives.

So what, then, is a functional language?  I can do no better than list some criteria that are important for what I do; the languages that satisfy these criteria are commonly called functional languages, so that’s what I’ll call them too.

  1. It should have a well-defined semantics expressed in terms of the programs you write, not in terms of how it is “compiled” onto some hypothetical hardware and software platform.  I want to be able to reason about the code I’m writing, not about the code the compiler writes.  And I want to define the meaning of my program in terms of the constructs I’m using, not in terms of some supposed underlying mechanism that implements it.  This is the essence of abstraction.
  2. It should support both computation by evaluation and computation by execution.  Evaluation is a smooth generalization of high school- and university-level mathematics, and is defined in terms of standard mathematical concepts such as tuples, functions, numbers, and so forth.  Variables are given meaning by substitution, and evaluation consists of simplifying expressions.  Execution is the action of a program on another agent or data structure conceived of as existing independently of the program itself.  The data lives “over there” and the program “over here” diddles that data.  Assignables (my word for what in imperative languges are wrongly called variables) are given meaning by get and put operations (fetch and store), not by substitution.  Execution consists of performing these operations.
  3. It should be natural to consider both persistent and ephemeral data structures.  This is essentially a restatement of (2).  Persistent data structures are values that can be manipulated like any other value, regardless of their apparent “size” or “complexity”.  We can as easily pass functions as arguments and return them as results as we can do the same with integers.  We can think of trees or streams as values in the same manner.  Ephemeral data structures are what you learn about in textbooks.  They are external to the program, and manipulated destructively by execution.  These are useful, but not nearly as central as the current texts would have you believe (for lack of consideration of any other kind).
  4. It should have a natural parallel cost model based on the  dependencies among computations so that one may talk about parallel algorithms and their complexity as naturally as one currently discusses sequential algorithms.  Imperative-only programming languages fail this criterion miserably, because there are too many implicit dependencies among the components of a computation, with the result that sequentiality is forced on you by the artificial constraints of imperative thinking.
  5. It should have a rich type structure that permits introduction of new abstract types and which supports modular program development.  By giving short shrift to expressions and evaluation, imperative language have an impoverished notion of type—and not support for modularity.  Worse, object-oriented programming, a species of imperative programming, is fundamentally antimodular because of the absurd emphasis on inheritance and the reliance on class-based organizations in which functionality metastasizes throughout a program.

What languages satisfy these criteria best?  Principally ML (both Standard ML and Objective Caml, the “objective” part notwithstanding), and Haskell. (Unfortunately, Haskell loses on the cost model, about which more in another post.)

This is why we teach ML.


Design a site like this with WordPress.com
Get started