Parallelism and Concurrency, Revisited

April 9, 2014

I still get compliments on and criticisms of my post from three years ago (can it possibly be that long?) on parallelism and concurrency.  In that post I offered a “top down” argument to the effect that these are different abstractions with different goals: parallelism is about exploiting computational resources to maximize efficiency, concurrency is about non-deterministic composition of components in a system.  Parallelism never introduces bugs (the semantics is identical to the sequential execution), but concurrency could be said to be the mother lode of all bugs (the semantics of a component changes drastically, without careful provision, when composed concurrently with other components).  From this point of view the two concepts aren’t comparable, yet relatively few people seem to accept the distinction, or, even if they do, do not accept the terminology.

Here I’m going to try a possible explanation of why the two concepts, which seem separable to me, may seem inseparable to others.

I think that it is to do with scheduling.

One view of parallelism is that it’s just talk for concurrency, because all you do when you’re programming in parallel is fork off some threads, and then do something with their results when they’re done.  I’ve previously argued that parallelism is about cost, but let’s leave that aside.  It’s unarguable that a parallel computation does consist of a bunch of, well, parallel computations, and so it is about concurrency.  I’ve previously argued that that’s not a good way to think about concurrency either, but let’s leave that aside as well.  So, the story goes, concurrency and parallelism are synonymous, and people like me are just creating confusion.

Perhaps that is true, but here’s why it may not be a good idea to think of parallelism this way.  Scheduling as you learned about it in OS class (for example) is a altogether different than scheduling for parallelism.  There are two aspects of OS-like scheduling that I think are relevant here.  First, it is non-deterministic, and second, it is competitive.  Non-deterministic, because you have little or no control over what runs when or for how long.  A beast like the Linux scheduler is controlled by a zillion “voodoo parameters” (a turn of phrase borrowed from my queueing theory colleague, Mor Harchol-Balter), and who the hell knows what is going to happen to your poor threads once they’re in its clutches.  Second, and more importantly, an OS-like scheduler is allocating resources competitively.  You’ve got your threads, I’ve got my threads, and we both want ours to get run as soon as possible.  We’ll even pay for the privilege (priorities) if necessary.  The scheduler, and the queueing theory behind it is designed to optimize resource usage on a competitive basis, taking account of quality of service guarantees purchased by the participants.  It does not matter whether there is one processor or one thousand processors, the schedule is unpredictable.  That’s what makes concurrent programming hard: you have to program against all possible schedules.  And that’s why it’s hard to prove much about the time or space complexity of your program when it’s implemented concurrently.

Parallel scheduling is a whole ‘nother ball of wax.  It is (usually, but not necessarily) deterministic, so that you can prove bounds on its efficiency (Brent-type theorems, as discussed in a previous post and in PFPL).  And, more importantly, it is cooperative in the sense that all threads are working together for the same computation towards the same ends.  The threads are scheduled so as to get the job (there’s only one) done as quickly and as efficiently as possible.  Deterministic schedulers for parallelism are the most common, because they are the easiest to analyze with respect to their time and space bounds.  Greedy schedulers, which guarantee to maximize use of available processors, never leaving any idle when there is work to be done, form an important class for which the simple form of Brent’s Theorem is obvious.

Many deterministic greedy scheduling algorithms are known, of which I will mention p-DFS and p-BFS, which do p-at-a-time depth- and breadth-first search of the dependency graph, and various forms of work-stealing schedulers, pioneered by Charles Leiserson at MIT.  (Incidentally, if you don’t already know what p-DFS or p-BFS are, I’ll warn you that they are a little trickier than they sound.  In particular p-DFS uses a data structure that is sort of like a stack but is not a stack.)  These differ significantly in their time bounds (for example, work stealing usually involves expectation over a random variable, whereas the depth- and breadth-first traversals do not), and differ dramatically in their space complexity.  For example, p-BFS is absolutely dreadful in its space complexity.  (For a full discussion of these issues in parallel scheduling, I recommend Dan Spoonhower’s PhD Dissertation.  His semantic profiling diagrams are amazingly beautiful and informative!)

So here’s the thing: when you’re programming in parallel, you don’t just throw some threads at some non-deterministic competitive scheduler.  Rather, you generate an implicit dependency graph that a cooperative scheduler uses to maximize efficiency, end-to-end.  At the high level you do an asymptotic cost analysis without considering platform parameters such as the number of processors or the nature of the interconnect.  At the low level the implementation has to validate that cost analysis by using clever techniques to ensure that, once the platform parameters are known, maximum use is made of the computational resources to get your job done for you as fast as possible.  Not only are there no bugs introduced by the mere fact of being scheduled in parallel, but even better, you can prove a theorem that tells you how fast your program is going to run on a real platform.  Now how cool is that?

[Update: word-smithing.]

[Update: more word-smithing for clarity and concision.]


Old Neglected Theorems Are Still Theorems

March 20, 2014

I have very recently been thinking about the question of partiality vs totality in programming languages, a perennial topic in PL’s that every generation thinks it discovers for itself.  And this got me to remembering an old theorem that, it seems, hardly anyone knows ever existed in the first place.  What I like about the theorem is that it says something specific and technically accurate about the sizes of programs in total languages compared to those in partial languages.  The theorem provides some context for discussion that does not just amount to opinion or attitude (and attitude alway seems to abound when this topic arises).

The advantage of a total programming language such as Goedel’s T is that it ensures, by type checking, that every program terminates, and that every function is total. There is simply no way to have a well-typed program that goes into an infinite loop. This may seem appealing, until one considers that the upper bound on the time to termination can be quite large, so large that some terminating programs might just as well diverge as far as we humans are concerned. But never mind that, let us grant that it is a virtue of  T that it precludes divergence.

Why, then, bother with a language such as PCF that does not rule out divergence? After all, infinite loops are invariably bugs, so why not rule them out by type checking? (Don’t be fooled by glib arguments about useful programs, such as operating systems, that “run forever”. After all, infinite streams are programmable in the language M of inductive and coinductive types in which all functions terminate. Computing infinitely does not mean running forever, it just means “for as long as one wishes, without bound.”)  The notion does seem appealing until one actually tries to write a program in a language such as T.

Consider computing the greatest common divisor (GCD) of two natural numbers. This can be easily programmed in PCF by solving the following equations using general recursion:

\begin{array}{rcl}    \textit{gcd}(m,0) & = & m \\    \textit{gcd}(0,m) & = & m \\    \textit{gcd}(m,n) & = & \textit{gcd}(m-n,n) \quad \text{if}\ m>n \\    \textit{gcd}(m,n) & = & \textit{gcd}(m,n-m) \quad \text{if}\ m<n    \end{array}

The type of \textit{gcd} defined in this manner has partial function type (\mathbb{N}\times \mathbb{N})\rightharpoonup \mathbb{N}, which suggests that it may not terminate for some inputs. But we may prove by induction on the sum of the pair of arguments that it is, in fact, a total function.

Now consider programming this function in T. It is, in fact, programmable using only primitive recursion, but the code to do it is rather painful (try it!). One way to see the problem is that in T the only form of looping is one that reduces a natural number by one on each recursive call; it is not (directly) possible to make a recursive call on a smaller number other than the immediate predecessor. In fact one may code up more general patterns of terminating recursion using only primitive recursion as a primitive, but if you examine the details, you will see that doing so comes at a significant price in performance and program complexity. Program complexity can be mitigated by building libraries that codify standard patterns of reasoning whose cost of development should be amortized over all programs, not just one in particular. But there is still the problem of performance. Indeed, the encoding of more general forms of recursion into primitive recursion means that, deep within the encoding, there must be “timer” that “goes down by ones” to ensure that the program terminates. The result will be that programs written with such libraries will not be nearly as fast as they ought to be.  (It is actually quite fun to derive “course of values” recursion from primitive recursion, and then to observe with horror what is actually going on, computationally, when using this derived notion.)

But, one may argue, T is simply not a serious language. A more serious total programming language would admit sophisticated patterns of control without performance penalty. Indeed, one could easily envision representing the natural numbers in binary, rather than unary, and allowing recursive calls to be made by halving to achieve logarithmic complexity. This is surely possible, as are numerous other such techniques. Could we not then have a practical language that rules out divergence?

We can, but at a cost.  One limitation of total programming languages is that they are not universal: you cannot write an interpreter for T within T (see Chapter 9 of PFPL for a proof).  More importantly, this limitation extends to any total language whatever.  If this limitation does not seem important, then consider the Blum Size Theorem (BST) (from 1967), which places a very different limitation on total languages.  Fix any total language, L, that permits writing functions on the natural numbers. Pick any blowup factor, say 2^{2^n}, or however expansive you wish to be.  The BST states that there is a total function on the natural numbers that is programmable in L, but whose shortest program in L is larger by the given blowup factor than its shortest program in PCF!

The underlying idea of the proof is that in a total language the proof of termination of a program must be baked into the code itself, whereas in a partial language the termination proof is an external verification condition left to the programmer. Roughly speaking, there are, and always will be, programs whose termination proof is rather complicated to express, if you fix in advance the means by which it may be proved total. (In T it was primitive recursion, but one can be more ambitious, yet still get caught by the BST.)  But if you leave room for ingenuity, then programs can be short, precisely because they do not have to embed the proof of their termination in their own running code.

There are ways around the BST, of course, and I am not saying otherwise.  For example, the BST merely guarantees the existence of a bad case, so one can always argue that such a case will never arise in practice.  Could be, but I did mention the GCD in T problem for a reason: there are natural problems that are difficult to express in a language such as T.  By fixing the possible termination arguments in advance, one is tempting fate, for there are many problems, such as the Collatz Conjecture, for which the termination proof of a very simple piece of code has been an open problem for decades, and has resisted at least some serious attempts on it.  One could argue that such a function is of no practical use.  I agree, but I point out the example not to say that it is useful, but to say that it is likely that its eventual termination proof will be quite nasty, and that this will have to be reflected in the program itself if you are limited to a T-like language (rendering it, once again, useless).  For another example, there is no inherent reason why termination need be assured by means similar to that used in T.  We got around this issue in NuPRL by separating the code from the proof, using a type theory based on a partial programming language, not a total one.  The proof of termination is still required for typing in the core theory (but not in the theory with “bar types” for embracing partiality).  But it’s not baked into the code itself, affecting its run-time; it is “off to the side”, large though it may be).

Updates: word smithing, fixed bad link, corrected gcd, removed erroneous parenthetical reference to Coq, fixed LaTeX problems.


There Is Such A Thing As A Declarative Language, and It’s The World’s Best DSL

July 22, 2013

In a recent post I asked whether there is any such thing as a declarative language. The main point was to argue that the standard “definitions” are, at best, not very precise, and to see whether anyone might offer a better definition. What I’m after is an explanation of why people seem to think that the phrase has meaning, even though they can’t say very clearly what they mean by it.  (One commenter analogized with “love” and “happiness”, but I would counter by saying that we’re trying to do science here, and we ought to be able to define our terms with some precision.)

As I mentioned, perhaps the best “definition” that is usually offered is to say that “declarative” is synonymous with “functional-and-logic-programming”.  This is pretty unsatisfactory, since it is not so easy to define these terms either, and because, contrary to conventional classifications, the two concepts have pretty much nothing in common with each other (but for one thing to be mentioned shortly). The propositions-as-types principle helps set them clearly apart: whereas functional programming is about executing proofs, logic programming is about the search for proofs. Functional programming is based on the dynamics of proof given by Gentzen’s inversion principle. Logic programming is based on the dynamics of provability given by cut elimination and focusing.  The two concepts of computation could not be further apart.

Yet they do have one thing in common that is usefully isolated as fundamental to what we mean by “declarative”, namely the concept of a variable.  Introduced by the ancient Hindu and Muslim mathematicians, Brahmagupta and al Kwharizmi, the variable is one of the most remarkable achievements of the human intellect.  In my previous post I had secretly hoped that someone would propose variables as being central to what we mean by “declarative”, but no one did, at least not in the comments section.  My unstated motive for writing that post was not so much to argue that the term “declarative” is empty, but to test the hypothesis that few seem to have grasp the importance of  this concept for designing a civilized, and broadly applicable, programming language.

My contention is that variables, properly so-called, are what distinguish “declarative” languages from “imperative” languages. Although the imperative languages, including all popular object-oriented languages, are based on a concept that is called a variable, they lack anything that actually is a variable.   And this is where the trouble begins, and the need for the problematic distinction arises.  The declarative concept of a variable is the mathematical concept of an unknown that is given meaning by substitution. The imperative concept of a variable, arising from low-level machine models, is instead given meaning by assignment (mutation), and, by a kind of a notational pun, allowed to appear in expressions in a way that resembles that of a proper variable.  But the concepts are so fundamentally different, that I argue in PFPL that the imperative concept be called an “assignable”, which is more descriptive, rather than “variable”, whose privileged status should be emphasized, not obscured.

The problem with purely imperative programming languages is that they have only the concept of an assignable, and attempt to make it serve also as a concept of variable. The results are a miserable mess of semantic and practical complications. Decades of work has gone into rescuing us from the colossal mistake of identifying variables with assignables. And what is the outcome? If you want to reason about assignables, what you do is (a) write a mathematical formulation of your algorithm (using variables, of course) and (b) show that the imperative code simulates the functional behavior so specified.   Under this methodology the mathematical formulation is taken as self-evidently correct, the standard against which the imperative program is judged, and is not itself in need of further verification, whereas the imperative formulation is, invariably, in need of verification.

What an odd state of affairs!  The functional “specification” is itself a perfectly good, and apparently self-evidently correct, program.  So why not just write the functional (i.e., mathematical) formulation, and call it a day?  Why indeed!  Declarative languages, being grounded in the language of mathematics, allow for the identification of the “desired behavior” with the “executable code”.  Indeed, the propositions-as-types principle elevates this identification to a fundamental organizing principle: propositions are types, and proofs are programs.  Who needs verification?  Once you have a mathematical specification of the behavior of a queue, say, you already have a running program; there is no need to relegate it to a stepping stone towards writing an awkward, and invariably intricate, imperative formulation that then requires verification to ensure that it works properly.

Functional programming languages are written in the universally applicable language of mathematics as expressed by the theory of types.  Such languages are therefore an integral part of science itself, inseparable from our efforts to understand and master the workings of the world.  Imperative programming has no role to play in this effort, and is, in my view, doomed in the long run to obsolescence, an artifact of engineering, rather than a fundamental discovery on a par with those of mathematics and science.

This brings me to my main point, the popular concept of a domain-specific language. Very much in vogue, DSL’s are offered as the solution to many of our programming woes. And yet, to borrow a phrase from my colleague Guy Blelloch, the elephant in the room is the question “what is a domain?”. I’ve yet to hear anyone explain how you decide what are the boundaries of a “domain-specific” language. Isn’t the “domain” mathematics and science itself? And does it not follow that the right language must be the language of mathematics and science? How can one rule out anything as being irrelevant to a “domain”?  I think it is impossible, or at any rate inadvisable, to make such restrictions a priori.  Indeed, full-spectrum functional languages are already the world’s best DSL’s, precisely because they are (or come closest to being) the very language of science, the ultimate “domain”.


What, If Anything, Is A Declarative Language?

July 18, 2013

Back in the 1980’s it was very fashionable to talk about “declarative” programming languages.  But to my mind there was never a clear definition of a “declarative language”, and hence no way to tell what is declarative and what is not.  Lacking any clear meaning, the term came to refer to the arbitrary conflation of functional with logic programming to such an extent that “functional-and-logic-programming” almost became a Germanic thing-in-itself (ding an sich).  Later, as the logic programming wave subsided, the term “declarative”,  like “object-oriented”, came to be an expression of approval, and then, mercifully, died out.

Or so I had thought.  Earlier this week I attended a thriller of an NSF-sponsored workshop on high-level programming models for parallelism, where I was surprised by the declarative zombie once again coming to eat our brains.  This got me to thinking, again, about whether the term has any useful meaning.  For what it’s worth, and perhaps to generate useful debate, here’re some things that I think people mean, and why I don’t think they mean very much.

  1. “Declarative” means “high-level”.  This just seems to replace one vague term by another.
  2. “Declarative” means “not imperative”.  But this flies in the face of reality.  Functional languages embrace and encompass imperative programming as a special case, and even Prolog has imperative features, such as assert and retract, that have imperative meaning.
  3. “Declarative” means “functional”.  OK, but then we don’t really need another word.
  4. “Declarative” means “what, not how”.  But every language has an operational semantics that defines how to execute its programs, and you must be aware of that semantics to understand programs written in it.  Haskell has a definite evaluation order, just as much as ML has a different one, and even Prolog execution is defined by a clear operational semantics that determines the clause order and that can be influenced by “cut”.
  5. “Declarative” means “equational”.  This does not distinguish anything, because there is a well-defined notion of equivalence for any programming language, namely observational equivalence.  Different languages induce different equivalences, of course, but how does one say that one equivalence is “better” than another?  At any rate, I know of no stress on equational properties of logic programs, so either logic programs are not “declarative” or “equational reasoning” is not their defining characteristic.
  6. “Declarative” means “referentially transparent”.  The misappropriation of Quine’s terminology only confuses matters.  All I’ve been able to make of this is that “referentially transparent” means that beta-equivalence is valid.  But beta equivalence is not a property of an arbitrary programming language, nor in any case is it clear why this equivalence is first among equals.  In any case why you would decide a priori on what equivalences you want before you even know what it means to run a program?
  7. “Declarative” means “has a denotation”.  This gets closer to the point, I think, because we might well say that a declarative semantics is one that gives meaning to programs as some kind of mapping between some sort of spaces.  In other words, it would be a synonym for “denotational semantics”.  But every language has a denotational semantics (perhaps by interpretation into a Kripke structure to impose sequencing), so having one does not seem to distinguish a useful class of languages.  Moreover, even in the case of purely functional programs, the usual denotational semantics (as continuous maps) is not fully abstract, and the fully abstract semantics (as games) is highly operational.  Perhaps a language is declarative in proportion to being able to give it semantics in some “familiar” mathematical setting?
  8. “Declarative” means “implicitly parallelizable“.  This was certainly the sense intended at the NSF meeting, but no two “declarative” languages seemed to have much in common.  Charlie Garrod proposes just “implicit”, which is pretty much synonymous with “high level”, and may be the most precise sense there is to the term.

No doubt this list is not exhaustive, but I think it covers many of the most common interpretations.  It seems to me that none of them have a clear meaning or distinguish a well-defined class of languages.  Which leads me to ask, is there any such thing as a declarative programming language?

[Thanks to the late Steve Gould for inspiring the title of this post.]

[Update: wordsmithing.]

[Update: add a remark about semantics, add another candidate meaning.]

[Update: added link to previous post.]


Constructive Mathematics Is Not Metamathematics

July 10, 2013

The publication of the Homotopy Type Theory book has renewed interest in type theory as a foundation for mathematics, and spurred computer scientists to investigate the computational meaning of higher-dimensional types. As I mentioned in a previous post, what makes HoTT work so well for homotopy theory is that it is constructive, which means, at the very least, that it does not postulate that all types are decidable. By Hedberg’s Theorem any type with decidable equality is a set (homotopy 0-type), so a blanket adoption of the classical law of the excluded middle would immediately rule out any higher-dimensional structure.

To my way of thinking, the denial of the universal validity of the excluded middle is not the defining feature of constructivity, but rather a characteristic feature of constructivity—it is the smoke, not the locomotive. But what, then, is the deeper meaning of constructivity that gives rise to the denial of many classical patterns of reasoning, such as proof by contradiction or reasoning by cases on whether a proposition is true or not? Although the full story is not yet completely clear, a necessary condition is that a constructive theory be proof relevant, meaning that proofs are mathematical objects like any other, and that they play a role in the development of constructive mathematics unlike their role in classical mathematics.

The most obvious manifestation of proof relevance is the defining characteristic of HoTT, that proofs of equality correspond to paths in a space. Paths may be thought of as evidence for the equality of their endpoints. That this is a good notion of equality follows from the homotopy invariance of the constructs of type theory: everything in sight respects paths (that is, respect the groupoid structure of types). More generally, theorems in HoTT tend to characterize the space of proofs of a proposition, rather than simply state that the corresponding type is inhabited. For example, the univalence axiom itself states an equivalence between proofs of equivalence of types in a universe and equivalences between these types. This sort of reasoning may take some getting used to, but its beauty is, to my way of thinking, undeniable. Classical modes of thought may be recovered by explicitly obliterating the structure of proofs using truncation. Sometimes this is the best or only available way to state a theorem, but usually one tends to say more than just that a type is inhabited, or that two types are mutually inhabited. In this respect the constructive viewpoint enriches, rather than diminishes, classical mathematics, a point that even the greatest mathematician of the 20th century, David Hilbert, seems to have missed.

The concept of proof relevance in HoTT seems to have revived a very common misunderstanding about the nature of proofs. Many people have been trained to think that a proof is a derivation in an axiomatic theory, such as set theory, a viewpoint often promoted in textbooks and bolstered by the argument that an informal proof can always be written out in full in this form, even if we don’t do that as a matter of course. It is a short step from there to the conclusion that proofs are therefore mathematical objects, even in classical set theory, because we can treat the derivations as elements of an inductively defined set (famously, the set of natural numbers, but more realistically using more natural representations of abstract syntax such as the s-expression formalism introduced by McCarthy in 1960 for exactly this purpose). From this point of view many people are left confused about the stress on “proofs as mathematical objects” as a defining characteristic of HoTT, and wonder what could be original about that.

The key to recognize that a proof is not a formal proof. To avoid further confusion, I hasten to add that by “formal” I do not mean “rigorous”, but rather “represented in a formal system” such as the axiomatic theory of sets. A formal proof is an element of a computably enumerable set generated by the axioms and rules of the formal theory. A proof is an argument that demonstrates the truth of a proposition. While formal proofs are always proofs (at least, under the assumption of consistency of the underlying formal theory), a proof need not be, or even have a representation as, a formal proof. The principal example of this distinction is Goedel’s Theorem, which proves that the computably enumerable set of formal provable propositions in axiomatic arithmetic is not decidable. The key step is to devise a self-referential proposition that (a) is not formally provable, but (b) has a proof that shows that it is true. The crux of the argument is that once you fix the rules of proof, you automatically miss out true things that are not provable in that fixed system.

Now comes the confusing part. HoTT is defined as a formal system, so why doesn’t the same argument apply? It does, pretty much verbatim! But this has no bearing on “proof relevance” in HoTT, because the proofs that are relevant are not the formal proofs (derivations) defining HoTT as a formal system. Rather proofs are formulated internally as objects of the type theory, and there is no commitment a priori to being the only forms of proof there are. Thus, for example, we may easily see that there are only countably many functions definable in HoTT from the outside (because it is defined by a formal system), but within the theory any function space on an infinite type has uncountably many elements. There is no contradiction, because the proofs of implications, being internal functions, are not identified with codes of formal derivations, and hence are not denumerable.

There is a close analogy, previously noted in this blog, with Church’s Law. Accepting Church’s Law internally amounts to fixing the programming language used to define functions in advance, permitting us to show, for example, that certain programs are not expressible in that language. But HoTT does not commit to Church’s Law, so such arguments amount to showing that, for example, there is no Turing machine to decide halting for Turing machines, but allowing that there could be constructive functions (say, equipped with oracles) that make such decisions.

The theory of formal proofs, often called proof theory, was dubbed metamathematics by Kleene. Until the development of type theory the study of proofs was confined to metamathematics. But now in the brave new world of constructive mathematics as embodied in HoTT, proofs (not just formal proofs) have pride of place in mathematics, and provide opportunities for expressing concepts clearly and cleanly that were hitherto obscured or even hidden from our view.

Update: Corrected silly wording mistake.


More Is Not Always Better

January 28, 2013

In a previous post I discussed the status of Church’s Law in type theory, showing that it fails to hold internally to extensional type theory, even though one may see externally that the definable numeric functions in ETT are λ-definable, and hence Turing computable.  The distinction between internal and external is quite important in logic, mainly because a logical formalism may be unable to express precisely an externally meaningful concept.  The classical example is the Löwenheim-Skolem Theorem of first-order logic, which says that any theory with an infinite model has a countable model.  In particular the theory of sets has a countable model, which would seem to imply that the set of real numbers, for example, is countable.  But internally one can prove that the reals are uncountable (Cantor’s proof is readily expressed in the theory), which seems to be a paradox of some kind.  But no, all it says is that the function witnessing the countability of the term model cannot be expressed internally, and hence there is no contradiction at all.

A similar situation obtains with Church’s Law.  One may observe empirically, so to say, that Church’s Law holds externally of ETT, but this fact cannot be internalized.  There is a function given by Church’s Law that “decompiles” any (extensional) function of type N→N by providing the index for a Turing machine that computes it.  But this function cannot be definable internally to extensional type theory, because it may be used to obtain a decision procedure for halting of Turing machines, which is internally refutable by formalizing the standard undecidability proof.  In both of these examples it is the undefinability of a function that is important to the expressive power of a formalism, contrary to naïve analyses that would suggest that, when it comes to definability of functions, the more the merrier.  This is a general phenomenon in type theory.  The power of type theory arises from its strictures, not its affordances, in direct opposition to the ever-popular language design principle “first-class x” for all imaginable values of x.

Another perspective on the same issue is provided by Martin-Löf’s meaning explanation of type theory, which is closely related to the theory of realizability for constructive logic.  The high-level idea is that a justification for type theory may be obtained by starting with an untyped concept of computability (i.e., a programming language given by an operational semantics for closed terms), and then giving the meaning of the judgments of type theory in terms of such computations.  So, for example, the judgment A type, where A is a closed expression means that A evaluates to a canonical type, where the canonical types include, say, Nat, and all terms of the form A’→A”, where A’ and A” are types.  Similarly, if A is a type, the judgment a:A means that A evaluates to a canonical type A’ and that a evaluates to a canonical term a’ such that a’ is a canonical element of A’, where, say, any numeral for a natural number is a canonical member of Nat.  To give the canonical members of the function type A’→A” requires the further notion of equality of elements of a type, a=b:A, which all functions are required to respect.  A meaning explanation of this sort was suggested by Martin-Löf in his landmark paper Constructive Mathematics and Computer Programming, and is used as the basis for the NuPRL type theory, which extends that account in a number of interesting directions, including inductive and coinductive types, subset and quotient types, and partial types.

The relation to realizability emerges from applying the meaning explanation of types to the semantics of propositions given by the propositions-as-types principle (which, as I’ve previously argued, should not be called “the Curry-Howard isomorphism”).  According to this view a proposition P is identified with a type, the type of its proofs, and we say that P true iff evaluates to a canonical proposition that has a canonical member.  In particular, for implication we say that P→Q true if and only if P true implies Q true (and, in addition, the proof respects equality, a condition that I will suppress here for the sake of simplicity).  More explicitly, the implication is true exactly when the truth of the antecedent implies the truth of the consequent, which is to say that there is a constructive transformation of proofs of P into proofs of Q.

In recursive realizability one accepts Church’s Law and demands that the constructive transformation be given by the index of a Turing machine (i.e., by a program written in a fixed programming language).  This means, in particular, that if P expresses, say, the decidability of the halting problem, for which there is no recursive realizer, then the implication P→Q is vacuously true!  By taking Q to be falsehood, we obtain a realizer for the statement that the halting problem is undecidable.  More generally, any statement that is not realized is automatically false  in the recursive realizability interpretation, precisely because the realizers are identified with Turing machine indices.  Pressing a bit further, there are statements, such as the statement that every Turing machine either halts or diverges on its own input, that are true in classical logic, yet have no recursive realizer, and hence are false in the realizability interpretation.

In contrast in the meaning explanation for NuPRL Church’s Law is not assumed.  Although one may show that there is no Turing machine to decide halting for Turing machines, it is impossible to show that there is no constructive transformation that may do so.  For example, an oracle machine would be able to make the required decision.  This is entirely compatible with intuitionistic principles, because although intuitionism does not affirm LEM, neither does it deny it.  This point is often missed in some accounts, leading to endless confusions.  Intuitionistic logic, properly conceived, is compatible with classical logic in that classical logic may be seen as an idealization of intuitionistic logic in which we heuristically postulate that all propositions are decidable (all instances of LEM hold).

The crucial point distinguishing the meaning explanation from recursive realizability is precisely the refusal to accept Church’s Law, a kind of comprehension principle for functions as discussed earlier.  This refusal is often called computational open-endedness because it amounts to avoiding a commitment to the blasphemy of limiting God’s programming language to Turing machines (using an apt metaphor of Andrej Bauer’s).  Rather, we piously accept that richer notions of computation are possible, and avoid commitment to a “final theory” of computation in which Church’s Law is postulated outright.  By avoiding the witnessing function provided by Church’s Law we gain expressive power, rather than losing it, resulting in an elegant theory of constructive mathematics that enriches, rather than diminishes, classical mathematics.    In short, contrary to “common sense” (i.e., uninformed supposition), more is not always better.

Update: corrected minor technical error and some typographical errors.

Update: clarified point about incompatibility of recursive realizability with classical logic.


Exceptions Are Shared Secrets

December 3, 2012

Exceptions are commonly criticized as being the “goto’s” of modern programming languages.  Raising an exception transfers control to an unknown destination, it is said, and this is a bad thing on engineering grounds.  I disagree.  It is perfectly predictable where a raised exception will be handled, provided that exceptions are done properly, which, unfortunately, is not always the case.

The crucial point that exception values are shared secrets. Let us distinguish two parties, the raiser of the exception, and the handler of it. The fundamental idea of exceptions is to transfer a value from the raiser to the handler without the possibility of interception by another party. While the language of secrecy seems appropriately evocative, I hasten to add that I am not here concerned with “attackers” or suchlike, but merely with the difficulties of ensuring modular composition of programs from components. In such a setting the “attacker” is yourself, who is not malicious, but who is fallible.

By raising an exception the raiser is “contacting” a handler with a message. The raiser wishes to limit which components of a program may intercept that message. More precisely, the raiser wishes to ensure that only certain previously agreed-upon components may handle that exception, perhaps only one. This property should remain stable under extension to the program or composition with any other component. It should not be possible for an innocent third party to accidentally intercept a message that was not intended for it.

Achieving this requires a secrecy mechanism that allows the raiser and the handler(s) to agree upon their cooperation. This is accomplished by dynamic classification, exactly as it is done properly in Standard ML (but not O’Caml). The idea is that the raiser has access to a dynamically generated constructor for exception values, and any handler has access to the corresponding dynamically generated matcher for exception values. This means that the handler, and only the handler, can decode the message sent by the raiser; no other party can do anything with it other than pass it along unexamined. It is “perfectly encrypted” and cannot be deciphered by any unintended component.

The usual exception mechanisms, as distinct from exception values, allow for “wild-card handlers”, which means that an exception can be intercepted by a third party. This means that the raiser cannot ensure that the handler actually receives the message, but it can ensure, using dynamic classification, that only a legitimate handler may decipher it. Decades of experience with Standard ML shows that this is a very useful thing indeed, and has application far beyond just the simple example considered here. For full details, see my forthcoming book, for a full discussion of dynamic classification and its role for ensuring integrity and confidentiality in a program. Dynamic classification is not just for “security”, but is rather a good tool for everyday programming.

Haskell is one language that does not get exceptions right.  Allocating an exception incurs a storage effect, and so would have to be confined to the IO monad in Haskell.  But this would destroy the utility of exceptions in pure code.  The result is an exception mechanism that is arguably broken and that does not provide the guarantees that make exceptions a perfectly pleasant and useful way to write code.

Update: Reworked last paragraph to clarify the point I am making; the previous formulation appears to have invited misinterpretation.

Update: This account of exceptions also makes clear why the perennial suggestion to put exception-raising information into types makes no sense to me. I will write more about this in a future post, but meanwhile contemplate that a computation may raise an exception that is not even in principle nameable in the type. That is, it is not conservativity that’s at issue, it’s the very idea.

Update: Wordsmithing, removal of irrelevant remarks to focus on the main point about dynamic exceptions.


Introductory FP Course Materials

September 15, 2012

The course materials for our first-semester introductory programming course (which I’ve discussed elsewhere on this blog) are now available here.

The course materials for our second-semester data structures and algorithms course are available here.

Thanks to Dan Licata and Guy Blelloch for helping make these available.  Comments are most welcome.

Update: I will be giving a talk about this at Google Pittsburgh on Thursday the 20th of September.


Yet Another Reason Not To Be Lazy Or Imperative

August 26, 2012

In an earlier post I argued that, contrary to much of the literature in the area, parallelism is all about efficiency, and has little or nothing to do with concurrency.  Concurrency is concerned with controlling non-determinism, which can arise in all sorts of situations having nothing to do with parallelism.  Process calculi, for example, are best viewed as expressing substructural composition of programs, and have very little to do with parallel computing.  (See my PFPL and Rob Simmons’ forthcoming Ph.D. dissertation for more on this perspective.)  Parallelism, on the other hand, is captured by analyzing the cost of a computation whose meaning is independent of its parallel execution.  A cost semantics specifies the abstract cost of a program that is validated by a provable implementation that transfers the abstract cost to a precise concrete cost on a particular platform.  The cost of parallel execution is neatly captured by the concept of a cost graph that captures the dynamic data dependencies among subcomputations.  Details such as the number of processors or the nature of the interconnect are factored into the provable implementation, which predicts the asymptotic behavior of a program on a hardware platform based on its cost graph.  One advantage of cost semantics for parallelism is that it is easy to teach freshmen how to write parallel programs; we’ve been doing this successfully for two years now, with little fuss or bother.

This summer Guy Blelloch and I began thinking about other characterizations of the complexity of programs besides the familiar abstractions of execution time and space requirements of a computation.  One important measure, introduced by Jeff Vitter, is called I/O Complexity.  It measures the efficiency of algorithms with respect to memory traffic, a very significant determiner of performance of programs.  The model is sufficiently abstract as to encompass several different interpretations of I/O complexity.  Basically, the model assumes an unbounded main memory in which all data is ultimately stored, and considers a cache of M=c\times B blocked into chunks of size B that provides quick access to main memory.  The complexity of algorithms is analyzed in terms of these parameters, under the assumption that in-cache accesses are cost-free, so that the only significant costs are those incurred by loading and flushing the cache.  You may interpret the abstract concepts of main memory and cache in the standard way as a two-level hierarchy representing, say, on- and off-chip memory access, or instead as representing a disk (or other storage medium) loaded into memory for processing.  The point is that the relative costs of processing cached versus uncached data is huge, and worth considering as a measure of the efficiency of an algorithm.

As usual in the algorithms world Vitter makes use of a low-level machine model in which to express and evaluate algorithms.  Using this model Vitter obtains a lower-bound for sorting in the I/O model, and a matching upper bound using a k-way merge sort, where k is chosen as a function of M and B (that is, it is not cache oblivious in the sense of Leiserson, et al.)  Although such models provide a concrete, and well-understood, basis for analyzing algorithms, we all know full well that programming at such a low-level is at best a tedious exercise.  Worse, machine models provide no foundation for composition of programs, the single most important characteristic of higher-level language models.  (Indeed, the purpose of types is to mediate composition of components; without types, you’re toast.)

The point of Guy’s and my work this summer is to adapt the I/O model to functional programs, avoiding the mess, bother, and futility of trying to work at the machine level.  You might think that it would be impossible to reason about the cache complexity of a functional program (especially if you’re under the impression that functional programming necessarily has something to do with Haskell, which it does not, though you may perhaps say that Haskell has something to do with functional programming).  Traditional algorithms work, particularly as concerns cache complexity, is extremely finicky about memory management in order to ensure that reasonable bounds are met, and you might reasonably suspect that it will ever be thus.  The point of our paper, however, is to show that the same asymptotic bounds obtained by Vitter in the I/O model may be met using purely functional programming, provided that the functional language is (a) non-lazy (of course), and (b) implemented properly (as we describe).

Specifically, we give a cost semantics for functional programs (in the paper, a fragment of ML) that takes account of the memory traffic engendered by evaluation, and a provable implementation that validates the cost semantics by describing how to implement it on a machine-like model.  The crux of the matter is to account for the cache effects that arise from maintaining a control stack during evaluation, even though the abstract semantics has no concept of a stack (it’s part of the implementation, and cannot be avoided).  The cost semantics makes explicit the reading and allocation of values in the store (using Felleisen, Morrisett, and H’s “Abstract Models of Memory Management”), and imposes enough structure on the store to capture the critical concept of locality that is required to ensure good cache (or I/O) behavior.  The abstract model is parameterized by M and B described above, but interpreted as representing the number of objects in the cache and the neighborhood of an object in memory (the objects that are allocated near it, and that are therefore fetched along with the object whenever the cache is loaded).

The provable implementation is given in two steps.  First, we show how to transfer the abstract cost assigned to a computation into the amount of memory traffic incurred on an abstract machine with an explicit control stack.  The key idea here is an amortization argument that allows us to obtain tight bounds on the overhead required to maintain the stack.  Second, we show how to implement the crucial read and allocate operations that underpin the abstract semantics and the abstract machine.  Here we rely on a competitive analysis, given by Sleator, et al., of the ideal cache model, and on an amortization of the cost of garbage collection in the style of Appel.  We also make use of an original (as far as I know) technique for implementing the control stack so as to avoid unnecessary interference with the data objects in cache.  The net result is that the cost semantics provides an accurate asymptotic analysis of the I/O complexity of a functional algorithm, provided that it is implemented in the manner we describe in the paper (which, in fact, is not far from standard implementation techniques, the only trick being how to manage the control stack properly).  We then use the model to derive bounds for several algorithms that are comparable to those obtained by Vitter using a low-level machine model.

The upshot of all of this is that we can reason about the I/O or cache complexity of functional algorithms, much as we can reason about the parallel complexity of functional algorithms, namely by using a cost semantics.  There is no need to drop down to a low-level machine model to get a handle on this important performance metric for your programs, provided, of course, that you’re not stuck with a lazy language (for those poor souls, there is no hope).


Believing in Computer Science

August 25, 2012

It’s not every day that I can say that I agree with Bertrand Meyer, but today is an exception. Meyer has written an opinion piece in the current issue of C.ACM about science funding that I think is worth amplifying. His main point is that funding agencies, principally the NSF and the ERC, are constantly pushing for “revolutionary” research, at the expense of “evolutionary” research. Yet we all (including the funding agencies) know full well that, in almost every case, real progress is made by making seemingly small advances on what is already known, and that whether a body of research is revolutionary or not can only be assessed with considerable hindsight. Meyer cites the example of Hoare’s formulation of his logic of programs, which was at the time a relatively small increment on Floyd’s method for proving properties of programs. For all his brilliance, Hoare didn’t just invent this stuff out of thin air, he built on and improved upon the work that had gone before, as of course have hundreds of others built on his in turn. This all goes without saying, or ought to, but as Meyer points out, we computer scientists are constantly bombarded with direct and indirect exhortations to abandon all that has gone before, and to make promises that no one can honestly keep.

Meyer’s rallying cry is for incrementalism. It’s a tough row to hoe. Who could possibly argue against fostering earth-shattering research that breaks new ground and summarily refutes all that has gone before? And who could possibly defend work that is obviously just another slice of the same salami, perhaps with a bit of mustard this time? And yet what he says is obviously true. Funding agencies routinely beg the very question under consideration by stipulating a priori that there is something wrong with a field, and that an entirely new approach is required. With all due respect to the people involved, I would say that calls such as these are both ill-informed and outrageously arrogant.

But where does this attitude come from? Meyer cites “market envy” as one particularly powerful influence. Funding agencies wish to see themselves as analogous to venture capitalists investing in the next big thing, losing track of the fundamental differences between basic research and product development. (We see this sort of nonsense all the time in national politics; there is always a constituency for the absurd proposition that a government should be run like a business, as if there were any similarity at all between the two. What they really mean is, turn the government’s money over to business, but that can’t be said too loudly for fear that people will catch on.)

Another influence, which Meyer doesn’t mention, seems to be a long-standing problem in computer science. It seems to me that many researchers who move into political and administrative roles are either bored with, or do not believe in, computer science as an academic discipline. Their own research area is, or maybe always was, boring, or has been obviated by technological developments or scientific advances. So they move into politics, perhaps carrying a sense that there is something wrong with the field that can only be corrected by radical surgery. They then demand that researchers do what they themselves never did in their own careers, kicking the ladder out from behind them.

From this somewhat contentious premise, much follows. The constant implication, stated and unstated, that CS research is not worth doing for its own sake. The emphasis on interdisciplinarity as an end in itself, rather than a means to an end. The emphasis on applications to other disciplines being more important than CS itself. The sense that “broader impacts” are far more important than the “innovative claims” (the actual work) in a research proposal.

Seen from this perspective, Meyer’s position is much more easily defensible. When funding agencies are talking about “breakthroughs” and “paradigm shifts”, what they really mean is “anything but computer science”. When Meyer talks about incrementalism, what he really means is “computer science is worth doing for its own sake”.

And I, for once, agree with him.


Design a site like this with WordPress.com
Get started