
Wednesday, November 14, 2007

Programming Language Consciousness


If Nelson Mandela is South Africa's Martin Luther King Jr., then Steve Biko is its Malcolm X*. Steve Biko, writing under the pen name Frank Talk, developed the philosophy of Black Consciousness: the idea that Black people in South Africa needed to work together to liberate themselves from Apartheid. Liberation would not come as a gift from White liberals; it had to come from the efforts of the oppressed themselves. Before integration could be necessary, or even helpful, Blacks needed to be proud of their own culture and conscious of their oppression.

Some language communities (Tcl, for example) share the belief that it's best to write many things in C and then bind that library to the higher-level language. C has some advantages: it can be very fast, and it integrates well with external libraries, which are usually themselves written in C or C++. So when users of these languages need something like a GUI library, a complex data structure, a function that could sensibly be included as a primitive, or just something they feel should run a little faster, they go and implement that thing in C. But one of the inspirations for these languages was always to have something better to program in than C. Once the new language exists, that kind of repetitive, low-level coding shouldn't have to be done anymore.

If we programmers are going to be liberated from this oppressive world of C and other imperative, low-level languages with little sense of abstraction, we have to take charge. A new language could be written in C, but then, once the core is there, libraries should be written in that language. This is the only way that liberation can be achieved.** It's fine to write a binding for existing C libraries, but for new libraries Steve Biko, Alan Turing and I are here to tell you that you do have the power to write them in that language itself! Don't fall into the incrementalist trap of using the new language for glue and C for the "heavy-duty" stuff, as this will only prolong and damage the struggle. Code written in the new, better language will be easier to maintain and extend. Paraphrasing Biko speaking about not relying on one's oppressor: Programmer, you are on your own.




* That is, if MLK had founded a guerilla organization and later became president, and if Malcolm X had been viciously murdered by the government. I'm referring to the old Malcolm X, of course, not after he started being more polite in his rhetoric toward White people. As it turned out, Mandela's movement (the African National Congress) was far more successful than the movement that Biko inspired (the Black Consciousness Movement, and later the Azanian People's Organization). Also, in the early '90s, the ANC was basically at war with the Inkatha Freedom Party, another liberation movement, in what's today KwaZulu-Natal***, and they also had significant conflicts with Azapo during the run-up to the elections. But who's counting? It's called an analogy, people. By the way, for those of you who are aware that there are more than just Africans and Europeans in South Africa, Biko extended the definition of Black to include, basically, everyone who's oppressed and acknowledges it. (This way, Indians and Colored people could join the movement.)

** OK, in reality, as the above footnote notes, the multiracialist ANC beat out Azapo, but it's the idea that matters...

*** KwaZulu-Natal is my favorite province ever. Who would have thought of a CamelCase name? (Well, must've been those missionaries who created the current writing system for isiZulu.)

If my relatively intense (for me), intro-level History of Southern Africa course is good for anything, it's good for drawing spurious historical analogies for slightly tongue-in-cheek essays, right? This started out being an essay explaining why I'm a little uncomfortable with Io, because of their philosophy of implementing everything in C, but I ended up not mentioning it, except in this footnote. Sadly, this will be my most-read essay on South Africa.

Update: My wonderful history professor, Jamie Monson, read this post and told me that I was a little wrong about the ANC fighting with Azapo; they mostly fought with the Inkatha Freedom Party. How could I have made such an error? It's fixed now.

Friday, May 11, 2007

The advantage of functional programming

I know there have been a lot of articles on this theme before, but I wanted a go at it anyway, even though most readers of this blog already use a functional language.

I was trying to explain the advantage of a functional programming language to a friend, but ended up being rather unpersuasive. After I was done, he said, "I think it's just easier to use whatever you've learned." But in my opinion, it's more than that. There are clear, objective advantages to using a functional programming language over a non-functional one.

"Functional" is one of those programming language words (like "object-oriented" or "typed") that is often used but poorly defined. Some people use it to refer to an application-style dataflow (as I defined last March). This definition fits most functional languages but excludes languages like Factor, which are functional but use stack-based dataflow, and possibly also haXe and Scala, both of which can comfortably be used with mutation-style dataflow. Really, the dataflow style isn't the issue. In my mind, it's all about higher-order functions and little pieces of data.

Higher order functions, despite the scary name, are relatively simple and extremely useful. It's just passing a function as an argument to a function. So to apply a single operation to each element in an array and get a new one out, you don't need to iterate through the array yourself; just use some higher order function (map) and the operation that you'll be using on each element. For a more concrete example, here's a Java method that will take an ArrayList and make a new one with 1 added to each element.

public static ArrayList<Integer> addone(ArrayList<Integer> list) {
    ArrayList<Integer> out = new ArrayList<Integer>(list.size());
    Iterator<Integer> iter = list.iterator();
    while (iter.hasNext()) {
        out.add(new Integer(iter.next().intValue() + 1));
    }
    return out;
}

Now here it is in Scheme, a functional language:

(define (addone lst)
  (map (lambda (i) (+ 1 i)) lst))

(In Factor, it's just : addone [ 1+ ] map ; but that's just an unfair comparison, don't you think?)

Now imagine that Java supported anonymous delegates, like C#, and also had a good set of higher order functions (unlike C#, as far as I can tell); then the code could be written something like this (using something like the syntax proposed by Neal Gafter):

public static ArrayList<Integer> addone(ArrayList<Integer> list) {
    return list.map(Integer x) {
        return new Integer(x.intValue() + 1);
    };
}

The inner return returns a value from the closure, rather than from the whole method. Now there are still a few kinks to be worked out (why bother with an explicit return? And why can't you put an int in an ArrayList, instead of using the awkward Integer?), but it's obviously an improvement. map only covers one particular pattern, but once a few more methods are added (such as fold, find and each), almost all array processing can be covered. Currently, Java code repeats the same boilerplate iterator code everywhere that map, fold, find or each would do the job.
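
To give a sense of how those other methods cover the remaining loops, here is a small sketch in Haskell (a functional language that comes up again further down this blog); the names sumList and firstBig are made up for illustration, but foldl and find are standard library functions:

import Data.List (find)

-- the "loop with a running total" pattern, written as a fold
sumList :: [Int] -> Int
sumList = foldl (+) 0

-- the "search until you find it" pattern, written with find
firstBig :: [Int] -> Maybe Int
firstBig = find (> 100)

Each of these would be yet another copy of the iterator boilerplate in the Java version.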

Some languages try to solve the problem by adding a number of specific constructs (foreach, with, using, etc.) that execute a block of code in a particular context, which makes examples like these look better. But these are only partial solutions, and they do not allow for extension by the programmer. There will always be needs that the language designer didn't anticipate, and little measures like these won't fulfill them all.

To get the benefit of this, there needs to be more than just access to higher order functions. It also has to be possible to use them in everyday code, beyond simple UI callbacks. The standard library should make heavy use of them where appropriate (such as in array/list processing), and languages like C# and Javascript lack this. Additionally, there needs to be a nice, convenient syntax for higher order functions that makes them look just as pretty, or almost as pretty, as builtin constructs. Java has anonymous inner classes, which can do about half of what closures do, but the syntax is so horrible that they are very rarely used for higher-order functions. Without these two things, practical use of higher order functions will be limited.

The other main advantage of functional programming languages is the ability to allocate lots of relatively small data structures that don't (usually) get mutated. I know this is somewhat vague, but it is actually very important. For one thing, for this to be practical, there must be garbage collection. It would be really annoying to write and inefficient to run a program that allocates and frees many little data structures in C or C++. Because of this, the practice is generally avoided. But in a functional programming language, not only is this easy syntax-wise; it's also efficient. I'm not completely sure how to describe it in objective terms, but it's much easier to write a program when you're not as concerned about minimizing consing.
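
As a rough illustration (Haskell again; pairAdjacent is a name I made up), here is the kind of code that comes naturally when allocating little immutable structures is cheap: every call builds a whole new list of small tuples, leaves the old data untouched, and lets the garbage collector reclaim whatever is no longer needed.

-- pair each element with its neighbor: lots of small, short-lived tuples
pairAdjacent :: [a] -> [(a, a)]
pairAdjacent xs = zip xs (drop 1 xs)

Writing the same thing in C means deciding who owns each pair and when it gets freed, which is exactly the bookkeeping that discourages this style.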

Anyway, I hope this was enlightening, but it may have been hard to follow if you, you know, didn't already know what functional programming was. So feel free to comment.

Tuesday, April 10, 2007

Why do we need data structures?

Right now, I'm facing a particularly annoying and indecipherable bug in integrating the assocs protocol with the rest of the Factor system (I have no idea what it is; it just doesn't work), so I decided to write another one of those stupid essays you all hate.

A couple nights ago, I was trying to explain to my mom what I was programming, the associative array protocol. I explained it badly, but a better way to put it would be this: imagine you're computerizing a library. The main thing you have to do is make it so that people can look up the call number of books. Everyone already knows the title and correct spelling of the book they want to get. Your job is to write a program that, given the title, will look up the call number for you. It has to do this quickly, even if there are a lot of books in the library. Of course, to do this, you need an associative array. This is obvious to anyone who's done much programming. I'm sure most of you already have your favorite structure picked out.
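
To make that concrete, here is a hedged sketch in Haskell using Data.Map (the titles and call numbers are invented, and callNumber is my own name for the lookup function); the point is just that the catalog is an associative array from titles to call numbers, and looking one up stays fast even with many books:

import qualified Data.Map as Map

-- a tiny, made-up catalog mapping titles to call numbers
catalog :: Map.Map String String
catalog = Map.fromList
    [ ("I Write What I Like", "968.06 BIK")
    , ("A History of South Africa", "968 THO")
    , ("The Joy of Cooking", "641.5 ROM") ]

-- given a title, return its call number (or Nothing if it isn't in the catalog)
callNumber :: String -> Maybe String
callNumber title = Map.lookup title catalog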

But why do we need to structure the data, anyway? A programmer might say, "to allow data to be accessed with a minimal asymptotic complexity." Basically, that means in as few steps as possible, even if there's a lot of data. Still, this doesn't completely mesh with how computers have been explained to the average person. It probably goes something like this. (This is about as far as the Magic School Bus gets, when the class goes into a computer)

First, input is given to the computer. The input is first in the RAM. It is in ones and zeroes, because that's easier for the computer to handle. To process the data, some ones and zeroes are sent to the CPU, together with some instructions, which write other ones and zeroes back to the RAM. The ones and zeroes represent numbers in base 2. Then, when you want to save the data, those ones and zeroes are taken from the RAM and written to the hard drive.

Then, many people understand the higher level stuff about computers: the Sony VAIO is a good laptop, this is how you use Microsoft Word, Mac OS X is based on Unix, etc.

In the middle, there's a huge gap of information that only specialists know about. Right above the bottom of that gap sit data structures. Data structures, along with file formats, are the way those ones and zeroes are organized. If we only had numbers, we really couldn't do much with computers.

The basis for all data structures is the pointer. Basically, every single location in RAM has a number describing where it is. A pointer is a number that points to one of those locations. So the simplest thing you can do is say that a particular pointer doesn't just point to one number; it points to a number, and then the number right after it in RAM. And why limit yourself to just one? Why not 100? This pointer, the pointer to location 42313, represents the numbers at all of the locations 42313-42412. To get item number 53 in the array (counting starts at 0, not 1), you just look up what number is at location 42313+53. This whole thing is called an array.
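
Here's a hedged sketch of the same idea in Haskell (which hides the raw pointer arithmetic, but does the "location plus offset" lookup for you); the names block and item53 are made up. An array built with Data.Array hands back any numbered item in a single step, just like adding the item number to the pointer:

import Data.Array

-- 100 numbers stored one after another, numbered 0 through 99
block :: Array Int Int
block = listArray (0, 99) [100, 101 ..]

-- grab item number 53: one constant-time lookup, like "pointer + 53"
item53 :: Int
item53 = block ! 53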

Data structures get much more complicated, but the idea remains the same: you have numbers, and then you have pointers to numbers, and even pointers to pointers to numbers. Why so many pointers and so many ways of organizing them? Because if you have the right pointers, they take you to the right number, the data that you want. A good deal of computer scientists' work has been devoted to getting just the right way to orient pointers, and counting exactly how many steps they take to find the data.

Sunday, April 1, 2007

My awesome idea

Building on what I learned before about computational complexity (see previous blog post) I'm now making a great new device that could infer computational complexity! It'll be an Eclipse plugin; just press Ctrl+Shift+F7 while selecting a method and a message box pops up, showing, in big-O notation, how long it takes to run. To implement this, I'll need to use a lot of Java's metaprogramming techniques, but once I have that out of the way, it'll be simple. And it'll work for everything.

April fools!

Actually, this is fundamentally impossible to make. Why? The answer goes back to Alan Turing and the halting problem. The halting problem asks: is it possible to construct a program that can tell whether another program will halt? Well, consider this program (in Wikipedia-style pseudocode):

function example()
    if halts?(example) then
        infinite-loop()
    else
        return nil

So, what happens when we run halts?(example)? Well, if it returns true, then example itself goes into an infinite loop, never halting. If it returns false, indicating that example loops forever, then example in fact quickly returns nil and halts. Either way, halts? gives the wrong answer for this program. And, as it's not hard to see, there are (countably) infinitely many similar programs that exploit the same loophole.

When I first learned this, I thought, "This is cheating; that's too easy." Well, yes. That's higher level math for you. But there's another way of thinking about it, without that cheating device: to tell whether absolutely any piece of code halts, you basically have to know everything about it and run it in some sort of emulator. For some programs you can prove they won't halt, but in general, if the code just keeps running unexpectedly, your emulator keeps running too, and you never get your answer.

Building off of this, think about complexity: the first thing you need to know before assessing complexity is whether an algorithm halts. But we just showed that we can't do that! Also, what if the code does something like this:

function example2(n)
    if complexity(example2) = O(n) then
        return new integer[n][n] // this takes O(n^2)
    else if complexity(example2) = O(n^2) then
        return new integer[n] // this takes O(n)

and we want a tight bound for complexity.

Of course, specific solutions are possible, but these are both difficult to construct and highly limited in function. I hope you had fun with your daily dose of pessimism!

Monday, March 26, 2007

Dataflow in programming languages (with some Factor bias)

As I see it, there are three main ways to manage dataflow in a programming language. By this, I'm referring to how data is passed around within a procedure. [The simplest one--visual boxes and arrows--is as easy to conceptualize as it is useless in practice, because it quickly gets unwieldy. So I'll ignore it for the purposes of this blog post.] Although the two are often conflated in discussion, dataflow pattern has nothing to do with the other aspects of a programming language's paradigm. Dataflow is often discussed in terms of functional vs. imperative, or applicative vs. concatenative, but this is a huge oversimplification. Advocates of one paradigm tend to claim that it is the best, for some reason or other, but in reality, all methods have applications where they excel and applications where they are pretty awkward.

I'll call the first dataflow method the mutation method. In this method, data is passed around within a function in named, mutable variables. Mutation is probably the oldest dataflow strategy because it directly corresponds to a register machine, so all early (50's and 60's) and most modern programming languages use it. The mutation method is probably the one most familiar to programmers: it's used in C, Java, Basic, Ruby, Smalltalk and Python, to name a few. As an example, consider the definition of factorial in C:

int factorial(int x) {
    int n;
    for (n = 1; x > 0; x--)
        n *= x;
    return n;
}

Update: Thanks to Dave for pointing out my extremely stupid errors in this code. It now works.

Though the Smalltalk version would look completely different, Smalltalk and C share the same fundamental property that the easiest way to pass around data in a relatively primitive loop is to mutate the value of variables in place. Self and Io may have some interesting semantics behind the getting and setting of variables, but in the end, it's all the same. The other two paradigms don't work like this. But not all programming languages using the mutation data flow are imperative. For example, Haskell, Lisp and ML can (all in different ways) emulate this sort of dataflow. In Common Lisp and Scheme, the value of any variable can be changed through setq and set!. In ML, mutable variables are implemented as an explicit reference. And in Haskell, there is the State monad and references through the I/O monad. I think it'd be possible to construct a purely functional language using this technique for everything.
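
As a hedged sketch of what that emulation can look like in Haskell (using Control.Monad.State from the standard mtl/transformers libraries; factorialState is my own name), here is the C-style loop above with x and n carried as "mutable variables" in the State monad:

import Control.Monad.State

-- the same loop as the C version: "mutate" x and n until x reaches 0
factorialState :: Int -> Int
factorialState x0 = evalState loop (x0, 1)
  where
    loop = do
        (x, n) <- get
        if x > 0
            then put (x - 1, n * x) >> loop   -- the mutation step, then repeat
            else return n

The dataflow is mutation-style even though the language is purely functional, which is exactly the point: dataflow and paradigm are separate choices.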

The second way of doing things is what Lisp, Haskell and ML usually use: function application returning a value. I'll call this the application method. The first example I know of is John McCarthy's Lisp, created in 1958, a version of which is still in use today. Often, functional languages are defined in this way, but I think whether a language is functional or imperative has more to do with the use of data structures. In these languages, variables are (almost) always immutable, and loops are built up (at least in theory) through recursion. The main way a variable changes value is through function application: calling a function with different arguments "changes" the value of the variables. The same factorial function would be implemented in Haskell (without cheating by using any of Haskell's library functions or pattern matching) as

factorial n | n <= 0    = 1
            | otherwise = n * factorial (n - 1)
-- of course, it could more easily be written as
-- factorial n = product [1..n]

Recursion can, of course, be used in mutation dataflow languages, but in the application dataflow method it is usually much more efficient, so it can be used in practical applications. In application languages, there tend to be fewer "statements" and more "expressions" (though statements still usually exist for definitions and module inclusion). Within application languages, there are of course differences. In some languages, mutable variables can actually be used (see above), and different forms of recursion are efficient in different application languages. But the overall concept is the same.

The third form of dataflow is point-free (also known as pointless by its detractors). It's definitely the newest kind of data flow at only around 30 years old. I'd estimate (and I'm basing this on absolutely nothing) that there are fewer than 1000 people writing code in this way in the world at any time. Point-free refers to the lack of points, or variables, in most or all code. Instead of variables, data is passed implicitly between functions. There are two ways to do point-free programming, which could be called stack-based and function-based. The progenitor of function-based programming is the late John Backus. It's sad his obituaries mostly said, "He invented Fortran, which is amazing. Then, he did other stuff that nobody cares about for 50 years." In his 1977 Turing Award address, Backus introduced a new programming language which he called FP. In FP, the only way to make programs is to compose functions with the function composition operator o. There are also operators used to apply one value to more than one function, or to map a function over all elements of a list. The problem is, this gets really awkward when you have three or more arguments to a function, or want to do certain other things that are more complex. But this style lives on as a sublanguage of Haskell and J. In Haskell's point-free style, factorial might be defined as

factorial = product . enumFromTo 1

Though, of course, this is cheating, as it uses two library functions defined in an application style.

As I already mentioned, the other style of point-free programming is stack-based, as used in Forth, Joy, Postscript, the HP calculator language and Factor. Forth was developed by Chuck Moore in the early 1970s as the next generation language, though it never really caught on as such. Then, around 10 years ago, Manfred von Thun noticed the correspondence between FP and Forth and created a new language called Joy, which combines their strengths. In Forth and Joy, as in Factor, all dataflow (outside of some global variables and arrays) takes place on the stack. The stack can be manipulated with words (the equivalent of functions) like dup, swap and drop in ways that would be impossible in function-based languages like FP. This gives more flexibility: data can be passed around in different ways and handled granularly rather than as a unit. In Factor, factorial is (ignoring the library functions that would make it easier, and ignoring the fact that an iterative version would be more efficient)

: factorial ( n -- n! )
    dup 0 <= [ drop 1 ]
    [ dup 1- factorial * ] if ;
! Of course a better way of doing it would be
! : factorial ( n -- n! ) 1 [ 1+ * ] reduce ;

In some ways, this is very similar to the application definition, once you can understand the postfix syntax. But the semantics are very different: rather than applying a function to its arguments, you are passing around a single stack to various words. Nearly all words only modify the top couple arguments on the stack, though.

But there are finer distinctions to make. Though Joy and Factor are both stack-based, code comes out looking very different between them. This is because, in Joy's implementation, the data stack is just a linked list which is not mutated by words. In Factor, the data stack is implemented more efficiently, and it is mutated. Though Joy is slower, there are a lot of cool things that come out of this. For example, in Factor, it is a common pattern to dup one or more items on or near the top of the stack in order to evaluate the predicate for a conditional (eg if). In Factor, if takes three arguments. The first argument, found third-to-top on the stack, is a boolean value (Factor works like Common Lisp in that anything but f is treated as true). The second-to-top argument is the quotation to be executed if that value is not f, and the top argument is the quotation to be executed if the value is f. In Joy, the third-to-top argument is not a boolean but rather a quotation. This bottom quotation is executed, and a boolean value is taken from the top of the stack and saved somewhere. Then, the old stack from before that execution is restored, and one of the two following quotations is executed. Joy also uses recursion combinators, taking advantage of this property to factor out the abstract behavior of many recursion patterns. Though this is all nice to work with, it comes at the cost of efficiency, simplicity and ease of implementation.

As you may have realized at this point, point-free languages span the whole spectrum from relatively low-level imperative (Forth, at least at the most basic level) to pure or almost-pure functional (Joy and FP, also Haskell and J when used in this style) to impure functional (Factor, Postscript, Onyx). They can also be statically typed (Cat, StrongForth), untyped (Forth) or dynamically typed (basically everything else). Basically, they can be anything.

So, I'm not sure if this essay actually proved its thesis (that all dataflow strategies have strengths and weaknesses, and that dataflow is independent of paradigm), but I hope I've enlightened you a little bit as to what's out there.

Oh, and one more thing. I recently saw a blog post that pretty much claimed Haskell as the modern incarnation of FP. Although Factor doesn't completely hide the Von Neumann machine underneath, code doesn't need to care about it in any real sense. As far as dataflow goes, Factor is much closer to FP than Haskell is, as they are both about composing functions. Just to make it official and unambiguous, I claim Factor as the One True Heir to FP.

Thursday, March 15, 2007

P=NP?

Warning: this is another one of those boring essays you won't want to read.

My mom always asks me, "if you know so much about computers, why don't you fix them?" I always explain, (1) I don't know that much about computers and (2) the interesting stuff is more theoretical, or at least stuff that's not directly used. So I tried to explain the great theoretical computer science problem of whether P=NP. I tried and I failed. So now I'm going to try my best to figure out how to explain this problem to the layperson, and my attempt is below. I think computer science like this is as useful to everyone as the basic chemistry and physics everyone learns in high school; even if you don't become a programmer, it's still good to know it.

Computers do things by executing a number of steps in order given a certain input. This is called an algorithm. One example of an algorithm is below, the algorithm to calculate factorial:

Step 0: Get the number that we are going to take factorial of, and store it as x.
Step 1: Set n equal to 1.
Step 2: If x=1, the answer is n. The algorithm is now finished.
Step 3: Multiply n by x and store the answer in n.
Step 4: Decrement x.
Step 5: Go back up to step 2.

Let's look at how this works. Try doing the factorial of 1. After going through Step 1, n=1 and x=1. Then, Step 2 makes the algorithm end, returning 1 as the result. If you use 3 as the number we're taking the factorial of, the series of steps goes as follows:

Step 0: x is now set to 3.
Step 1: n is now set to 1.
Step 2: This is skipped, because x does not equal 1.
Step 3: n is now set to 1*3=3.
Step 4: x is now set to 3-1=2.
Step 5: Go to step 2.
Step 2: This is skipped, because x does not equal 1.
Step 3: n is now set to 3*2=6.
Step 4: x is now set to 2-1=1.
Step 5: Go to step 2.
Step 2: x=1 so the program stops. n, which now equals 6, is the answer.

Obviously, this can get tiring quickly, which is why we have computers. But this algorithm will always return the right answer for a factorial for a number 1 or greater.

One thing that you might notice is that we can calculate the number of steps this will take: for any input x>=1, it takes 4x-1 steps. (Check it against the trace above: for x=3, that's 4*3-1 = 11 steps.) Since 4x-1 is a polynomial equation, we can say that this algorithm runs in polynomial time*. There are lots of different operations that computers do that run in polynomial time. In fact, almost everything runs in a number of steps that can be represented with a polynomial equation; the things that aren't polynomial run really slowly. The set of all algorithms that run in polynomial time is written as P, which, of course, stands for polynomial.

For the factorial algorithm, the two variables (n and x) only had one value at a time. But what if we said that a variable can hold more than one value at a time, and we would simultaneously see how things go for all the values? This doesn't make all that much sense, but let's look at a possible algorithm using this technique, an algorithm that factors an integer. This is key in cryptography, where the RSA encryption algorithm depends on this being difficult to do in a regular computer, where all variables only have one value at a time.

Step 0: Get the number you want to factor and store it in x.
Step 1: Set n to some integer between 2 and the square root of x.
Step 2: If x is evenly divisible by n, then n is a factor.

So this can find a factor of a number in just 3 steps. It's much faster than the regular way of doing it with a regular computer. This weird computer that can hold more than one value in a single variable (as done in Step 1) is called a non-deterministic computer, since it's not deterministic what value the variable will hold at any given time. It might be possible to do this algorithm in polynomial time on a regular computer, but computer scientists still don't know how.
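
For contrast, here is a hedged sketch of how the regular, deterministic computer has to do it (Haskell; findFactor is my own name, and the floating-point square root is fine for a sketch but not for cryptographic-sized numbers): it tries n = 2, 3, 4, ... one value at a time, which can mean on the order of the square root of x candidates instead of one lucky guess.

import Data.List (find)

-- try each candidate n in turn until one divides x evenly
findFactor :: Integer -> Maybe Integer
findFactor x = find (\n -> x `mod` n == 0) [2 .. isqrt x]
  where
    isqrt = floor . sqrt . (fromInteger :: Integer -> Double)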

Now, if you understand that, you're over the hardest part. The set of algorithms that can be done by this weird computer in polynomial time is called NP, which stands for non-deterministic polynomial. The question is, is there anything that we can do on the non-deterministic computer in polynomial time that we can't do on the regular computer in polynomial time? Or is the set of algorithms that run on a regular computer in polynomial time the same as the set of algorithms that run on a non-deterministic computer in polynomial time? That is, does P=NP?



* Those of you who know about this stuff will tell me, "No, factorial is actually pseudo-polynomial. You're wrong, Dan. Go home." Well, you're right, but it's just for the sake of explanation. For the acute reader who wants to be confused, here's a more in-depth explanation: computers store things in bits of ones and zeros, and a number x takes floor(log base 2 of x) + 1 bits to hold. For example, 7 takes 3 bits to hold in a computer. A real polynomial-time algorithm would be able to do its thing in a number of steps that can be represented by a polynomial formula of the size (for example, in bits) of the input. The factorial algorithm's number of steps is proportional to somewhere around 2^s, where s is the size of the number in bits. But it was just an example.


PS. If you have trouble understanding this, please leave me a comment.

Friday, March 2, 2007

Computers can't program themselves!

Note to coders: If you've done much in programming, you'll probably find this blog entry completely pointless and boring. You'd probably be better off not reading it. Also, you may have noticed that my blog entries are way too long. This is no exception.

In 11th grade, I had an excellent economics teacher. He really got me interested in the topic, and helped me take the AP even though it wasn't an AP course. But the one thing that always annoyed me was the claim that, "By 2023, computing power in the world will exceed human brain power. This means that computers will program themselves!" I would always try to tell him, no, that's completely implausible, but I could never convince him. When I mentioned this to a friend, he said, "They'll program themselves, but not until 2050." This view is far too common for me to resist refuting it here.

Computers programming themselves is at once impossible and already happening. It's already happening in the sense of compilers and libraries. Because of compilers, programmers don't write in ones and zeros that computers understand; they write in a language that describes things more directly, and then the computer "programs itself," converting that programmer-intelligible code into computer code. Similarly, libraries (reusable source code components) let programmers write drastically less. It's gotten so advanced that, to a large degree, programming is assembling preexisting components.

But it can't get much further than this. Text files for source code may disappear (I doubt it), but the programmer never will. An increase in computing power won't do it. All computers are basically Turing-complete*, so they can already perform every operation that any other computer could possibly perform. This may be at a slower rate, but the same things are possible. No physical device can do more than what a Turing machine (or an equivalent machine, like the computer you're reading this on) can do. Even if, in 2023 or 2050, there is a trillion times more computing power than there is today, no new computations will be available; they'll just be faster. So if it will ever be possible for computers to program themselves any more than they already do, it's already possible. Someone just has to program it.

But advances in artificial intelligence or similar technologies won't make computers program themselves either. No matter what happens, computers still have to be told what to do. What they are told to do could be some complex behavior, such as behaving like a neural net (this isn't as useful as it sounds) or finding rules and making decisions based on statistical analysis of a situation and past responses (this is a bit more useful). But the most useful operations have nothing to do with AI or learning. These include things like sorting a list, multiplying two matrices, and displaying a GUI window. These can only be done by an algorithmic process: directly or indirectly, someone has to tell the computer which element in the list to compare to which, what matrix elements multiply with what, or where to place the pixels in the window on the screen. Furthermore, effective programmers need to understand these algorithms, at least a little bit, in order to combine them in an efficient way. There is no way to avoid this. The best we can do is build detailed, useful libraries, so code only has to be written once, and make a good, high-level programming language so programmers can express their ideas in a simple, logical way.

This isn't to say that programming will always stay the way it is now. Programming in modern languages like Java, Haskell or Factor today is radically different from programming in C or assembler 25 years ago**, and from punch cards 25 years before that. In 25 years, things will be very different, possibly even with new input devices (such as graphical programming or voice commands; these have problems too, but that's another blog post). But computers won't ever program themselves as long as they exist; there will always be someone who has to directly tell them what to do.

As an author, I'm not very persuasive. So to anyone I haven't convinced, I will extend a bet: for any amount, in any finite time frame, I bet you that computers will not program themselves.



* Actually, only computers with an infinite supply of memory are truly Turing-complete; a real machine with finite memory can't run every one of the infinitely many possible algorithms. But in practice, computers can do pretty much whatever we want them to do, and this ability will not change significantly with time.

** Those languages are still used today, but not usually used as the primary language for constructing an application.