Generalizing the fibonacci sequence

In response to Daily Programmer Challenge #71 intermediate.

You’ve probably seen the classic Haskell one-liner:

fibs = 0 : 1 : zipWith (+) fibs (tail fibs)

Let’s generalize it to work with this problem. Since I chose to use Integers everywhere, I’ll need lots of genericFoo from Data.List.

> import Data.List

Now first let’s generalize zipWith :: (a -> b -> c) -> [a] -> [b] -> [c]. The zipping function, instead of taking 2 inputs, will take K inputs. Then, instead of giving it 2 lists, we will give it K lists. It will be slightly less general, in that all K inputs will have the same type a, rather than differing types a and b.

Let’s use a list of length K to encode this sort of input. Therefore, (a -> b -> c) becomes (List K a -> c), and [a] -> [b] -> [c] becomes List K [a] -> [c]. However, I won’t actually bother encoding how long the list is into the type system, so it’ll just be [a] -> c and [[a]] -> [c] respectively.

I will implement it by taking in some function f, and some list xss. The first entry of the resultant list will be the result of applying f to all the first entries of xss, and so forth:

listZip :: ([a] -> b) -> [[a]] -> [b]
listZip _ []  = []
listZip f xss
  | null (head xss) = []
  | otherwise = f (map head xss) : listZip f (map tail xss)

Actually, there’s an easier way to implement it, using Data.List:

> listZip :: ([a] -> b) -> [[a]] -> [b]
> listZip f = map f . transpose

Now, I must generalize (+) to work on lists. The obvious generalization is sum. I’m making one additional tweak, which is to calculate the sum modulo M.

> sumMod :: Integer -> [Integer] -> Integer
> sumMod m = foldl' (\x y -> (x + y) `rem` m) 0

The generalization of tail is already written for me: it is tails from Data.List. Now to generalize the rest of fibs. I’ll parameterize it by M (the modulus) and K (as described earlier), as follows:

> fibSeq :: Integer -> Integer -> [Integer]
> fibSeq m k = fibs
>  where
>   fibs = genericReplicate (pred k) 0 ++
>          1 :
>          (listZip (sumMod m) $ genericTake k $ tails fibs)

From here the desired function f as specified in today’s problem is simple:

> fibSeqAt :: Integer -> Integer -> Integer -> Integer
> fibSeqAt m k n = fibSeq m k `genericIndex` n

This code therefore works by lazily constructing the Kth Fibonacci sequence (modulo M), and then inspecting its Nth element. Modular arithmetic assures that aggressive truncation still preserves the same truncated sum.

Testing:

ghci> mapM_ (print . take 20 . fibSeq 100) [1 .. 5]  
  [1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1]
  [0,1,1,2,3,5,8,13,21,34,55,89,44,33,77,10,87,97,84,81]
  [0,0,1,1,2,4,7,13,24,44,81,49,74,4,27,5,36,68,9,13]
  [0,0,0,1,1,2,4,8,15,29,56,8,8,1,73,90,72,36,71,69]
  [0,0,0,0,1,1,2,4,8,16,31,61,20,36,64,12,93,25,30,24]

ghci> fibSeqAt (10^8) 100 10000  
  93981304

This solution is still too slow, however, to reasonably compute fibSeqAt (10^8) (3^13) (5^10).

You can play with this code yourself by downloading fib.lhs.

Posted in Uncategorized | 1 Comment

Rule 110

The Challenge

The one-dimensional simple cellular automata Rule 110 is the only such cellular automata currently known to be turing-complete, and many people say it is the simplest known turing-complete system.

Implement a program capable of outputting an ascii-art representation of applying Rule 110 to some initial state. How many iterations and what your initial state is is up to you!

You may chose to implement rule 124 instead if you like (which is the same thing, albeit backwards).

Bonus points if your program can take an arbitrary rule integer from 0-255 as input and run that rule instead!

/r/dailyprogrammer challenge #72 easy

My Response

Time for some knot-tying!

Bits

Well, first, let’s start off with a simple datatype to represent "bits".

> data Bit = I | O
> 
> fromChar :: Char -> Bit
> fromChar '1' = I
> fromChar '0' = O
> fromChar _ = error "Bits can only be 1 or 0"
> 
> toChar :: Bit -> Char
> toChar I = '1'
> toChar O = '0'
> 
> instance Show Bit where
>   show b = [toChar b]
>   showList bs s = map toChar bs ++ s

Cells and CellLoops

OK, now a type to represent a cell. Cells have two neighbors and a value.

> data Cell = Cell { cellPrev :: Cell, cellVal :: !Bit, cellNext :: Cell }

Computations involving a cell’s bit value should be straightforward, so I’ve made that field strict. The neighbor fields, however, will need to be lazy in order to tie the knot as we shall soon see. Basically, I want this to be a circular doubly-linked list. But we need to be able to have some notion of when we have gone all the way around the "loop", so we’ll wrap up our cells in another data type to keep our bearings:

> data CellLoop = CellLoop { loopStart :: Cell, loopLength :: !Int }

A CellLoop chooses a definitive "starting point" cell, and contains the "length" of the loop.

Creating a CellLoop from a list of Bits

Now, given a list of Bits, we want to be able to create a CellLoop. We’ll do that by tying the knot like so:

> fromList :: [Bit] -> CellLoop
> fromList [] = error "Can't create an empty CellLoop"
> fromList bs =
>   let (this, last) = fromList' bs last this
>   in CellLoop this (length bs)
> 
> fromList' :: [Bit] -> Cell -> Cell -> (Cell, Cell)
> fromList' [] prev first = (first, prev)
> fromList' (x:xs) prev first =
>   let this = Cell prev x next
>       (next, last) = fromList' xs this first
>   in (this, last)
> 
> fromString :: String -> CellLoop
> fromString = fromList . map fromChar

fromList’ takes three inputs:

  • the list of bits
  • the "previous" cell of the completed loop
  • the "first" cell of the completed loop

It has two outputs: the "first" and "last" cells of the completed loop.

In the base case, you can see that it simply regurgitates its inputs. In the interesting case, this and next are defined with mutual recursion, and letrec magic ties them together.

Converting back

Converting back to a list of bits is much easier, we just use the length that we stored as "fuel", and when the fuel runs out, we stop.

> toList :: CellLoop -> [Bit]
> toList (CellLoop c i) = toList' c i
> 
> toList' :: Cell -> Int -> [Bit]
> toList' _ 0 = []
> toList' (Cell _ x next) i = x : toList' next (pred i)

Now, we actually want a CellLoop to display a little differently than just a list of Bits, so we’ll make a show instance accordingly:

> instance Show CellLoop where
>   show = map toChar' . toList
>     where
>       toChar' I = '*'
>       toChar' O = ' '

Evolution

Now for the final hurdle: evolution. We’d like to write a function evolve :: Rule -> CellLoop -> CellLoop. In order to do so, we’ll use both of the tricks we used previously: tying the knot, and fuel.

> type Rule = Int
> 
> evolve :: Rule -> CellLoop -> CellLoop
> evolve r (CellLoop c i) =
>   let (this, last') = evolve' r c i last' this
>   in (CellLoop this i)
> 
> evolve' :: Rule -> Cell -> Int -> Cell -> Cell -> (Cell, Cell)
> evolve' _ _ 0 prev' first' = (first', prev')
> evolve' r c i prev' first' =
>   let newVal = evolveCellVal r c
>       this = Cell prev' newVal next'
>       (next', last') = evolve' r (cellNext c) (pred i) this first'
>   in (this, last')
> evolveCellVal :: Rule -> Cell -> Bit
> evolveCellVal r (Cell prev x next) =
>   lookupRule r (show [cellVal prev, x, cellVal next])
> 
> -- currently ignores input rule and uses Rule 110
> lookupRule :: Rule -> String -> Bit
> lookupRule _ str = case str of
>   "111" -> O; "110" -> I; "101" -> I; "100" -> O
>   "011" -> I; "010" -> I; "001" -> I; "000" -> O

Since a Cell always knows about its neighbors, the computation of the evolved cell value can be completely separate from the code that traverses and reconstructs the CellLoop structure.

It should be straightforward, given a technique to turn an integer into a list of bits, to parameterize evolveCellVal (and by extension, evolve) on any rule. This is left as an exercise to the reader.

Play time

Let’s write a little helper to aid us in playing with what we’ve got:

> runRule :: Rule -> Int -> String -> IO ()
> runRule r i s = mapM_ print $ take i $ iterate (evolve r) $ fromString s

Let’s play a little bit and see how it goes:

ghci> runRule 110 5 "100100100"  
  *  *  *  
  * ** ** *
  *********
           
           

ghci> runRule 110 10 "0000011111000000"  
       *****      
      **   *      
     ***  **      
    ** * ***      
   ******* *      
  **     ***      
  **    ** *     *
   *   *****    **
  **  **   *   ***
   * ***  **  **  

ghci> runRule 110 10 "10000000101000000001"  
  *       * *        *
  *      ****       **
  *     **  *      ** 
  *    *** **     ****
  *   ** ****    **   
  *  *****  *   ***  *
  * **   * **  ** * **
  ****  ***** ******* 
  *  * **   ***     **
  * *****  ** *    ** 

You can play with this code yourself by downloading rule110.lhs.

Posted in Uncategorized | Leave a comment

Hello world!

I’m setting up this wordpress blog in order to provide a nicer way to view my github blog [1] using BlogLiterately [2].

  1. https://github.com/DanBurton/Blog
  2. http://byorgey.wordpress.com/2012/07/02/blogliterately-0-4-release/
Posted in Uncategorized | Leave a comment

Why GADTs are awesome: implementing System F using HOAS

Background

As an exercise while reading through Types and Programming Languages, I decided to implement an interpreter and typechecker for System F, using HOAS (Higher-Order Abstract Syntax) and Haskell’s GADTs (Generalized Algebraic Data Types). There were some really cute tricks that made the implementation fairly simple, so I decided to blog about it.

> {-# LANGUAGE GADTs, KindSignatures, FlexibleInstances, EmptyDataDecls #-}
> import Prelude hiding (succ)
> import Control.Applicative
> import Control.Monad

I’ve left the module declaration out of this version for the sake of space. See http://github.com/DanBurton/system-f for the module-ized version with few comments.

Preliminaries

I’ve provided a few synonyms around the Either type, in the event that I might want to change the error handling in the future. Basic stuff, really.

> type ErrOr a = Either String a
> good :: a -> ErrOr a
> good = Right
> err :: String-> ErrOr a
> err = Left
> isGood :: ErrOr a -> Bool
> isGood Right{} = True
> isGood Left{}  = False

This next function is a bit of an abombination. I promise I won’t abuse it too much.

> get :: ErrOr a -> a
> get (Right x) = x
> get (Left e)  = error e

Data types

The language should provide some primitives. Here I’ve provided Num primitives which correspond to natural numbers, sort of. Primitive doesn’t just have kind *, it has kind * -> *, meaning primitives are parameterized over some Haskell type. More on this later.

> data Primitive :: * -> * where
>   Num :: Integer -> Primitive Integer
>   Succ :: Primitive (Integer -> Integer)

In retrospect, it is slightly confusing that I named one of these constructors "Num". This is a value, so Haskell doesn’t confuse it with the typeclass of the same name. This might cause problems if I turned on some of the extra kind-wankery GHC provides.


Types are a little more interesting. In System F, Types are also "values" in the language: you apply type literals to type abstractions.

There are three types that I provide here: the Num type, the Function type, and the Type Abstraction type. I use V as a pun for the "forall" symbol, which represents type abstractions. If additional primitives were added, then this could be extended with relative ease (e.g. CharTy, ListTy, etc).

Notice how Types are also parameterized on Haskell types. We’ll talk about that soon… promise!

> data Type :: * -> * where
>   NumTy :: Type Integer
>   FunTy :: Type a -> Type b -> Type (a -> b)
>   VTy :: (Type a -> Type b) -> Type (V a b)
>   TyVar :: Char -> Type a

Now, obviously I can’t count, because I said there were three types of types, but in fact there are four. Well the fourth (TyVar) is used purely as a hack for the purpose of printing and equality testing. Let’s check those out:

> instance Eq (Type a) where
>   (==) = eqTy (['A' .. 'Z'] ++ ['a' .. 'z'])

Here we are about to define equality on types based on a helper function that takes a list of Chars. A "forall" type is represented by a Haskell function from Type to Type, so whenever we hit one of those, we will just pull a Char out of the list and create a TyVar to hand into that function. By handing the same Char to two different VTys, we will be able to see whether or not the functions are equal. There are some underlying assumptions going on here, primarily that you are not constructing types using strange means.

> eqTy :: [Char] -> Type a -> Type a -> Bool
> eqTy _ NumTy NumTy = True
> eqTy cs (FunTy dom rng) (FunTy dom' rng') = eqTy cs dom dom' && eqTy cs rng rng'
> eqTy (c:cs) (VTy f) (VTy f') = eqTy cs (f (TyVar c)) (f' (TyVar c))
> eqTy [] _ _ = error "Congratulations, you've used up all of the characters. Impressive."
> eqTy _ (TyVar c) (TyVar c') = c == c'
> eqTy _ _ _ = False

The Show instance is much the same. I prefer my type variables to be X, Y, and Z when possible, so the list of Chars starts with those. Whenever we need to print out a Forall’d type, we just pick a Char from the list, and use that TyVar hack.

> instance Show (Type a) where
>   show = showTy ("XYZ" ++ ['A' .. 'W'])
> showTy :: [Char] -> (Type a) -> String
> showTy _ NumTy = "Num"
> showTy cs (FunTy dom rng) = "(" ++ showTy cs dom ++ " -> " ++ showTy cs rng ++ ")"
> showTy (c:cs) (VTy f) = "(∀ " ++ [c] ++ ". " ++ showTy cs (f (TyVar c)) ++ ")"
> showTy [] VTy{} = error "Too many nested type applications"
> showTy _ (TyVar t) = [t]

Here’s a lonely little line of code. This is the entire reason for the EmptyDataDecls pragma. The Haskell type of a System F function abstraction is parameterized on a Haskell type (as we will soon see). But type abstractions are not parameterized by any pre-existing Haskell type. Instead, they are parameterized on this V type.

> data V a b

Now for the exciting part, terms! GADTs really shine here. (Well, they shine for Types too).

There are five kinds of terms in System F: primitives, function abstractions, function applications, type abstractions, and type applications. Look very carefully at the type signature for each of these constructors.

A Primitive parameterized on Haskell type a turns into a Term parameterized on Haskell type a.

An Abstraction requires you to declare the input Type and to provide a function from Term to Term. Notice how these are parameterized on types a and b in a nifty way.

An Application requires a Term parameterized on a function from a to b, and a Term parameterized on a, and creates a term parameterized on b.

Type abstractions and applications are similar to those of functions. The only differences are the use of the V type, and the type abstraction does not require you to specify the type of the input. (This implementation of System F does not support bounded quantification) Now go back and review the type signatures for constructors of Type. Makes sense, right?

> data Term :: * -> * where
>   Prim :: Primitive a -> Term a
>   Abs :: Type a -> (Term a -> Term b) -> Term (a -> b)
>   App :: Term (a -> b) -> Term a -> Term b
>   TAbs :: (Type a -> Term b) -> Term (V a b)
>   TApp :: Term (V a b) -> Type a -> Term b
>   Unknown :: Char -> Term a

There’s also that nifty little extra constructor: Unknown. This is a hack analogous to TyVar; we’ll see how they play together later.

Now for another cute little trick: getting ghci to also be the repl for our little System F language. All you have to do is provide a Show instance for Terms that evaluates its argument and discovers its type.

> instance Show a => Show (Term a) where
>   show t = let v = get $ eval t
>                ty = get $ typeOf t
>            in show v ++ " : " ++ show ty

Using a GHC < 7.4, this Eq instance is required for what follows. Implementing Eq on terms could be done in similar fashion to Eq on Types, but I was too lazy to do it.

> instance Eq (Term Integer) where
>   (==) = undefined

Here’s a related trick to make our System F even more convenient, by defining fromInteger = num for Term Integer, we can now use numeric literals as primitive values in our System F language. num is defined near the end, along with a few other language conveniences.

> instance Num (Term Integer) where
>   fromInteger = num
>   (+) = undefined
>   (-) = undefined
>   (*) = undefined
>   abs = undefined
>   signum = undefined
>   negate = undefined

Evaluation

Now for evaluation. The code is almost insanely simple. First, let’s define eval', which reduces terms as much as possible, using a big-step approach.

> eval' :: Term a -> ErrOr (Term a)
> eval' (Prim p)   = good $ Prim p
> eval' (Abs t f)  = good $ Abs t f
> eval' (TAbs f)   = good $ TAbs f
> eval' (App f x)  = do
>   f' <- eval' f
>   res <- runApp f' <*> eval' x
>   eval' res
> eval' (TApp f x) = do
>   f' <- eval' f
>   res <- runTApp f' <*> pure x
>   eval' res

Here for function and type applications, we rely on helper functions runApp and runTApp respectively, which will either hit an error, or produce an actual Haskell function.

We’ll also define a "full eval" function, which creates an actual Haskell value out of evaluating a Term (or produces an error).

> eval :: Term a -> ErrOr a
> eval t = eval' t >>= valueOf

Here I’ve only provided a way to extract an Int. Functions are left as an exercise to the reader.

> valueOf :: Term a -> ErrOr a
> valueOf (Prim n) = fromPrim n
> valueOf _ = err "Not a value"
> fromPrim :: Primitive a -> ErrOr a
> fromPrim (Num n) = good n
> fromPrim _ = err "fromPrim failed unexpectedly"

> runApp :: Term (a -> b) -> ErrOr (Term a -> Term b)
> runApp (Abs t f) = good f
> runApp (Prim p) = runAppPrim p
> runApp _ = err "unexpected non-abstraction used in application"

runApp is simple. If it is a function abstraction, just grab the Haskell function that was already provided. If it is a primitive, then provide a primitive implementation.

> runAppPrim :: Primitive (a -> b) -> ErrOr (Term a -> Term b)
> runAppPrim Succ = good $ \(Prim (Num n)) -> num (n + 1)

Here’s something really cute about GADTs. At first I had another case, runAppPrim _ = err "blah blah" which followed the Succ case. But GHC warned me about overlapping patterns! Since I gave this function the type Primitive (a -> b) -> blah, GHC knows that the Num is not a possibility.


There are no primitives that are type abstractions, so runTApp is even simpler than runApp.

> runTApp :: Term (V a b) -> ErrOr (Type a -> Term b)
> runTApp (TAbs f) = good f
> runTApp _ = err "runTApp failed unexpectedly"

Typing

Now for another fun part, type reconstruction! Given a Term, we want to discover its Type. But here’s something really cool: the Term and the Type have to be parameterized over the same Haskell type! Basically, Haskell’s type checker will prevent me from writing my type checker incorrectly.

I think it would actually be safe to remove the Error wrapping around the result of this funciton, and always assume that it is correct, because the Haskell type checker should prevent you from constructing an ill-typed Term in the first place.

> typeOf :: Term a -> ErrOr (Type a)
> typeOf (Prim p)  = good $ primType p
> typeOf (Abs t f) = FunTy t <$> typeOf (f (genTy t))
> typeOf (TAbs f)  = good $ VTy (\x -> get $ typeOf (f x))
> typeOf (App f x) = do
>   FunTy dom rng <- typeOf f
>   t             <- typeOf x
>   if (t == dom)
>     then good rng
>     else err "function domain does not match application input"
> typeOf (TApp f x) = do
>   VTy f' <- typeOf f
>   good (f' x)
> typeOf (Unknown c) = good $ TyVar c

The types of primitives are predetermined

> primType :: Primitive a -> Type a
> primType Num{} = NumTy
> primType Succ  = FunTy NumTy NumTy

Now even more fun! In order to determine the type of a function abstraction, we need to be able to inspect the "body" or "result" of that function. However, since it is represented as a Haskell function, it is opaque to us! Or is it? All we really have to do is give it some Term, any Term, of the correct input type, and then look at the type of the result.

So all we have to do is, given a Type, generate a Term of the correct Type. Can we actually do that? Well, sure! Check it out:

> genTy :: Type a -> Term a
> genTy NumTy = num 0
> genTy (FunTy dom rng) = l dom (\_ -> genTy rng)
> genTy (VTy f) = TAbs (\x -> genTy (f x))
> genTy (TyVar c) = Unknown c

First, notice the interplay between genTy and typeOf when it comes to Unknown and TyVar. An Unknown c has type TyVar c, and to get a value of TyVar c, just create an Unknown c! Cute.

More seriously, this function is a testament to the awesomeness of GADTs. Check this out:

num 0 :: Term Integer
l foo (\_ -> genTy bar) :: Term (Foo -> Bar)

(l = Abs, see below) genTy cannot possibly be well-typed, because these two expressions have entirely different (and entirely concrete, non-polymorphic) types!

… and yet it is, and this is the real magic of parameterizing both Types and Terms on Haskell types. Since NumTy is parameterized on Integer, that means that the result of genTy NumTy must be a Term Integer'. But sinceFunTy foo baris parameterized onFoo -> Bar, that means that the result ofgenTy (FunTy foo bar)must be aTerm (Foo -> Bar)`. So genTy, which would otherwise be impossible to type, is, in fact, well typed! All thanks to (quite natural) use of GADTs and Haskell’s sexy types.

tl;dr – parametric polymorphism + GADTs = awesomesauce

Language primitives

These are just a few "primitives". Here I use the term "primitive" to mean "you should actually write System F expressions using these". Although with the Num typeclass hack, num should be unnecessary.

> num = Prim . Num
> succ = Prim Succ
> v = TAbs
> l = Abs
> app = App
> tapp = TApp

Basic testing functions

Let’s play around, defining a couple functions in System F. You’ll notice how verbose it is to perform function and type applications. Brownie points to you if you write some Template Haskell quasi-quoting that can prettify this. (Contribute it to the github repo linked at the top!)

A simple function that simply applies the primitive succ to its input

> succ' = l NumTy (\x -> app succ x)

The identity function. Given a type and an input of that type, reproduce the input.

> id' = v (\t -> l t (\x -> x))

The const function. Given two types, and two inputs of those types, reproduce the first.

> const' = v (\t1 -> v (\t2 -> l t1 (\x -> l t2 (\_ -> x))))

Given a function from X -> X, produce the function that performs the original function twice on its given input.

> twice = v (\t -> l (FunTy t t) (\f -> l t (\x -> (app f (app f x)))))

Use the twice function on itself!

> fourTimes = v (\t -> app (tapp twice (FunTy t t)) (tapp twice t))

Example usage:

ghci> app (app (tapp twice NumTy) succ) 0  
  2 : Num

I wish this could be written in a more System F style:

[| twice NumTy succ 0 |]

Here’s a cool thing to check out. Go into ghci and try out the following:

ghci> :type const'  
  const' :: Term (V b (V a (b -> a -> b)))

ghci> typeOf const'  
  Right (∀ X. (∀ Y. (X -> (Y -> X))))

Cool, huh? The inferred Haskell type really captures a lot of the meaning.

Posted in Uncategorized | Leave a comment

Using monadic effects to reverse a merge sort

One of the Facebook Hacker Cup (2012) challenges in Round 1 was, given a list’s length "n", and a series of "choices" taken by a typical merge sort algorithm, discover the original list. You additionally make the assumption that the original list was a permutation of [1 .. n].

> {-# LANGUAGE BangPatterns #-}
> 
> import Control.Applicative
> import Control.Monad
> import Control.Monad.Writer.Lazy
> import Control.Monad.State.Lazy
> import System.IO
> import Text.Printf (printf)

I started out by playing around with the idea of a "monadic" merge sort, a la sortBy. The difference is that the comparator you pass in is allowed to perform an arbitrary monadic effect. I gave this concept a type synonym, for clarity. An MComparator is a function that takes in two things of the same type, and then produces a monadic action that yields an Ordering.

> type MComparator m a = a -> a -> m Ordering

Writing sortByM was trivial. In order to get the correct answer, I had to mimick the merge sort pseudocode provided by Facebook, which was pretty standard. The only thing that makes this Haskell code strange is that it is necessarily monadic, since it permits the comparator to execute arbitrary monadic effects.

> sortByM :: (Monad m, Functor m) => MComparator m a -> [a] -> m [a]
> sortByM cmp []  = return []
> sortByM cmp [x] = return [x]
> sortByM cmp xs = do
>   let (ys, zs) = partition xs
>   ys' <- sortByM cmp ys
>   zs' <- sortByM cmp zs
>   merge ys' zs'
>   where merge [] bs = return bs
>         merge as [] = return as
>         merge (a:as) (b:bs) = do
>           comparison <- cmp a b
>           case comparison of
>             LT -> (a:) <$> merge as (b:bs)
>             _  -> (b:) <$> merge (a:as) bs
>         partition xs = splitAt (length xs `quot` 2) xs

Now then, we have a lovely higher-order function. Let’s play! We need some MComparators to feed to this baby, and I immediately had two ideas.

  1. Mimick the Facebook code, and record the choices made by a regular comparison sort. This simply screams Writer monad.
> spewCmp :: (a -> a -> Ordering) -> MComparator (Writer [Choice]) a
> spewCmp cmp x y = case cmp x y of
>   LT -> tell [LeftFirst] >> return LT
>   x -> tell [RightFirst] >> return x
  1. Completely ignore the inputs, and perform the "sort" based on a given list of choices.
> givenCmp :: MComparator (State [Choice]) a
> givenCmp x y = do
>   (c:cs) <- get
>   put cs
>   return $ toOrdering c

The State monad makes this really elegant. Notice how, with this approach, I don’t have to tell the sort algorithm how many choices to take from the list when it enters a sub-branch. Since it all runs in the same state monad, it Just Works.

givenCmp isn’t the safest function (what happens when you run out of choices?) but hey, I’m just playing around. As a side note, I made a simple data structure to represent a "choice" made by the merge sort algorithm, which you may have noticed in the code above. Nothing too special.

> data Choice = LeftFirst | RightFirst deriving (Eq)
> 
> instance Show Choice where
>   show c = [fromChoice c]
> 
> fromChoice LeftFirst = '1'
> fromChoice RightFirst = '2'
> 
> toChoice '1' = LeftFirst
> toChoice '2' = RightFirst
> 
> toOrdering LeftFirst = LT
> toOrdering RightFirst = GT

Still not too serious yet, I figured I’d try asserting things about how these two different monadic sorts behave.

If you record the steps of a regular compare sort, and then perform a "sort" immitating those steps, you will get the same result both times.

Also, you will consume exactly the same number of choices, so the end state of the latter computation will be an empty list of choices.

> prop_sorty :: [Int] -> Bool
> prop_sorty xs = null s && xs' == xs''
>   where (xs', log) = runWriter $ sortByM (spewCmp compare) xs
>         (xs'', s)  = flip runState log $ sortByM givenCmp xs

Well this was all good fun, but it’s about time I started actually solving the problem. At first, I tried to read the specified list of choices backwards, and reconstruct the original list by "unsorting" the sorted list. This lead to much pain and agony.

Then, I had an epiphany. I already had all the tools I needed to solve this problem in a very elegant way. Let’s define some helpers to illustrate more clearly: these are just convenience wrappers around the cruft of running Writer and State.

> -- given a list of merge choices, runs those mergesort steps on a list
> runChoices :: [Choice] -> [a] -> [a]
> runChoices cs = flip evalState cs . sortByM givenCmp
> -- sorts a list the usual way, making note of the choices made at each step
> recordChoices :: Ord a => [a] -> [Choice]
> recordChoices = execWriter . sortByM (spewCmp compare)

Note that runChoices doesn’t even require type "a" to have an Ord instance! Also note that we discard the uninteresting parts of running State and Writer by using evalState (keep result, discard state) and execWriter (keep log, discard result) respectively.

Now consider one particular example given by Facebook: n = 4, choices = 12212, solution = [2,4,3,1]. Let’s play around in ghci a little bit.

ghci> recordChoices [2,4,3,1]                          --   
  [1,2,2,1,2]

ghci> runChoices (recordChoices [2,4,3,1]) [1,2,3,4]   --   
  [4,1,3,2]

There are a couple things to notice about this. First of all, the mapping. Compare [2,4,3,1] to [1,2,3,4]. 2 maps to 1, 4 maps to 2, 3 maps to 3, and 1 maps to 4. Now compare [1,2,3,4] with [4,1,3,2]. 1 maps to 4, 2 maps to 1, 3 maps to 3, 4 maps to 2. Notice anything? It’s the same mapping. Given the mapping from [1,2,3,4] to [4,1,3,2], we should be able to reverse it, and then apply the reverse map on [1,2,3,4] in order to recover the original!

But I actually solved it slightly differently. You see, [4,1,3,2] can be transformed back into [1,2,3,4] by sorting it, right? We arrived at [4,1,3,2] by applying the choices of sorting [2,4,3,1] onto [1,2,3,4]. So moving from [1,2,3,4] to [4,1,3,2] represents moving from the original list to [1,2,3,4], since we made the same choices in both cases. So now what about moving back?

If we record the choices made moving back from [4,1,3,2] to [1,2,3,4] (which can be done by sorting!) then the same choices can be used to "move back" from [1,2,3,4] to the original list!

> solvePerm :: Int -> [Choice] -> [Int]
> solvePerm n cs = runChoices (recordChoices (runChoices cs xs)) xs
>   where xs = [1 .. n]

Of course, we must write a quickCheck property to make it legit. If you record the steps of a regular compare sort, then you can use solvePerm to get the "correct answer". To prove it is correct, you record the choices of compare sort on that and observe that they are the same.

> prop_solved :: [Int] -> Bool
> prop_solved xs = cs == cs'
>   where cs  = recordChoices xs
>         cs' = recordChoices $ solvePerm (length xs) cs

The particular way I wrote this skirts around an issue with using quickCheck: we should actually be generating [Int] such that it is a permutation of [1 .. n]. I avoid caring about this detail by taking a list of choices to be a sort of "signature" for a given permutation. Thus, any random list "represents" a given permutation, because it produces a particular list of choices.

Well, there you have it. The rest of the problem is just boilerplate, and if any of you are using Haskell for the Facebook Hacker Cup, then feel free to steal this main method boilerplate, which I have used with only light modification on pretty much every problem:

> inFileName = "recovery-in.txt"
> outFileName = "recovery-out.txt"
> 
> readInt :: String -> Int
> readInt = read
> 
> zipSelf :: [a] -> [(a,a)]
> zipSelf (x:y:xs) = (x,y) : zipSelf xs
> zipSelf _ = []
> main = do
>   contents <- words <$> readFile inFileName
>   outFile <- openFile outFileName WriteMode
>   let n = readInt (head contents)
>       inputs = zipSelf $ tail contents
>   forM_ ([1 .. n] `zip` inputs) $ \(c, (num, choices)) -> do
>     let ans = checksum . solvePerm (readInt num) . map toChoice $ choices
>     hPutStrLn outFile $ printf "Case #%d: %d" c ans
>   hClose outFile

Minor detail, the checksum, as specified by Facebook for this problem. Can be written more succinctly with foldl’.

> checksum :: [Int] -> Int
> checksum = go 1
>   where go !res [] = res
>         go !res (x:xs) = go ((31 * res + x) `rem` 1000003) xs
Posted in Uncategorized | Leave a comment

Hope for Haskell

Hope for Haskell

Making Hackage and cabal-install more awesome

Haskellers, I’d like to know your thoughts about something.

As an aspiring GSOC student, I’ve been contemplating the current state of cabal-install and hackage. These tools can be very convenient, and have been an great source of joy and awesomeness to the Haskell community. However, there are subtletries and pains to using these tools. It can be hard for hobbyist Haskellers to understand how to deal with a failed cabal install foo, and it can be hard for “serious” Haskellers to select an appropriate library for their pressing needs.

So I’ve been thinking. The Haskell community could benefit from two things:

  1. a simpler cabal-install that “just works” or else tells you plainly and up front that what you are trying to do won’t work, and
  2. an extended Hackage, that gives you a better idea about the quality of a given library, and guides you to good libraries.

Let’s bundle these two concepts together (a new command line tool built on top of cabal-install, and a new website built on top of hackage) and give them the code name “Hope”. You could think of Hope (partially) as being a sort of crowd-sourced Haskell Platform.

The reason I suggest to bundle these tools is because the command line tool can take advantage of the “additional information” on the website in order to make intelligent decisions.

My ideas about Hope on the command line

  • phone home automatically
    • Report what OS, GHC version, etc, are being used
    • collect statistics, bugs
    • use this data to download versions of packages that are known to work on that setup
    • if compiled from source, upload the resulting binaries
  • provide an “uninstall” option
    • keep track of installs
    • don’t uninstall portions needed by other installed packages, print message but keep it simple
  • provide an “upgrade” option
    • uninstall the old and then reinstall the new (?)
    • easy way to upgrade your GHC and reinstall all your libraries
  • notify upfrontif certain system libraries are needed (*.dll, *.so, alex, curl, etc)
    • this will require additional information, probably gathered from the website
  • explore other interesting possibilities
    • faster version of “cabal update” based on diffs, hashes
    • torrent package distributions
    • distribute pre-compiled binaries

My ideas about Hope on the web

  • user-submitted ratings
  • automated reports: does it pass hlint, compile without warnings, meet certain style guides?
  • cascading ratings: you are only as good as your dependencies
  • encourage a unified release cycle, rate activity of package
    • don’t just grab the latest from Hackage until it has been proven to work correctly with things that depend on it.
  • require a certain level of documentation to even show up on Hope
  • rate levels of documentation, present well-documented libraries more prominently
  • integrate Hayoo (Hoogle for Hackage)
  • note assumptions about package (assumptions propagate to anyone depending on you)
    • cannot be installed alongside package X
    • must have certain libs or executables installed
    • unix / Windows / Mac only
    • assumptions/dependencies given in the .cabal file

Basically Hope on the web would be an enhanced Hackage with a certain level of human intervention. Hard to say how much is the “right amount”, but there should at least be some design work and scavaging done manually in order to present a sexy website, perhaps showcasing high-quality libraries. I’m also contemplating mandatory progress reports from listed maintainers, or something like that.

The aim of Hope is several-fold. Primarily, it aims to just plain make Hackage and cabal-install better. Note it does not aim to replace Hackage and cabal-install, rather, to use and build off of them. Another major goal is to encourage communication. Rather than the one-way stream of communication that is currently the detault: “upload to hackage -> download with cabal-install”, information and sharing automatically flows both ways: “this build worked for me”, download statistics, user ratings, etc. Also, unified release cycles and mandatory progress reports could help keep the dialogue flowing between upstream and downstream libraries, as well as users. If all goes according to plan, then Hope will make Haskell an even more attractive language. Hope eliminates complaints that Haskell is not as easy as pip/gem/npm, and combats criticism of hit-or-miss library experiences. Haskell should have Hope for a bright future.

But what do you think? Would it be too much work? Too many features? Too little benefit? Too elitist? Name too cheesy? This is almost certainly more work than a single GSOC could produce, but I think in the long run Hope is the way to go. Please comment, in order of preference, on the reddit post or my google+ post.

Posted in Uncategorized | 3 Comments