Lecture 23 – Implementing A Simple Programming Language (PCF)
1. Language PCF
Programming Computable Functions (PCF) is a typed functional language introduced by Gordon
Plotkin in 1977. The syntax of PCF is quite similar to that of F#; it is given by the following
context-free grammar:
e ::= x | n | true | false | succ | pred | iszero | if e then e else e | e e | fun x -> e | rec x -> e | (e)
2. Implementing a Lexical Analyzer (Scanner) for PCF in F#
A lexical analyzer will take an input stream of characters of PCF program and produce a
sequence of valid tokens.
(1) Defining the Tokens using F# Type Definition
type token =
| IDTOK of string | NUMTOK of int | TRUETOK | FALSETOK | SUCCTOK | PREDTOK
| ISZEROTOK | IFTOK | THENTOK | ELSETOK | FUNTOK | RECTOK | ARROWTOK
| LPARENTOK | RPARENTOK | EOF
which captures the above context-free grammar.
(2) Defining a F# Function to Break an Input Stream into A Character Sequence
let explode (s : string) = [for c in s -> c]
//Here an imperative control structure for loop is used for practical reason.
(3) Processing “White” Character
let isWhite c = c = ' ' || c = '\n' || c = '\r' || c = '\t'
(4) Handling Letters
let isAlpha c = ('a' <= c && c <= 'z') || ('A' <= c && c <= 'Z')
(5) Handling Digits
let isDigit c = '0' <= c && c <= '9'
(6) Dealing with Keywords
let keyword = function
| "true" -> TRUETOK
| "false" -> FALSETOK
| "succ" -> SUCCTOK
| "pred" -> PREDTOK
| "iszero" -> ISZEROTOK
| "if" -> IFTOK
| "then" -> THENTOK
| "else" -> ELSETOK
| "fun" -> FUNTOK
| "rec" -> RECTOK
| ident -> IDTOK ident
1
(7) Recognizing Identifier
let rec getid ident = function
| c :: cs when (isAlpha c || isDigit c) -> getid (ident + string c) cs
| cs -> (keyword ident, cs)
(8) Recognizing Numbers
let rec getnum num = function
| c :: cs when isDigit c -> getnum (10*num + int c - int '0') cs
| cs -> (NUMTOK num, cs)
(9) Obtaining Individual Token
let rec gettok = function
| [] -> (EOF, [])
| c :: cs when isWhite c -> gettok cs
| c :: cs when isAlpha c -> getid (string c) cs
| c :: cs when isDigit c -> getnum (int c - int '0') cs
| '(' :: cs -> (LPARENTOK, cs)
| ')' :: cs -> (RPARENTOK, cs)
| '-' :: '>' :: cs -> (ARROWTOK, cs)
| c :: cs -> (printf "Skipping illegal character: '%c'\n" c; gettok cs)
(10) Tokenizing the Input
let tokenize cs =
let rec gettoks toks cs =
match gettok cs with
| (EOF, cs) -> List.rev (EOF::toks)
| (tok, cs) -> gettoks (tok::toks) cs
gettoks [] cs
(11) Scanning
let scanner sourcecode = sourcecode |> explode |> tokenize