The German University Cairo April 3rd, 2012
Faculty of Media Engineering and Technology
Prof. Dr. Carmen Gervet
Bar Code
CSIS1003 – Compiler
Midterm exam
Instructions: Please Read Carefully Before Proceeding.
1. The allowed time for this exam is 2 hours (120 minutes).
2. (non-programmable) calculators are allowed.
3. No books or other aids are permitted for this test.
4. This exam booklet contains 11 pages, including this one. Three extra sheets of scratch paper are
attached and have to be kept attached. Note that if one or more pages are missing, you will lose
their points. Thus, you must check that your exam booklet is complete.
5. Please write your solutions in the space provided. If you need more space, please use the back of
the sheet containing the problem or on the three extra sheets and make an arrow indicating that.
6. When you are told that time is up, please stop working on the test.
All the best.
Please, do not write anything on this page.
Exercise 1 2 3 4 5 6 7 Total
Maximum Marks 4 4 4 4 5 4 4 29
Earned Marks
1|11
Exercise 1 Regular grammars (4 marks=2+1+1)
1.1 Give a regular expression for the language of positive even numbers. The only number that may
start with a zero is 0 itself.
An answer:
0|2|4|6|8|[1-9][0-9]*(0|2|4|6|8)
Grading scheme:
2 if as above
1.25 if reg. grammar given instead of expression but correct
2.1 Give the simplest regular expression for the language recognized by the following DFA
a b b
S0 S1 S2 S3
a
a
a
Model answer: (a|b)*abb 1 mark
Grading: 1 if as above,
0.5 if not simple but correct;
0.25 if ends with bb,
0 if does not end with abb
1.3 Draw an equivalent minimum state NFA
a,b
a b b
S0 S1 S2 S3
Grading: 1 mark,
0.4 if NFA only generates abb without the rest
2|11
Exercise 2 Finite automata (4 marks= 2+2)
Consider the alphabet = { A, B, C}. To each letter we associate a number:
A is worth 5 points
B is worth 10 points
C is worth 20 points
The value of a word is then the sum of the values of the letters. For instance the word AABAC is
worth 45 points.
Let K = {m, such that m is worth 20 points}.
2.1 Draw a DFA that recognizes K.
C S3
S0
B S2 S4 S5
A 4 A
A
A S6 S7 S8
B 4 A 8
B S9 S10 S11
A A 4 B
S12 S13
B
S14 S15 S16 S17
A 4 4 A 16
Marking: 2 full grade; -0.5 marks per forgotten accepting path
2.2 Draw an NFA that recognizes multiples of strings in K.
Same with epsilon to leave and loop. Accept both with and without the 0.
Marking : 2 marks; -1 the 0 is considered but there is no way to loop
3|11
Exercise 3 Lexical Analysis (4 marks)
Consider the following translation rules
\n { printf(“1”)} /*denotes the newline character */
A { printf(“2”)}
a { printf(“3”)}
d { printf(“4”)}
[b-d]* { printf(“5”)}
abc { printf(“6”)}
[A-C]* { printf{(“7”)}
[a-c]* { printf(“8”)}
[^a-c]+d[^A-C]+ { printf(“9”)}
Recall that ^ means “complement of the set of characters in [ ]”, as usual. The alphabet at hand
is = {A-Z, a-z, \n}
What will be the output on the following input? Show your reasoning by providing the tokenization
you obtained for the input, scanned all at once:
ABdac
Aacd
BCdbA
abccdba
ABdac\n|A|ac|d \n BCdb|A|\n|abcc|db|a
9 2 8 9 2 1 8 5 3 full mark
Also accepted, input read as:
ABdac\n\n|A|ac|d \n\n BCdb|A|\n|\n|abcc|db|a
9 2 8 9 2 1 1 8 5 3 full mark
Grading:
The goal was to apply the two rules for tokenization: maximum munch rule and first rule if token
recognized by 2 rules. Solutions accepted with full grade if as above ( 1 \n or 2 \n considered).
-1 mark: if both rules applied correctly, but „\n‟ not treated as part of alphabet thus not used in the
complement operator
-1 mark: if longest munch rule not applied (usually ending in last part as 4 8)
- 1 mark: if first “applicable rule” for given token not used
- 1 mark : if input not scanned all at once. And each line handled alone (no use of \n) BUT rules applied
properly
4|11
Exercise 4 Lexer – translation rules (4=1+1.5 +1.5 marks)
Consider the following specification:
c { printf(“1”);}
ac+b* { printf(“2”);}
cc { printf(“3”);}
ab* { printf(“4”);}
a { printf(“5”);}
ac*b+ { printf(“6”);}
Can the following outputs be produced? Justify your answers each time.
4.1 cabcccbba
Model answer: this string does not correspond to an output. Answer is NO
Grading: 1 mark
4.2 12345
Model answer: 5 can not be used since 4 would apply first to recognize any „a‟ alone.
So the answer is NO for this reason! If wrong reason given it is a 0.
Grading: 1.5 mark
4.3 64222
Answer: NO. 6 can not be used since 4 or 2 would apply first depending on whether „c‟ is there or not in
the input. If wrong reason given it is a 0.
Grading: 1.5 mark
5|11
Exercise 5 Grammar transformations (5 marks=1+2+2)
Consider the following grammar over the alphabet {1,2,3}:
S S ^ S | S / S | (S) | 1 | 2 | 3
^ denotes the exponent
5.1 Is this grammar ambiguous? Justify your answer.
Answer: Yes because a string such as 1^2 /2 has two parse tree derivations.
S S
S / S S ^ S
S ^S / 2 1 ^ S /S
1 ^ 2 /2 1 ^ 2 / 2
Leading to value 1/2 Leading to value 1
Grading: 1 mark
5.2 Assuming further that ^ is to be right associative and / is to be non-associative (i.e. 2/2/2 is
forbidden but (2/2)/2 and 2/(2/2) are allowed), re-write the grammar to reflect this. ^ has
precedence over /, You should use a maximum of 3 non-terminals in the grammar.
Model answer:
S T/T | T
T F ^ T | F 2 marks
F (S) | 1| 2| 3 -0.5 if not right associative for ^, -0.5 if
no precedence or wrong way, -0.5 if non-associative missed
Common mistake: using parenthesis to define left, right and non-associativity. This does not scan
expressions like 1^1^1 because they do not have parenthesis.
5.3 We now want to extend the grammar in 5.2 with the multiplication operation *, such that * is
left-associative and has precedence over ^ and /. Show the resulting grammar.
Model answer: extend previous one with left associativity for * and precedence (as new start rule)
S T/T | T if extension correct wrt left-associativity and
T T’ ^ T | T’ Precedence too: 2 marks
T’ T’ * F Even if based on a wrong 5.2 gramma
F -> (S) | 1| 2| 3 -1 if not left-assoc, -1 if precedence omitted
6|11
Exercise 6 Top-down parsing (4 marks= 1.5+2.5)
Consider the following CFG grammar
S Xa|c
X X b| S i
Prof Monty claims that this grammar is suitable for top-down parsing. Do you agree with him?
Justify your answer in detail.
Model answer: No because it is left-recursive (directly and indirectly)
Marking: 1.5 marks
If necessary transform the grammar to make it suitable. Show your reasoning.
Removing left recursion indirect
S Xa|c 1 mark
X X b| X ai | ci
Removing direct recursion on X
S Xa|c 1.5 mark
X ci X’
X’ bX‟| ai X‟| -0.5 marks if epsilon forgotten.
If resulting grammar is correct but followed a different procedure full grade 2.5 marks
7|11
Exercise 7 (4 marks= 2+2)
Consider the following grammar over the alphabet a,b,c:
S Xa
X bX|Y
Y Zc
Z bZ |
7.1 Prove that this grammar is ambiguous. Use parse trees.
Answer: bbca has 2 parse trees. 1 mark per parse tree = 2 marks
7.2 Is it possible to drop exactly one production from this grammar to obtain a new grammar
generating the same language and removes the ambiguity? If so identify this production.
** a disjunction corresponds to 2 productions!!
Answer: YES it is by removing either X -> bX or Z -> bZ 2 marks
Marking : if 2 productions removed but resulting grammar produces same language -1 mark
8|11