Implementaion of SQL Queries Using Lex and Yaac
Implementaion of SQL Queries Using Lex and Yaac
Submitted by:
Rishabh Jaiswal 181112001
YACC and LEX are two powerful Unix tools, that are
largely ignored by all but compiler writers, indeed, while
considerable time and effort is being devoted to software
reuse, little immediate interest has yet been raised on the
part of the ordinary Unix user.This report demonstrates
how a subset of the SQL query language has been
implemented as an interface to a relational database
system (PRECI/C in this case) using YACC and LEX.
Yacc (Yet Another Compiler-Compiler) is a computer
program for the Unix operating system developed
by Stephen C. Johnson. It is a Look Ahead Left-to-Right
(LALR) parser generator, generating a LALR parser (the
part of a compiler that tries to make syntactic sense of
the source code) based on an formal grammar, written in
a notation similar to Backus–Naur Form (BNF). Yacc is
supplied as a standard utility on BSD and AT&T Unix.
Lex is a computer program that generates lexical
analyzers ("scanners" or "lexers"). Lex is commonly used
with the yacc parser generator. Lex, originally written by
Mike Lesk and Eric Schmidt and described in 1975, is the
standard lexical analyzer generator on many Unix
systems, and an equivalent tool is specified as part of the
POSIX standard. Lex reads an input stream specifying the
lexical analyzer and outputs source code implementing
the lexer in the C programming language.
i
vii
TABLE OF CONTENTS
2. Flowchart..........................................................................................................5
3. Methodology…………………………………………………………………………6
I. Parsing SQL
II. LEX
III. YACC
4. Code.......................................................…………………………………………12
I. Lex Code
5. Conclusion……………………………………………………………………….....31
6. References…………………………………………………………………………..32
i
vii
Explanation of the Project
Statement
where the column (col1, col2,…) and table (tab1, tab2,..) names
have the usual structure of identifiers (alphanumeric strings
beginning with a letter) and cond is the condition that
is written by combining terms
col op col
col op const
vii i
FLOWCHART
vii i
METHODOLOGY
I. Parsing SQL
This parser is based on the version of SQL used in the popular MySQL open
source database. MySQL actually uses a bison parser to parse its SQL input,
although for a variety of reasons this parser isn’t based on mySQL’s parser but
rather is based on the description of the language in the manual.
MySQL’s parser is much longer and more complex, since this pedagogical
example leaves out many of the less heavily used parts. MySQL’s parser is
written in an odd way that uses bison to generate a C parser that’s compiled by
the C++ compiler, with a handwritten C++ lexer. There’s also the detail that its
license doesn’t allow excerpting in a book like this one. The ultimate definitions
for SQL are the standards documents published by ANSI and ISO including
ISO/IEC, which defines SQL, and a variety of related documents that define the
way to embed SQL in other programming languages and in XML.
First we need a lexer for the tokens that SQL uses. The syntax is free-format,
with whitespace ignored except to separate words. There is a fairly long but fixed
set of reserved words. The other tokens are conventional: names, strings,
numbers, and punctuation. Comments are Ada-style, from a pair of dashes to
the end of the line, with a MySQL extension also allowing C comments.
vii i
❖ MySQL Lexer
/*
* Scanner for mysql subset
* $Header: /usr/home/johnl/flnb/RCS/ch04.tr,v 1.7 2009/05/19 18:28:27 johnl
Exp $
*/
int oldstate;
%}
%x COMMENT
%s BTWMODE
%%
vii i
II. Lex
The structure of a Lex file is intentionally similar to that of a yacc file; files are
divided into three sections, separated by lines that contain only two percent
signs, as follows:
• The definition section defines macros and imports header files written
in C. It is also possible to write any C code here, which will be copied
verbatim into the generated source file.
• The rules section associates regular expression patterns with
C statements. When the lexer sees text in the input matching a given
pattern, it will execute the associated C code.
• The C code section contains C statements and functions that are copied
verbatim to the generated source file. These statements presumably
contain code called by the rules in the rules section. In large programs it
is more convenient to place this code in a separate file linked in
at compile time.
Lex is commonly used with the yacc parser generator. Lex, originally written
by Mike Lesk and Eric Schmidt and described in 1975, is the standard lexical
analyzer generator on many Unix systems, and an equivalent tool is specified
as part of the POSIX standard.
Lex reads an input stream specifying the lexical analyzer and outputs source
code implementing the lexer in the C programming language. In addition to C,
some old versions of Lex could also generate a lexer in Ratfor.
vii i
❖ Example of LEX File(.l)
%{
/* C code to be copied verbatim */
#include <stdio.h>
%}
%%
/*** Rules section ***/
%%
/*** C Code section ***/
int main(void)
{
/* Call the lexer, then quit. */
yylex();
return 0;
}
vii i
III. Yacc
recognizes summation expressions and constructs nodes for them. The special
identifiers $$, $1 and $3 refer to items on the parser's stack.
Yacc produces only a parser (phrase analyzer); for full syntactic analysis this
requires an external lexical analyzer to perform the first tokenization stage (word
analysis), which is then followed by the parsing stage proper. Lexical analyzer
generators, such as Lex or Flex, are widely available.
Yacc and similar programs (largely reimplementations) have been very popular.
Yacc itself used to be available as the default parser generator on most Unix
systems, though it has since been supplanted by more recent, largely
compatible, programs such as Berkeley Yacc, GNU Bison, MKS Yacc, and
Abraxas PCYACC. An updated version of the original AT&T version is included
as part of Sun's OpenSolaris project. Each offers slight improvements and
additional features over the original Yacc, but the concept and basic syntax have
remained the same.
Among the languages that were first implemented with Yacc are AWK, eqn and
Pic.[14] Yacc was also used on Unix to implement the Portable C Compiler, as
well as parsers for such programming languages as FORTRAN 77, Ratfor, APL,
bc, m4, etc.
Yacc has also been rewritten for other languages, including OCaml, Ratfor, ML,
Ada, Pascal, Java, Python, Ruby, Go, Common Lisp and Erlang.
vii i
❖ Example of YACC File(.y)
%{
#include <ctype.h>
#include <stdio.h>
%} %%
| S '\n’
yyerrok; };
S : '(' S ')’
| '[' S ']’
| /* empty */ ; %%
#include "lex.yy.c"
void yyerror(char * s)
int main(void)
{ return yyparse();
vii i
Code
❖ LEX Code(.l)
alpha [A-Za-z]
digit [0-9]
%%
or return OR;
vii i
"=" return '=';
on return ON;
%%
vii i
❖ Yacc Code(.y)
%{
#include <stdio.h>
#include <stdlib.h>
int yylex();
void yyerror(const char *s);
struct Node
{
struct Node* next;
struct Node *cousin;
char s[150];
};
%start S
%token <node> ID NUM SELECT DISTINCT FROM WHERE LE GE EQ NE OR
AND LIKE GROUP HAVING ORDER ASC DESC NOT LIMIT MINI MAXI
COUNT AVG SUM BETWEEN INNERJOIN LEFTJOIN RIGHTJOIN
FULLOUTERJOIN ON EXISTS ALL ANY TABLE DROP CREATE UPDATE SET
INSERT INTO VALUES DELETE OPERATOR '=' '>' '<' ')' '(' ',' '*'
%type <node> S select ST2 ST3 ST4 ST5 ST6 ST7 ST8 ST9 op drop create
insert delete update schema vallist columns tables attributes otheroptions join
COND E F options G
%union{
struct Node* node;
}
%%
S : select';'
{
$$=createNode("S");
$$->next=$1;
printAST($$,0);
printf("---Success---\n");
exit(0);
};
| drop';'
{
vii i
$$=createNode("S");
$$->next=$1;
printAST($$,0);
printf("---Success---\n");
exit(0);
};
| create';'
{
$$=createNode("S");
$$->next=$1;
printAST($$,0);
printf("---Success---\n");
;exit(0);
};
| delete';'
{
$$=createNode("S");
$$->next=$1;
printAST($$,0);
printf("---Success---\n");
exit(0);
};
| insert';'
{
$$=createNode("S");
$$->next=$1;
printAST($$,0);
printf("---Success---\n");
exit(0);
};
| update ';'
{
$$=createNode("S");
$$->next=$1;
printAST($$,0);
printf("---Success---\n");
exit(0);
};
;
options : DISTINCT {
$$=createNode("options");
$1=createNode("DISTINCT");
$$->next=$1;
}
| otheroptions '(' ID ')' {
$$=createNode("options");
$2=createNode("(");
$3=createNode("ID");
$4=createNode(")");
$$->next=$1;
$1->cousin=$2;
$2->cousin=$3;
$3->cousin=$4;
}
| {
$$=createNode("options");
}
;
ST2 : WHERE COND ST3 {
$$=createNode("ST2");$1=createNode("WHERE");$$->next=$1;$1-
>cousin=$2;
$2->cousin=$3;}
| WHERE ID op '('select')'{
$$=createNode("ST2");
$1=createNode("WHERE");
$2=createNode("ID");
$4=createNode("(");
$6=createNode(")");
$$->next=$1;
$1->cousin=$2;
$2->cousin=$3;
$3->cousin=$4;
$4->cousin=$5;
$5->cousin=$6;
}
| ST3
{
$$=createNode("ST2");$$-
vii i
>next=$1;
}
;
$1=createNode("OPERATOR");
$2=createNode("ALL");
$$->next=$1;
$1->cousin=$2;
}
| OPERATOR ANY {
$$=createNode("op");
$1=createNode("OPERATOR");
$2=createNode("ANY");
$$->next=$1;
$1->cousin=$2;
}
;
ST4 : GROUP attributes ST5 {
$$=createNode("ST4");
$1=createNode("GROUP");
$$->next=$1;
$1->cousin=$2;
$2->cousin=$3;
}
;
vii i
| ST5 {
$$=createNode("ST4");
$$->next=$1;
}
;
ST5 : HAVING COND ST6 {
$$=createNode("ST5");
$1=createNode("HAVING");
$$->next=$1;
$1->cousin=$2;
$2->cousin=$3;
}
| ST6 {
$$=createNode("ST5");
$$->next=$1;
}
;
ST6 : ORDER ST7 {
$$=createNode("ST6");
$1=createNode("ORDER");
$$->next=$1;
$1->cousin=$2;
}
| {
$$=createNode("ST6---none");
}
;
ST7 : ID ST8',' ST7 {
$$=createNode("ST7");
$1=createNode("ID");
$3=createNode(",");
$$->next=$1;
$1->cousin=$2;
$2->cousin=$3;
$3->cousin=$4;
}
|ID ST8 {
$$=createNode("ST7");
$1=createNode("ID");
$$->next=$1;
$1->cousin=$2;
}
;
ST8 : ASC {
$$=createNode("ST8");
$1=createNode("ASC");
$$->next=$1;
vii i
}
|DESC {
$$=createNode("ST8");
$1=createNode("DESC");
$$->next=$1;
}
| { $$=createNode("ST8---none");}
;
drop : DROP TABLE tables {
$$=createNode("drop");
$1=createNode("DROP");
$2=createNode("TABLE");
$$->next=$1;
$1->cousin=$2;
$2->cousin=$3;
}
;
create : CREATE TABLE ID'('columns')' {
$$=createNode("create");
$1=createNode("CREATE");
$2=createNode("TABLE");
$3=createNode("ID");
$4=createNode("(");
$6=createNode(")");
$$->next=$1;
$1->cousin=$2;
$2->cousin=$3;
$3->cousin=$4;
$4->cousin=$5;
$5->cousin=$6;
}
;
delete : DELETE FROM ID ST2 {
$$=createNode("delete");
$1=createNode("DELETE");
$2=createNode("FROM");
$3=createNode("ID");
$$->next=$1;
$1->cousin=$2;
$2->cousin=$3;
$3->cousin=$4;
}
;
insert : INSERT INTO ID schema VALUES '('vallist')' {
$$=createNode("insert");
$1=createNode("INSERT");
$2=createNode("INTO");
vii i
$3=createNode("ID");
$5=createNode("VALUES");
$6=createNode("(");
$8=createNode(")");
$$->next=$1;
$1->cousin=$2;
$2->cousin=$3;
$3->cousin=$4;
$4->cousin=$5;
$5->cousin=$6;
$6->cousin=$7;
$7->cousin=$8;
}
;
schema: '('vallist')' {
$$=createNode("schema");
$1=createNode("(");
$3=createNode(")");
$$->next=$1;
$1->cousin=$2;
$2->cousin=$3;
}
| { $$=createNode("schema");}
;
vallist : F','vallist {
$$=createNode("vallist");
$2=createNode(",");
$$->next=$1;
$1->cousin=$2;
$2->cousin=$3;
}
|F {$$=createNode("vallist");$$->next=$1;}
;
columns : ID ID','columns {
$$=createNode("columns");
$1=createNode("ID");
$2=createNode("ID");
$3=createNode(",");
$$->next=$1;
$1->cousin=$2;
$2->cousin=$3;
$3->cousin=$4;
}
|ID ID {
$$=createNode("columns");
vii i
$1=createNode("ID");
$2=createNode("ID");
$$->next=$1;
$1->cousin=$2;
}
;
attributes : ID','attributes {
$$=createNode("attributelist");
$1=createNode("ID");
$2=createNode(",");
$$->next=$1;
$1->cousin=$2;
$2->cousin=$3;
}
| '*' {$$=createNode("attributelist");$1=createNode("*");
$$->next=$1;}
| ID {
$$=createNode("attributelist");
$1=createNode("ID");
$$->next=$1;
}
;
otheroptions : MINI {
$$=createNode("otheroptions");
$1=createNode("MINI");
$$->next=$1;
}
| MAXI {
$$=createNode("otheroptions");
$1=createNode("MAXI");
$$->next=$1;
}
| DISTINCT {
$$=createNode("otheroptions");
$1=createNode("DISTINCT");
$$->next=$1;
}
| COUNT {
$$=createNode("otheroptions");
$1=createNode("COUNT");
$$->next=$1;
}
vii i
| AVG {
$$=createNode("otheroptions");
$1=createNode("AVG");
$$->next=$1;
}
| SUM {
$$=createNode("otheroptions");
$1=createNode("SUM");
$$->next=$1;
}
;
tables : ID',' tables {
$$=createNode("tablelist");
$1=createNode("ID");
$2=createNode(",");
$$->next=$1;
$1->cousin=$2;
$2->cousin=$3;
}
| ID {
$$=createNode("tablelist");
$1=createNode("ID");
$$->next=$1;
}
| ID join ID {
$$=createNode("tablelist");
$1=createNode("ID");
$3=createNode("ID");
$$->next=$1;
$1->cousin=$2;
$2->cousin=$3;
}
| ID join ID ON E {
$$=createNode("tablelist");
$1=createNode("ID");
$3=createNode("ID");
$4=createNode("ON");
$$->next=$1;
$1->cousin=$2;
vii i
$2->cousin=$3;
$3->cousin=$4;
$4->cousin=$5;
}
;
join : INNERJOIN {
$$=createNode("join");
$1=createNode("INNERJOIN");
$$->next=$1;
}
| LEFTJOIN {
$$=createNode("join");
$1=createNode("LEFTJOIN");
$$->next=$1;
}
| RIGHTJOIN {
$$=createNode("join");
$1=createNode("RIGHTJOIN");
$$->next=$1;
}
| FULLOUTERJOIN {
$$=createNode("join");
$1=createNode("FULLOUTERJOIN");
$$->next=$1;
}
COND : COND OR COND {
$$=createNode("COND");
$2=createNode("OR");
$$->next=$1;
$1->cousin=$2;
$2->cousin=$3;
}
| COND AND COND {
$$=createNode("COND");
$2=createNode("AND");
$$->next=$1;
$1->cousin=$2;
$2->cousin=$3;
vii i
}
| NOT COND {
$$=createNode("COND");
$1=createNode("NOT");
$$->next=$1;
$1->cousin=$2;
}
|E {$$=createNode("COND"); $$->next=$1;}
;
E : F '=' F {
$$=createNode("E");
$2=createNode("==");
$$->next=$1;
$1->cousin=$2;
$2->cousin=$3;
}
| F '<' F {
$$=createNode("E");
$2=createNode("<");
$$->next=$1;
$1->cousin=$2;
$2->cousin=$3;
}
| F '>' F {
$$=createNode("E");
$2=createNode(">");
$$->next=$1;
$1->cousin=$2;
$2->cousin=$3;
}
| F LE F {
$$=createNode("E");
$2=createNode("LE");
$$->next=$1;
$1->cousin=$2;
$2->cousin=$3;
}
| F GE F {
$$=createNode("E");
$2=createNode("GE");
$$->next=$1;
$1->cousin=$2;
vii i
$2->cousin=$3;
}
| F EQ F {
$$=createNode("E");
$2=createNode("EQ");
$$->next=$1;
$1->cousin=$2;
$2->cousin=$3;
}
| F NE F {
$$=createNode("E");
$2=createNode("NE");
$$->next=$1;
$1->cousin=$2;
$2->cousin=$3;
}
| F OR F {
$$=createNode("E");
$2=createNode("OR");
$$->next=$1;
$1->cousin=$2;
$2->cousin=$3;
}
| F AND F {
$$=createNode("E");
$2=createNode("AND");
$$->next=$1;
$1->cousin=$2;
$2->cousin=$3;
}
| F LIKE F {
$$=createNode("E");
$2=createNode("LIKE");
$$->next=$1;
$1->cousin=$2;
$2->cousin=$3;
}
| ID BETWEEN NUM AND NUM {
$$=createNode("E");
$1=createNode("ID");
$2=createNode("BETWEEN");
vii i
$3=createNode("NUM");
$4=createNode("AND");
$5=createNode("NUM");
$$->next=$1;
$1->cousin=$2;
$2->cousin=$3;
$3->cousin=$4;
$4->cousin=$5;
}
;
F : ID {
$$=createNode("F");
$1=createNode("ID");
$$->next=$1;
}
| NUM {
$$=createNode("F");
$1=createNode("NUM");
$$->next=$1;
}
;
update : UPDATE ID SET ST9 ST2
{
$$=createNode("update");
$1=createNode("UPDATE");
$2=createNode("ID");
$3=createNode("SET");
$$->next=$1;
$1->cousin=$2;
$2->cousin=$3;
$3->cousin=$4;
$4->cousin=$5;
};
ST9 : ID'='G ',' ST9
{
$$=createNode("ST9");
$1=createNode("ID");
$2=createNode("=");
$4=createNode(",");
$$->next=$1;
$1->cousin=$2;
$2->cousin=$3;
$3->cousin=$4;
$4->cousin=$5;
}
vii i
|ID'='G
{
$$=createNode("ST9");
$1=createNode("ID");
$2=createNode("=");
$$->next = $1;
$1->cousin=$2;
$2->cousin = $3;
};
G : ID
{
$$=createNode("G");
$1=createNode("ID");
$$->next=$1;
}
| NUM
{
$$=createNode("G");
$1=createNode("NUM");
$$->next=$1;
}
;
%%
#include"lex.yy.c"
vii i
vii i
CONCLUSION
vii i
REFERENCES
vii i