Clang Tutorial
CS453 Automated Software Testing
Clang Tutorial, CS453 Automated Software Testing
Overview
Clang is a library that converts a C program into an abstract syntax tree (AST) and lets you manipulate the AST
Ex) finding branches, renaming variables, pointer alias analysis, etc.
Example C code
2 functions are declared: myPrint and main
main function calls myPrint and returns 0
myPrint function calls printf
myPrint contains if and for statements
1 global variable is declared: global
// Example.c
#include <stdio.h>
int global;
void myPrint(int param) {
  if (param == 1)
    printf("param is 1");
  for (int i = 0; i < 10; i++) {
    global += i;
  }
}
int main(int argc, char *argv[]) {
  int param = 1;
  myPrint(param);
  return 0;
}
Example AST
Clang generates 3 ASTs, for myPrint(), main(), and global
A function declaration has a function body and parameters
Figure: the ASTs for myPrint(), global, and main()
Structure of AST
Each node in the AST is an instance of either the Decl or the Stmt class
Decl represents declarations, and there are sub-classes of Decl for the different declaration types
Ex) FunctionDecl for a function declaration and ParmVarDecl for a function parameter declaration
Stmt represents statements, and there are sub-classes of Stmt for the different statement types
Ex) IfStmt for an if statement and ReturnStmt for a function return
Comments (i.e., /* */ and //) are not built into the AST
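The two-family structure described above can be pictured with a small toy model. The sketch below is only an illustration in standard C++: the Toy* classes and makeToyMain() are hypothetical stand-ins invented for this example and are not Clang's real Decl and Stmt classes, which carry far more information.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Toy stand-in for the statement family (NOT clang::Stmt).
struct ToyStmt {
  virtual ~ToyStmt() = default;
  virtual std::string className() const = 0;
};
struct ToyIfStmt : ToyStmt {
  std::string className() const override { return "IfStmt"; }
};
struct ToyReturnStmt : ToyStmt {
  std::string className() const override { return "ReturnStmt"; }
};

// Toy stand-in for the declaration family (NOT clang::Decl).
struct ToyDecl {
  explicit ToyDecl(std::string n) : name(std::move(n)) {}
  virtual ~ToyDecl() = default;
  virtual std::string className() const = 0;
  std::string name;
};
struct ToyParmVarDecl : ToyDecl {
  using ToyDecl::ToyDecl;
  std::string className() const override { return "ParmVarDecl"; }
};
struct ToyFunctionDecl : ToyDecl {
  using ToyDecl::ToyDecl;
  std::string className() const override { return "FunctionDecl"; }
  std::vector<std::unique_ptr<ToyParmVarDecl>> params;  // parameter declarations
  std::vector<std::unique_ptr<ToyStmt>> body;           // statements of the body
};

// Builds a toy node for "int main(int argc, char *argv[]) { ...; return 0; }".
inline ToyFunctionDecl makeToyMain() {
  ToyFunctionDecl f("main");
  f.params.push_back(std::make_unique<ToyParmVarDecl>("argc"));
  f.params.push_back(std::make_unique<ToyParmVarDecl>("argv"));
  f.body.push_back(std::make_unique<ToyReturnStmt>());
  return f;
}

// Usage check: the toy FunctionDecl owns two parameters and one statement.
inline void checkToyMain() {
  ToyFunctionDecl f = makeToyMain();
  assert(f.params.size() == 2 && f.body.size() == 1);
}
```

The point of the sketch is only the shape: a declaration node owns other declarations (its parameters) and statements (its body), and every node can report the name of its concrete sub-class.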
Decl (1/4)
Function declaration
The root of a function's AST is a Decl node
The root is an instance of FunctionDecl, which is a sub-class of Decl
Figure: AST of main() from the example; its root node is the FunctionDecl
Decl (2/4)
FunctionDecl can have ParmVarDecl instances for its function parameters and a function body
ParmVarDecl is a sub-class of Decl
The function body is an instance of Stmt
In the example, the function body is an instance of CompoundStmt, which is a sub-class of Stmt
Figure: AST of main() with the function parameter declarations and the function body marked
Decl (3/4)
VarDecl represents a local or global variable declaration
VarDecl has a child if the variable has an initial value
In the example, the VarDecl for param has an IntegerLiteral child
Figure: ASTs of global and main() with the global variable declaration, the local variable declaration, and the initial value marked
Decl (4/4)
FunctionDecl, ParmVarDecl and VarDecl each have a name and a declared type
Ex) the FunctionDecl for main has the name main and the type int(int,char**)
Figure: AST of main() with the names and types of the declarations marked
Stmt (1/9)
Stmt represents a statement
Sub-classes of Stmt
CompoundStmt class for code block
DeclStmt class for local variable declaration
ReturnStmt class for function return
Figure: AST of main() with the statements marked
Stmt (2/9)
Expr represents an expression (a subclass of Stmt)
Subclasses of Expr
CallExpr for function call
ImplicitCastExpr for implicit type casts
DeclRefExpr for referencing declared variables and functions
IntegerLiteral for integer literals
Figure: AST of main() with the expressions (which are also statements) marked
Stmt (3/9)
A Stmt may have children that carry additional information
CompoundStmt has the statements of a brace-enclosed code block ({}) as its children
Figure: AST of main() in which the CompoundStmt has three children: int param = 1;, myPrint(param);, and return 0;
Stmt (4/9)
A Stmt may have children that carry additional information (cont.)
The first child of CallExpr is the function pointer and the remaining children are the function arguments
Figure: AST of main() with the declarations of the DeclStmt, the function pointer and the function argument of the CallExpr, and the return value of the ReturnStmt marked
Stmt (5/9)
An Expr has the type of its expression
Ex) the CallExpr node has the type void
Some sub-classes of Expr can also have a value
Ex) the IntegerLiteral node has the value 1
Figure: AST of main() with the types and values of the expressions marked
Stmt (6/9)
The myPrint function contains an IfStmt and a ForStmt in its function body
FunctionDecl myPrint 'void(int)'
  ParmVarDecl param 'int'
  CompoundStmt
    IfStmt
      Null
      BinaryOperator '==' 'int'
        ImplicitCastExpr 'int'
          DeclRefExpr 'param' 'int'
        IntegerLiteral 1 'int'
      CallExpr 'int'
        ImplicitCastExpr 'int(*)()'
          DeclRefExpr 'printf' 'int()'
        ImplicitCastExpr 'char*'
          StringLiteral "param is 1" 'char[11]'
      Null
    ForStmt
      DeclStmt
        VarDecl i 'int'
          IntegerLiteral 0 'int'
      Null
      BinaryOperator '<' 'int'
        ImplicitCastExpr 'int'
          DeclRefExpr 'i' 'int'
        IntegerLiteral 10 'int'
      UnaryOperator '++' 'int'
        DeclRefExpr 'i' 'int'
      CompoundStmt
        CompoundAssignOperator '+=' 'int'
          DeclRefExpr 'global' 'int'
          ImplicitCastExpr 'int'
            DeclRefExpr 'i' 'int'
Stmt (7/9)
IfStmt has 4 children
A condition variable (VarDecl)
In C++, you can declare a variable in the condition (not in C)
A condition (Expr)
A then block (Stmt)
An else block (Stmt)
In the example, the if statement of myPrint() has no condition variable and no else block, so those two children are Null
Figure: AST of myPrint() with the condition variable, condition, then block, and else block of the IfStmt marked
Stmt (8/9)
ForStmt has 5 children
The initialization (Stmt)
A condition variable (VarDecl)
The condition (Expr)
The increment (Expr)
The loop block (Stmt)
Figure: AST of myPrint() with the initialization, condition variable, condition, increment, and loop block of the ForStmt marked
Stmt (9/9)
BinaryOperator has 2 children, one for each operand
UnaryOperator has 1 child for its operand
Figure: AST of myPrint() with the two operands of a BinaryOperator and the operand of a UnaryOperator marked
Traversing Clang AST (1/3)
ParseAST() starts the building and traversal of an AST
The callback function HandleTopLevelDecl() in ASTConsumer is called for each top-level declaration
HandleTopLevelDecl() receives a list of function and global variable declarations as a parameter
void clang::ParseAST(Preprocessor &pp, ASTConsumer *C, ASTContext &Ctx, ...)
A user has to customize ASTConsumer
class MyASTConsumer : public ASTConsumer
{
public:
  MyASTConsumer(Rewriter &R) {}

  virtual bool HandleTopLevelDecl(DeclGroupRef DR) {
    for (DeclGroupRef::iterator b = DR.begin(), e = DR.end(); b != e; ++b) {
      // variable b has each declaration in DR
    }
    return true;
  }
};
Traversing Clang AST (2/3)
HandleTopLevelDecl() calls TraverseDecl(), which recursively traverses the target AST from the top-level declaration by calling VisitStmt(), VisitFunctionDecl(), etc.
class MyASTVisitor : public RecursiveASTVisitor<MyASTVisitor> {
public:
  // VisitStmt is called when a Stmt is encountered
  bool VisitStmt(Stmt *s) {
    printf("\t%s \n", s->getStmtClassName());
    return true;
  }
  // VisitFunctionDecl is called when a FunctionDecl is encountered
  bool VisitFunctionDecl(FunctionDecl *f) {
    if (f->hasBody()) {
      Stmt *FuncBody = f->getBody();
      // getName() returns a StringRef, so convert the name before passing it to %s
      printf("%s\n", f->getNameAsString().c_str());
    }
    return true;
  }
};
class MyASTConsumer : public ASTConsumer {
public:
  virtual bool HandleTopLevelDecl(DeclGroupRef DR) {
    for (DeclGroupRef::iterator b = DR.begin(), e = DR.end(); b != e; ++b) {
      MyASTVisitor Visitor;
      Visitor.TraverseDecl(*b);
    }
    return true;
  }
};
Traversing Clang AST (3/3)
VisitStmt() in RecursiveASTVisitor is called for every Stmt object in
the AST
RecursiveASTVisitor visits each Stmt in a depth-first search order
If the return value of VisitStmt is false, recursive traversal halts
Example: main function of the previous example
RecursiveASTVisitor visits the nodes of this tree depth-first, in the top-to-bottom order listed here
FunctionDecl main 'int(int,char**)'
  ParmVarDecl argc 'int'
  ParmVarDecl argv 'char**':'char**'
  CompoundStmt
    DeclStmt
      VarDecl param 'int'
        IntegerLiteral 'int' 1
    CallExpr 'void'
      ImplicitCastExpr 'void (*)()'
        DeclRefExpr 'myPrint' 'void ()'
      ImplicitCastExpr 'int'
        DeclRefExpr 'param' 'int'
    ReturnStmt
      IntegerLiteral 'int' 0
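The depth-first order and the "return false to halt" rule can be tried out on a toy tree without Clang. The sketch below is an illustration only: ToyNode, traverse(), visitOrder() and makeToyMainBody() are hypothetical names made up for this example, the tree is a simplified copy of the statement part of main() (the VarDecl node is left out), and RecursiveASTVisitor itself is implemented differently.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Toy AST node (hypothetical; Clang's nodes are far richer).
struct ToyNode {
  std::string kind;
  std::vector<ToyNode> children;
};

// Visits n first and then its children, depth-first. As soon as visit
// returns false the whole traversal stops and false is propagated upwards.
inline bool traverse(const ToyNode &n,
                     const std::function<bool(const ToyNode &)> &visit) {
  if (!visit(n)) return false;
  for (const ToyNode &c : n.children)
    if (!traverse(c, visit)) return false;
  return true;
}

// Simplified statement part of main() from the example, as a toy tree.
inline ToyNode makeToyMainBody() {
  ToyNode callee{"ImplicitCastExpr", {ToyNode{"DeclRefExpr", {}}}};
  ToyNode arg{"ImplicitCastExpr", {ToyNode{"DeclRefExpr", {}}}};
  return ToyNode{"CompoundStmt",
                 {ToyNode{"DeclStmt", {ToyNode{"IntegerLiteral", {}}}},
                  ToyNode{"CallExpr", {callee, arg}},
                  ToyNode{"ReturnStmt", {ToyNode{"IntegerLiteral", {}}}}}};
}

// Records the kinds in visiting order; the visitor returns false at the first
// node whose kind equals stopAt (pass "" to walk the whole tree).
inline std::vector<std::string> visitOrder(const ToyNode &root,
                                           const std::string &stopAt) {
  std::vector<std::string> order;
  traverse(root, [&](const ToyNode &n) {
    order.push_back(n.kind);
    return n.kind != stopAt;
  });
  return order;
}

// Usage check: a full walk sees 10 nodes; halting at CallExpr sees only 4.
inline void checkTraversal() {
  assert(visitOrder(makeToyMainBody(), "").size() == 10);
  assert(visitOrder(makeToyMainBody(), "CallExpr").size() == 4);
}
```

With stopAt set to "CallExpr", the walk records CompoundStmt, DeclStmt, IntegerLiteral and CallExpr and then stops, so neither the children of the call nor the ReturnStmt are reached.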
Guideline for HW #2
Initialization of Clang
Line number information of Stmt
Useful Functions
Initialization of Clang
Initialization of Clang is complicated
To use Clang, many classes must be created and many functions must be called to initialize the Clang environment
Ex) CompilerInstance, TargetOptions, FileManager, etc.
It is recommended to use the initialization part of the sample source code from the course homepage as is, and to implement your own ASTConsumer and RecursiveASTVisitor classes
Line number information of Stmt
A SourceLocation object from getLocStart() of Stmt holds the line information
SourceManager is used to get line and column information from a SourceLocation
In the initialization step, a SourceManager object is created
getExpansionLineNumber() and getExpansionColumnNumber() in SourceManager give the line and column information, respectively
bool VisitStmt(Stmt *s) {
  SourceLocation startLocation = s->getLocStart();
  SourceManager &srcmgr = m_srcmgr; // you can get SourceManager from the initialization part
  unsigned int lineNum = srcmgr.getExpansionLineNumber(startLocation);
  unsigned int colNum = srcmgr.getExpansionColumnNumber(startLocation);
  return true;
}
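To make the idea of "a location resolved to a line and a column" concrete, here is a minimal sketch in standard C++ that maps a byte offset in a source string to 1-based line and column numbers. It is only a toy: lineAndColumn() is a hypothetical helper written for this example, and the real SourceManager does considerably more (for instance, it deals with macro expansions and multiple files).

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>

// Returns the 1-based (line, column) of the character at byte `offset`.
// Every '\n' before the offset starts a new line and resets the column.
inline std::pair<unsigned, unsigned> lineAndColumn(const std::string &text,
                                                   std::size_t offset) {
  assert(offset <= text.size());
  unsigned line = 1;
  unsigned col = 1;
  for (std::size_t i = 0; i < offset; ++i) {
    if (text[i] == '\n') {
      ++line;
      col = 1;
    } else {
      ++col;
    }
  }
  return {line, col};
}
```

For the text "int global;\nvoid myPrint(int param){\n", the offset of the first occurrence of param resolves to line 2, column 18.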
Useful Functions
dump() and dumpColor() in Stmt and FunctionDecl print the AST
dump() shows the AST rooted at a Stmt or FunctionDecl object
dumpColor() is similar to dump() but shows the AST with syntax highlighting
Example: dumpColor() of myPrint
FunctionDecl 0x368a1e0 <line:6:1> myPrint 'void(int)'
|-ParmVarDecl 0x368a120 <line:3:14, col:18> param 'int'
`-CompoundStmt 0x36a1828 <col:25, line:6:1>
  `-IfStmt 0x36a17f8 <line:4:3, line:5:24>
    |-<<<NULL>>>
    |-BinaryOperator 0x368a2e8 <line:4:7, col:16> 'int' '=='
    | |-ImplicitCastExpr 0x368a2d0 <col:7> 'int' <LValueToRValue>
    | | `-DeclRefExpr 0x368a288 <col:7> 'int' lvalue ParmVar 0x368a120 'param' 'int'
    | `-IntegerLiteral 0x368a2b0 <col:16> 'int' 1
    |-CallExpr 0x368a4e0 <line:5:5, col:24> 'int'
    | |-ImplicitCastExpr 0x368a4c8 <col:5> 'int (*)()' <FunctionToPointerDecay>
    | | `-DeclRefExpr 0x368a400 <col:5> 'int ()' Function 0x368a360 'printf' 'int ()'
    | `-ImplicitCastExpr 0x36a17e0 <col:12> 'char*' <ArrayToPointerDecay>
    |   `-StringLiteral 0x368a468 <col:12> 'char[11]' lvalue "param is 1"
    `-<<<NULL>>>
Guideline for HW #3
Code modification using Rewriter
Converting Stmt into String
Obtaining SourceLocation
Code Modification using Rewriter
You can modify code using the Rewriter class
Rewriter has functions to insert, remove and replace code
InsertTextAfter(loc, str), InsertTextBefore(loc, str), RemoveText(loc, size), ReplaceText(), etc., where loc, str and size are a location (SourceLocation), a string, and the size of the statement to remove, respectively
Example: inserting text before the condition of an IfStmt using InsertTextAfter()
bool MyASTVisitor::VisitStmt(Stmt *s) {
  if (isa<IfStmt>(s)) {
    IfStmt *ifStmt = cast<IfStmt>(s);
    Expr *condition = ifStmt->getCond();
    // insert the comment at the start location of the condition
    m_rewriter.InsertTextAfter(condition->getLocStart(), "/*start of cond*/");
  }
  return true;
}
Before: if (param == 1)
After: if (/*start of cond*/param == 1)
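One way to picture location-based rewriting is a buffer that records insertions against offsets in the original text and applies them only when the result is requested. The sketch below follows that idea in standard C++. ToyRewriter and its member names are hypothetical and written for this example; it is not how clang::Rewriter is implemented, and it supports insertion only.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>

// Toy rewriter (hypothetical; not clang::Rewriter). Insertions are keyed by
// offsets in the ORIGINAL text, so one edit never shifts the offset that a
// later edit refers to.
class ToyRewriter {
public:
  explicit ToyRewriter(std::string original) : original_(std::move(original)) {}

  // Queues `text` for insertion in front of the character at `offset`.
  void insertText(std::size_t offset, const std::string &text) {
    assert(offset <= original_.size());
    pending_[offset] += text;
  }

  // Returns the original text with all queued insertions applied.
  std::string rewritten() const {
    std::string out;
    std::size_t pos = 0;
    for (const auto &edit : pending_) {  // std::map iterates in offset order
      out.append(original_, pos, edit.first - pos);  // untouched text
      out += edit.second;                            // inserted text
      pos = edit.first;
    }
    out.append(original_, pos, std::string::npos);   // remaining tail
    return out;
  }

private:
  std::string original_;
  std::map<std::size_t, std::string> pending_;
};
```

Inserting "/*start of cond*/" at offset 3 of "if(param ==1)" yields "if(/*start of cond*/param ==1)", which mirrors the before/after pair of the example.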
Output of Rewriter
Modified code is obtained from a RewriteBuffer of Rewriter through getRewriteBufferFor()
Example code which writes the modified code to output.txt
ParseAST() modifies the target code as explained in the previous slides
TheConsumer contains a Rewriter instance, TheRewriter
int main(int argc, char *argv[]) {
  // ... (initialization of Clang omitted)
  ParseAST(TheCompInst.getPreprocessor(), &TheConsumer, TheCompInst.getASTContext());
  const RewriteBuffer *RewriteBuf = TheRewriter.getRewriteBufferFor(SourceMgr.getMainFileID());
  if (RewriteBuf) { // getRewriteBufferFor() returns NULL when nothing was modified
    ofstream output("output.txt");
    output << string(RewriteBuf->begin(), RewriteBuf->end());
    output.close();
  }
}
Converting Stmt into String
ConvertToString(stmt) of Rewriter returns a string corresponding to a Stmt
The returned string may not be exactly the same as the original statement, since ConvertToString() prints the string using the Clang pretty printer
For example, ConvertToString() will insert a space between an operand and an operator
Figure: ParseAST turns the source text a<100 into the AST below, and ConvertToString prints that AST as a < 100
BinaryOperator '<' 'int'
  ImplicitCastExpr 'int'
    DeclRefExpr 'a' 'int'
  IntegerLiteral 100 'int'
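Why the printed string can differ from the source becomes clear once the text is regenerated from a tree instead of copied from the file. The sketch below shows this with a toy expression tree in standard C++. ToyExpr, leaf(), binary() and prettyPrint() are hypothetical names for this example; Clang's pretty printer is far more elaborate, and the single-space rule here is an assumption chosen to match the a < 100 example.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Toy expression node (hypothetical). A leaf stores its spelling in `text`;
// an inner node stores the operator in `text` and owns two operands.
struct ToyExpr {
  std::string text;
  std::unique_ptr<ToyExpr> lhs;  // null for leaves
  std::unique_ptr<ToyExpr> rhs;  // null for leaves
};

inline std::unique_ptr<ToyExpr> leaf(std::string spelling) {
  auto e = std::make_unique<ToyExpr>();
  e->text = std::move(spelling);
  return e;
}

inline std::unique_ptr<ToyExpr> binary(std::string op,
                                       std::unique_ptr<ToyExpr> l,
                                       std::unique_ptr<ToyExpr> r) {
  assert(l && r);
  auto e = std::make_unique<ToyExpr>();
  e->text = std::move(op);
  e->lhs = std::move(l);
  e->rhs = std::move(r);
  return e;
}

// Regenerates text from the tree. The original spacing is not stored in the
// tree, so the printer applies its own rule: one space around each operator.
inline std::string prettyPrint(const ToyExpr &e) {
  if (!e.lhs) return e.text;
  return prettyPrint(*e.lhs) + " " + e.text + " " + prettyPrint(*e.rhs);
}
```

Printing the tree built by binary("<", leaf("a"), leaf("100")) gives "a < 100", whatever spacing the source used.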
SourceLocation
To change code, you need to specify where to make the change
The Rewriter class requires a SourceLocation instance, which contains the location information
You can get a SourceLocation instance by:
getLocStart() and getLocEnd() of Stmt, which return the start and end locations of a Stmt instance, respectively
findLocationAfterToken(loc, tok, ...) of Lexer, which returns the location just after the first token tok occurring right after loc
Lexer tokenizes the target code
SourceLocation.getLocWithOffset(offset), which returns the location adjusted by the given offset
getLocStart() and getLocEnd()
getLocStart() returns the exact starting location of a Stmt
getLocEnd() returns the location at which the last token of a Stmt starts (that is, where the second-to-last token ends when no whitespace separates them), not the location at which the Stmt ends
To get the correct end location, you need to use the Lexer class in addition
Example: getLocStart() and getLocEnd() results for the condition of the IfStmt
if (param ==1)
getLocStart() points to the start of param
1 is the last token of the IfStmt condition
getLocEnd() points to the end of ==, which is the start of 1, not to the end of 1
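Since the end location marks where the last token begins, the true end is found by measuring that token. The sketch below does this on a plain string in standard C++. endOfTokenAt() is a hypothetical helper, and its lexing rule (a token is a run of letters, digits and underscores, or otherwise a single character) is a simplifying assumption for this example; in Clang the measuring is done by the Lexer class, for instance through Lexer::getLocForEndOfToken().

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Given the offset at which a token starts, returns the offset one past the
// token's final character. Toy rule (assumption): a token is a maximal run of
// identifier characters, or otherwise exactly one character.
inline std::size_t endOfTokenAt(const std::string &src, std::size_t start) {
  assert(start < src.size());
  auto isWord = [](unsigned char c) { return std::isalnum(c) != 0 || c == '_'; };
  if (!isWord(static_cast<unsigned char>(src[start]))) return start + 1;
  std::size_t end = start;
  while (end < src.size() && isWord(static_cast<unsigned char>(src[end]))) ++end;
  return end;
}
```

For "if (param ==100)", starting from the offset of 100 (the position an end location of the condition would mark), the helper returns the offset of the closing parenthesis, which is the real end of the condition.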
findLocationAfterToken (1/2)
The static function findLocationAfterToken(loc, TKind, ...) of Lexer checks that the first token after loc is of kind TKind and returns the ending location of that token
static SourceLocation findLocationAfterToken(SourceLocation loc, tok::TokenKind TKind, const SourceManager &SM, const LangOptions &LangOpts, bool SkipTrailingWhitespaceAndNewLine)
Use findLocationAfterToken() to get the correct end location of a Stmt
Example: finding the location of ) (tok::r_paren) with findLocationAfterToken() to find the end of an if condition
bool MyASTVisitor::VisitStmt(Stmt *s) {
  if (isa<IfStmt>(s)) {
    IfStmt *ifStmt = cast<IfStmt>(s);
    Expr *condition = ifStmt->getCond();
    SourceLocation endOfCond = clang::Lexer::findLocationAfterToken(
        condition->getLocEnd(), tok::r_paren, m_sourceManager, m_langOptions, false);
    // endOfCond points to the end of )
  }
  return true;
}
Figure: in if ( a + x > 3 ), ifStmt->getCond()->getLocEnd() points to 3, and findLocationAfterToken(..., tok::r_paren, ...) returns the location just after )
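The lookup can be imitated on a plain string to see which position comes back. The sketch below is a toy in standard C++: toyFindLocationAfterToken() is a hypothetical function written for this example, it works on byte offsets and single-character tokens, and it returns std::string::npos where the real function would return an invalid location. Its tokenizing rule is a simplifying assumption, not Clang's lexer.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Toy version (hypothetical; not clang::Lexer). `loc` is the offset at which
// a token starts. If the next token after that one is the character
// `expected`, returns the offset just past it; otherwise returns npos.
inline std::size_t toyFindLocationAfterToken(const std::string &src,
                                             std::size_t loc, char expected) {
  assert(loc < src.size());
  auto isWord = [](unsigned char c) { return std::isalnum(c) != 0 || c == '_'; };
  std::size_t i = loc;
  // 1. Step over the token that starts at loc.
  if (isWord(static_cast<unsigned char>(src[i]))) {
    while (i < src.size() && isWord(static_cast<unsigned char>(src[i]))) ++i;
  } else {
    ++i;
  }
  // 2. Skip the whitespace that follows it.
  while (i < src.size() && std::isspace(static_cast<unsigned char>(src[i]))) ++i;
  // 3. The very next token must be the expected one.
  if (i < src.size() && src[i] == expected) return i + 1;
  return std::string::npos;
}
```

For "if ( a + x > 3 )" and loc at the 3, asking for ')' returns the offset just past the closing parenthesis, while asking for ';' fails because the next token is not a semicolon.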
findLocationAfterToken (2/2)
You may find the location of other tokens by changing the TKind parameter
List of useful enums for HW #3
Enum name      Token character
tok::semi      ;
tok::r_paren   )
tok::question  ?
tok::r_brace   }
The fourth parameter, a LangOptions instance, is obtained from getLangOpts() of CompilerInstance (see line 99 and line 106 of the appendix)
You can find the CompilerInstance instance in the initialization part of Clang
References
Clang, http://clang.llvm.org/
Clang API Documentation, http://clang.llvm.org/doxygen/
How to parse C programs with clang: A tutorial in 9 parts, http://amnoid.de/tmp/clangtut/tut.html
Appendix: Example Source Code (1/4)
This program prints the names of the declared functions and the class name of each Stmt in the function bodies
PrintFunctions.c
  1  #include <cstdio>
  2  #include <string>
  3  #include <iostream>
  4  #include <sstream>
  5  #include <map>
  6  #include <utility>
  7
  8  #include "clang/AST/ASTConsumer.h"
  9  #include "clang/AST/RecursiveASTVisitor.h"
 10  #include "clang/Basic/Diagnostic.h"
 11  #include "clang/Basic/FileManager.h"
 12  #include "clang/Basic/SourceManager.h"
 13  #include "clang/Basic/TargetOptions.h"
 14  #include "clang/Basic/TargetInfo.h"
 15  #include "clang/Frontend/CompilerInstance.h"
 16  #include "clang/Lex/Preprocessor.h"
 17  #include "clang/Parse/ParseAST.h"
 18  #include "clang/Rewrite/Core/Rewriter.h"
 19  #include "clang/Rewrite/Frontend/Rewriters.h"
 20  #include "llvm/Support/Host.h"
 21  #include "llvm/Support/raw_ostream.h"
 22
 23  using namespace clang;
 24  using namespace std;
 25
 26  class MyASTVisitor : public RecursiveASTVisitor<MyASTVisitor>
 27  {
 28  public:
Appendix: Example Source Code (2/4)
 29    bool VisitStmt(Stmt *s) {
 30      // Print name of sub-class of s
 31      printf("\t%s \n", s->getStmtClassName());
 32      return true;
 33    }
 34
 35    bool VisitFunctionDecl(FunctionDecl *f) {
 36      // Print function name (getName() returns a StringRef, so convert it for %s)
 37      printf("%s\n", f->getNameAsString().c_str());
 38      return true;
 39    }
 40  };
 41
 42  class MyASTConsumer : public ASTConsumer
 43  {
 44  public:
 45    MyASTConsumer()
 46      : Visitor() // initialize MyASTVisitor
 47    {}
 48
 49    virtual bool HandleTopLevelDecl(DeclGroupRef DR) {
 50      for (DeclGroupRef::iterator b = DR.begin(), e = DR.end(); b != e; ++b) {
 51        // Travel each function declaration using MyASTVisitor
 52        Visitor.TraverseDecl(*b);
 53      }
 54      return true;
 55    }
 56
 57  private:
 58    MyASTVisitor Visitor;
 59  };
 60
 61
 62  int main(int argc, char *argv[])
 63  {
Appendix: Example Source Code (3/4)
 64    if (argc != 2) {
 65      llvm::errs() << "Usage: PrintFunctions <filename>\n";
 66      return 1;
 67    }
 68
 69    // CompilerInstance will hold the instance of the Clang compiler for us,
 70    // managing the various objects needed to run the compiler.
 71    CompilerInstance TheCompInst;
 72
 73    // Diagnostics manage problems and issues in compile
 74    TheCompInst.createDiagnostics(NULL, false);
 75
 76    // Set target platform options
 77    // Initialize target info with the default triple for our platform.
 78    TargetOptions *TO = new TargetOptions();
 79    TO->Triple = llvm::sys::getDefaultTargetTriple();
 80    TargetInfo *TI = TargetInfo::CreateTargetInfo(TheCompInst.getDiagnostics(), TO);
 81    TheCompInst.setTarget(TI);
 82
 83    // FileManager provides file system lookup, file system caching, and directory search management.
 84    TheCompInst.createFileManager();
 85    FileManager &FileMgr = TheCompInst.getFileManager();
 86
 87    // SourceManager handles loading and caching of source files into memory.
 88    TheCompInst.createSourceManager(FileMgr);
 89    SourceManager &SourceMgr = TheCompInst.getSourceManager();
 90
 91    // Preprocessor runs within a single source file
 92    TheCompInst.createPreprocessor();
 93
 94    // ASTContext holds long-lived AST nodes (such as types and decls).
 95    TheCompInst.createASTContext();
 96
 97    // A Rewriter helps us manage the code rewriting task.
 98    Rewriter TheRewriter;
Appendix: Example Source Code (4/4)
 99    TheRewriter.setSourceMgr(SourceMgr, TheCompInst.getLangOpts());
100
101    // Set the main file handled by the source manager to the input file.
102    const FileEntry *FileIn = FileMgr.getFile(argv[1]);
103    SourceMgr.createMainFileID(FileIn);
104
105    // Inform Diagnostics that processing of a source file is beginning.
106    TheCompInst.getDiagnosticClient().BeginSourceFile(TheCompInst.getLangOpts(), &TheCompInst.getPreprocessor());
107
108    // Create an AST consumer instance which is going to get called by ParseAST.
109    MyASTConsumer TheConsumer;
110
111    // Parse the file to AST, registering our consumer as the AST consumer.
112    ParseAST(TheCompInst.getPreprocessor(), &TheConsumer, TheCompInst.getASTContext());
113
114    return 0;
115  }