{"id":64404,"date":"2017-03-09T16:00:14","date_gmt":"2017-03-09T14:00:14","guid":{"rendered":"https:\/\/www.javacodegeeks.com\/?p=64404"},"modified":"2017-03-09T14:23:42","modified_gmt":"2017-03-09T12:23:42","slug":"antlr-mega-tutorial","status":"publish","type":"post","link":"https:\/\/www.javacodegeeks.com\/2017\/03\/antlr-mega-tutorial.html","title":{"rendered":"The ANTLR mega tutorial"},"content":{"rendered":"<p>Parsers are powerful tools, and using ANTLR you can write all sorts of parsers usable from many different languages.<\/p>\n<p>In this complete tutorial we are going to:<\/p>\n<ul>\n<li><strong>explain the basics<\/strong>: what a parser is, what it can be used for<\/li>\n<li>see <strong>how to set up ANTLR<\/strong> to be used from Javascript, Python, Java and C#<\/li>\n<li>discuss <strong>how to test<\/strong> your parser<\/li>\n<li>present the most <strong>advanced and useful features<\/strong> of ANTLR: you will learn all you need to parse all possible languages<\/li>\n<li>show <strong>tons of examples<\/strong><\/li>\n<\/ul>\n<p>Maybe you have read some tutorial that was too complicated, or so partial that it seemed to assume you already knew how to use a parser. This is not that kind of tutorial. We just expect you to know how to code and how to use a text editor or an IDE. 
That\u2019s it.<\/p>\n<p>At the end of this tutorial:<\/p>\n<ul>\n<li>you will be able to write a parser to recognize different formats and languages<\/li>\n<li>you will be able to create all the rules you need to build a lexer and a parser<\/li>\n<li>you will know how to deal with the common problems you will encounter<\/li>\n<li>you will understand errors and you will know how to avoid them by testing your grammar.<\/li>\n<\/ul>\n<p>In other words, we will start from the very beginning and when we reach the end you will have learned all you could possibly need to learn about ANTLR.<\/p>\n<p><a href=\"http:\/\/www.javacodegeeks.com\/wp-content\/uploads\/2017\/03\/ANTLR-Mega-Tutorial_Finale_Second.png\"><img decoding=\"async\" class=\"aligncenter wp-image-64415\" src=\"http:\/\/www.javacodegeeks.com\/wp-content\/uploads\/2017\/03\/ANTLR-Mega-Tutorial_Finale_Second.png\" width=\"860\" height=\"752\" srcset=\"https:\/\/www.javacodegeeks.com\/wp-content\/uploads\/2017\/03\/ANTLR-Mega-Tutorial_Finale_Second.png 1030w, https:\/\/www.javacodegeeks.com\/wp-content\/uploads\/2017\/03\/ANTLR-Mega-Tutorial_Finale_Second-300x262.png 300w, https:\/\/www.javacodegeeks.com\/wp-content\/uploads\/2017\/03\/ANTLR-Mega-Tutorial_Finale_Second-768x672.png 768w, https:\/\/www.javacodegeeks.com\/wp-content\/uploads\/2017\/03\/ANTLR-Mega-Tutorial_Finale_Second-1024x896.png 1024w\" sizes=\"(max-width: 860px) 100vw, 860px\" \/><\/a><\/p>\n<p>ANTLR Mega Tutorial Giant List of Content<\/p>\n<h2>What is ANTLR?<\/h2>\n<p>ANTLR is a parser generator, a tool that helps you to\u00a0create parsers. <strong>A parser takes a piece of text and transforms it into an organized structure<\/strong>, such as an Abstract Syntax Tree (AST). 
You can think of the AST as a story describing the content of the code or also as its logical representation created by putting together the various pieces.<\/p>\n<p><a href=\"http:\/\/www.javacodegeeks.com\/wp-content\/uploads\/2017\/03\/798px-Abstract_syntax_tree_for_Euclidean_algorithm.svg_.png\"><img decoding=\"async\" class=\"aligncenter size-full wp-image-64416\" src=\"http:\/\/www.javacodegeeks.com\/wp-content\/uploads\/2017\/03\/798px-Abstract_syntax_tree_for_Euclidean_algorithm.svg_.png\" alt=\"\" width=\"798\" height=\"900\" srcset=\"https:\/\/www.javacodegeeks.com\/wp-content\/uploads\/2017\/03\/798px-Abstract_syntax_tree_for_Euclidean_algorithm.svg_.png 798w, https:\/\/www.javacodegeeks.com\/wp-content\/uploads\/2017\/03\/798px-Abstract_syntax_tree_for_Euclidean_algorithm.svg_-266x300.png 266w, https:\/\/www.javacodegeeks.com\/wp-content\/uploads\/2017\/03\/798px-Abstract_syntax_tree_for_Euclidean_algorithm.svg_-768x866.png 768w\" sizes=\"(max-width: 798px) 100vw, 798px\" \/><\/a><\/p>\n<p>Graphical representation of an AST for the Euclidean algorithm<\/p>\n<p>What you need to do to get an AST:<\/p>\n<ol>\n<li>define a lexer and parser grammar<\/li>\n<li>invoke ANTLR: it will generate a lexer and a parser in your target language (e.g., Java, Python, C#, Javascript)<\/li>\n<li>use the generated lexer and parser: you invoke them passing the code to recognize and they return to you an AST<\/li>\n<\/ol>\n<p>So you need to start by defining a lexer and parser grammar\u00a0for the thing that you are analyzing. Usually the \u201cthing\u201d is a language, but it could also be a data format, a diagram, or any kind of structure that is represented with text.<\/p>\n<h2>Aren\u2019t regular expressions\u00a0enough?<\/h2>\n<p>If you are the typical programmer you may ask yourself <em>why can\u2019t I use a regular expression<\/em>? 
A regular expression is quite useful, for example when you want to find a number in a string of text, but it also has many limitations.<\/p>\n<p>The most obvious is the lack of recursion: you can\u2019t find a (regular) expression inside another one, unless you code it by hand for each level, something that quickly becomes unmaintainable. But the larger problem is that it\u2019s not really scalable: if you are going to put together even just a few regular expressions, you are going to create a fragile mess that is hard to maintain.<\/p>\n<blockquote>\n<p>It\u2019s not that easy to use regular expressions<\/p>\n<\/blockquote>\n<p>Have you ever tried parsing HTML with a regular expression? It\u2019s a terrible idea: for one thing you risk summoning <a href=\"https:\/\/stackoverflow.com\/questions\/1732348\/regex-match-open-tags-except-xhtml-self-contained-tags\/\">Cthulhu<\/a>, but more importantly <strong>it doesn\u2019t really work<\/strong>. You don\u2019t believe me?\u00a0Let\u2019s see, you want to find the elements of a table, so you try a regular expression like this one: <code>&lt;table&gt;(.*?)&lt;\/table&gt;<\/code>. Brilliant! You did it! Except somebody adds attributes to\u00a0their table, such as <code>style<\/code> or <code>id<\/code>. It doesn\u2019t matter, you do this <code>&lt;table.*?&gt;(.*?)&lt;\/table&gt;<\/code>, but you actually cared about the data inside the table, so you then need to parse <code>tr<\/code> and <code>td<\/code>, but they are full of tags.<\/p>\n<p>So you need to eliminate that, too. And somebody even dares to use comments like <code>&lt;!-- my comment --&gt;<\/code>. Comments can be used everywhere, and that is not easy to handle with your regular expression. 
Is it?<\/p>\n<p>So you forbid the internet from using comments in HTML: problem solved.<\/p>\n<p>Or alternatively you use ANTLR, whichever seems simpler to you.<\/p>\n<h2>ANTLR\u00a0vs writing your own parser by hand<\/h2>\n<p>Okay, you are convinced, you need a parser,\u00a0but why use a parser generator like ANTLR instead of building your own?<\/p>\n<blockquote>\n<p>The main advantage of ANTLR is productivity<\/p>\n<\/blockquote>\n<p>If you actually have to work with a parser all the time, because your language, or format, is evolving,\u00a0you need to be able to keep pace, something you can\u2019t do if you have to deal with the details of implementing a parser. Since you are not parsing for parsing\u2019s sake, you must have the chance to concentrate on accomplishing your goals. And ANTLR makes it much easier to do that, rapidly and cleanly.<\/p>\n<p>Second, once you have defined your grammars you can ask ANTLR to generate multiple parsers in different languages. For example you can get a parser in C# and one in Javascript to parse the same language in a desktop application and in a web application.<\/p>\n<p>Some people argue that by writing a parser by hand you can make it faster and produce better error messages. There is some truth in this, but in my experience parsers generated by ANTLR are always fast enough. You can tweak them and improve both performance and error handling by working on your grammar, if you really need to. 
And you can do that once you are happy with your grammar.<\/p>\n<h2>Table of Contents or\u00a0<em>ok, I am convinced, show me what you got<\/em><\/h2>\n<p>Two small notes:<\/p>\n<ul>\n<li>in the <a href=\"https:\/\/github.com\/unosviluppatore\/antlr-mega-tutorial\">companion repository of this tutorial<\/a> you are going to find all the code with testing, even where we don\u2019t see it in the article<\/li>\n<li>the examples will be in different languages, but the knowledge would be generally applicable to any language<\/li>\n<\/ul>\n<h3>Setup<\/h3>\n<ol>\n<li><a href=\"#setup-antlr\">Setup ANTLR<\/a><\/li>\n<li><a href=\"#javascript-setup\">Javascript Setup<\/a><\/li>\n<li><a href=\"#python-setup\">Python Setup<\/a><\/li>\n<li><a href=\"#java-setup\">Java Setup<\/a><\/li>\n<li><a href=\"#csharp-setup\">C# Setup<\/a><\/li>\n<\/ol>\n<h3>Beginner<\/h3>\n<ol>\n<li><a href=\"#lexers-and-parser\">Lexers and Parsers<\/a><\/li>\n<li><a href=\"#creating-a-grammar\">Creating a Grammar<\/a><\/li>\n<li><a href=\"#designing-a-data-format\">Designing a Data Format<\/a><\/li>\n<li><a href=\"#lexer-rules\">Lexer Rules<\/a><\/li>\n<li><a href=\"#parser-rules\">Parser Rules<\/a><\/li>\n<li><a href=\"#mistakes-and-adjustements\">Mistakes and Adjustments<\/a><\/li>\n<\/ol>\n<h3>Mid-Level<\/h3>\n<ol>\n<li><a href=\"#setup-antlr-with-javascript\">Setting Up the Chat Project in Javascript<\/a><\/li>\n<li><a href=\"#antlr.js\">Antlr.js<\/a><\/li>\n<li><a href=\"#htmlchatlistener.js\">HtmlChatListener.js<\/a><\/li>\n<li><a href=\"#working-with-a-listener\">Working with\u00a0a Listener<\/a><\/li>\n<li><a href=\"#solving-ambiguities-with-semantic-predicates\">Solving Ambiguities with Semantic Predicates<\/a><\/li>\n<li><a href=\"#starting-with-python\">Continuing the Chat in Python<\/a><\/li>\n<li><a href=\"#the-python-way\">The Python Way of Working with a Listener<\/a><\/li>\n<li><a href=\"#testing-with-python\">Testing with Python<\/a><\/li>\n<li><a href=\"#parsing-markup\">Parsing 
Markup<\/a><\/li>\n<li><a href=\"#lexical-modes\">Lexical Modes<\/a><\/li>\n<li><a href=\"#parser-grammars\">Parser Grammars<\/a><\/li>\n<\/ol>\n<h3>Advanced<\/h3>\n<ol>\n<li><a href=\"#setup-java-for-antlr\">The Markup Project in Java<\/a><\/li>\n<li><a href=\"#the-main-app.java\">The Main App.java<\/a><\/li>\n<li><a href=\"#transforming-code-with-antlr\">Transforming Code with ANTLR<\/a><\/li>\n<li><a href=\"#joy-and-pain-of-transforming-code\">Joy and Pain of Transforming Code<\/a><\/li>\n<li><a href=\"#advanced-testing\">Advanced Testing<\/a><\/li>\n<li><a href=\"#dealing-with-expressions\">Dealing with Expressions<\/a><\/li>\n<li><a href=\"#parsing-spreadsheets\">Parsing Spreadsheets<\/a><\/li>\n<li><a href=\"#setup-csharp\">The Spreadsheet Project in C#<\/a><\/li>\n<li><a href=\"#excel-is-doomed\">Excel is Doomed<\/a><\/li>\n<li><a href=\"#testing-everything\">Testing Everything<\/a><\/li>\n<\/ol>\n<h3>Final Remarks<\/h3>\n<ol>\n<li><a href=\"#tips-and-tricks\">Tips and Tricks<\/a><\/li>\n<li><a href=\"#conclusions\">Conclusions<\/a><\/li>\n<\/ol>\n<h2>Setup<\/h2>\n<p>In this section we prepare our development environment to work with ANTLR: the parser generator tool, the supporting tools and the runtimes for each language.<\/p>\n<h2 id=\"setup-antlr\">1. Setup ANTLR<\/h2>\n<p>ANTLR is actually made up of two main parts: the tool, used to generate the lexer and parser, and the runtime, needed to run them.<\/p>\n<p>The tool will be needed just by you, the language engineer, while the runtime will be included in the final software using your language.<\/p>\n<p>The tool is always the same no matter which language you are targeting: it\u2019s a Java program that you need on your development machine. The runtime, instead, is different for every language and must be available both to the developer and to the user.<\/p>\n<p>The only requirement for the tool is that you have installed at least <strong>Java 1.7<\/strong>. 
To install the Java program you need to download the latest version from the official site, which at the moment is:<\/p>\n<pre class=\"brush:bash\">http:\/\/www.antlr.org\/download\/antlr-4.6-complete.jar<\/pre>\n<h3>Instructions<\/h3>\n<ol>\n<li>copy the downloaded tool where you usually put third-party Java libraries (e.g. <code>\/usr\/local\/lib<\/code> or <code>C:\\Program Files\\Java\\lib<\/code>)<\/li>\n<li>add the tool to your <code>CLASSPATH<\/code>. Add it to your startup script (e.g. <code>.bash_profile<\/code>)<\/li>\n<li>(optional) also add aliases to your startup script to simplify the usage of ANTLR<\/li>\n<\/ol>\n<h4>Executing the instructions on\u00a0Linux\/Mac OS<\/h4>\n<pre class=\"brush:java\"># 1.\r\nsudo cp antlr-4.6-complete.jar \/usr\/local\/lib\/\r\n# 2. and 3.\r\n# add this to your .bash_profile\r\nexport CLASSPATH=\".:\/usr\/local\/lib\/antlr-4.6-complete.jar:$CLASSPATH\"\r\n# simplify the use of the tool to generate lexer and parser\r\nalias antlr4='java -Xmx500M -cp \"\/usr\/local\/lib\/antlr-4.6-complete.jar:$CLASSPATH\" org.antlr.v4.Tool'\r\n# simplify the use of the tool to test the generated code\r\nalias grun='java org.antlr.v4.gui.TestRig'<\/pre>\n<h4>Executing the instructions on Windows<\/h4>\n<pre class=\"brush:java\">\/\/ 1.\r\nGo to System Properties dialog &gt; Environment variables\r\n-&gt; Create or append to the CLASSPATH variable\r\n\/\/ 2. and 3. Option A: use doskey\r\ndoskey antlr4=java org.antlr.v4.Tool $*\r\ndoskey grun=java org.antlr.v4.gui.TestRig $*\r\n\/\/ 2. and 3. Option B: use batch files\r\n\/\/ create antlr4.bat\r\njava org.antlr.v4.Tool %*\r\n\/\/ create grun.bat\r\njava org.antlr.v4.gui.TestRig %*\r\n\/\/ put them in the system path or any of the directories included in %path%<\/pre>\n<h3>Typical Workflow<\/h3>\n<p>When you use ANTLR you start by writing a <em>grammar<\/em>, a file with extension <code>.g4<\/code> which contains the rules of the language that you are analyzing. 
You then use the <code>antlr4<\/code> program to generate the files that your program will actually use, such as the lexer and the parser.<\/p>\n<pre class=\"brush:java\">antlr4 &lt;options&gt; &lt;grammar-file-g4&gt;<\/pre>\n<p>There are a couple of important options you can specify when running <code>antlr4<\/code>.<\/p>\n<p>First, you can specify the target language, to generate a parser in Python or JavaScript or any other target different from Java (which is the default one). The others control the generation of the visitor and the listener (don\u2019t worry if you don\u2019t know what these are, we are going to explain them later).<\/p>\n<p>By default only the listener is generated, so to create the visitor you use the\u00a0<code>-visitor<\/code> command line option, and <code>-no-listener<\/code> if you don\u2019t want to generate the listener. There are also the opposite\u00a0options, <code>-no-visitor<\/code> and <code>-listener<\/code>, but they are the default values.<\/p>\n<pre class=\"brush:java\">antlr4 -visitor &lt;Grammar-file&gt;<\/pre>\n<p>You can optionally test\u00a0your grammar using a little utility named <code>TestRig<\/code> (although, as we have seen, it\u2019s usually aliased to <code>grun<\/code>).<\/p>\n<pre class=\"brush:java\">grun &lt;grammar-name&gt; &lt;rule-to-test&gt; &lt;input-filename(s)&gt;<\/pre>\n<p>The filename(s) are optional and you can instead analyze the input that you type on the console.<\/p>\n<p>If you want to use the testing tool you need to generate a Java parser, even if your program is written in another language. This can be done simply by running <code>antlr4<\/code> again with Java, the default target, as the target language.<\/p>\n<p>Grun is useful when manually testing the first draft of your grammar. 
As it becomes more stable you may want to rely on automated\u00a0tests (we will see how to write them).<\/p>\n<p><code>Grun<\/code> also has a few useful options: <code>-tokens<\/code>, to show the tokens detected, and <code>-gui<\/code>, to generate an image of the AST.<\/p>\n<h2 id=\"javascript-setup\">2. Javascript Setup<\/h2>\n<p>You can put your grammars in the same folder as\u00a0your Javascript files. The file containing the grammar must have the same name as the grammar, which must be declared at the top of the file.<\/p>\n<p>In the following example the name is <code>Chat<\/code> and the file is <code>Chat.g4<\/code>.<\/p>\n<p>We can create the corresponding Javascript parser simply by specifying the correct option with the ANTLR4 Java program.<\/p>\n<pre class=\"brush:java\">antlr4 -Dlanguage=JavaScript Chat.g4<\/pre>\n<p>Notice that the option is case-sensitive, so pay attention to the uppercase \u2018S\u2019. If you make a mistake you will receive a message like the following.<\/p>\n<pre class=\"brush:java\">error(31):  ANTLR cannot generate Javascript code as of version 4.6<\/pre>\n<p>ANTLR can be used both with <code>node.js<\/code> and in the browser. For the browser you need to use <code>webpack<\/code> or <code>require.js<\/code>. If you don\u2019t know how to use either of the two you can look at the <a href=\"https:\/\/github.com\/antlr\/antlr4\/blob\/master\/doc\/javascript-target.md\">official documentation for some help<\/a>\u00a0or read this tutorial on <a href=\"https:\/\/tomassetti.me\/antlr-and-the-web\/\">ANTLR on the web<\/a>. We are going to use <code>node.js<\/code>, for which you can install the ANTLR runtime simply by using the following standard command.<\/p>\n<pre class=\"brush:java\">npm install antlr4<\/pre>\n<h2 id=\"python-setup\">3. Python Setup<\/h2>\n<p>When you have a grammar, you put it in the same folder as\u00a0your Python files. 
The file must have the same name as the grammar, which must be declared at the top of the file. In the following example the name is <code>Chat<\/code> and the file is <code>Chat.g4<\/code>.<\/p>\n<p>We can create the corresponding Python parser simply by specifying the correct option with the ANTLR4 Java program. For Python, you also need to pay attention to the version of Python, 2 or 3.<\/p>\n<pre class=\"brush:bash\">antlr4 -Dlanguage=Python3 Chat.g4<\/pre>\n<p>The runtime is available from PyPI, so you can just install it using pip.<\/p>\n<pre class=\"brush:bash\">pip install antlr4-python3-runtime<\/pre>\n<p>Again, you just have to remember to specify the proper Python version.<\/p>\n<h2 id=\"java-setup\">4. Java Setup<\/h2>\n<p>To set up our Java project using ANTLR you can do things manually. Or you can be a civilized person and use Gradle or Maven.<\/p>\n<p>Also, you can look into ANTLR plugins for your IDE.<\/p>\n<h3>4.1 Java Setup using Gradle<\/h3>\n<p>This is how I typically set up my Gradle project.<\/p>\n<p>I use a Gradle plugin to invoke ANTLR and I also use the IDEA plugin to generate the configuration for IntelliJ IDEA.<\/p>\n<pre class=\"brush:java\">dependencies {\r\n  antlr \"org.antlr:antlr4:4.5.1\"\r\n  compile \"org.antlr:antlr4-runtime:4.5.1\"\r\n  testCompile 'junit:junit:4.12'\r\n}\r\n \r\ngenerateGrammarSource {\r\n    maxHeapSize = \"64m\"\r\n    arguments += ['-package', 'me.tomassetti.mylanguage']\r\n    outputDirectory = new File(\"generated-src\/antlr\/main\/me\/tomassetti\/mylanguage\".toString())\r\n}\r\ncompileJava.dependsOn generateGrammarSource\r\nsourceSets {\r\n    generated {\r\n        java.srcDir 'generated-src\/antlr\/main\/'\r\n    }\r\n}\r\ncompileJava.source sourceSets.generated.java, sourceSets.main.java\r\n \r\nclean{\r\n    delete \"generated-src\"\r\n}\r\n \r\nidea {\r\n    module {\r\n        sourceDirs += file(\"generated-src\/antlr\/main\")\r\n    }\r\n}<\/pre>\n<p>I put my grammars under <em>src\/main\/antlr\/<\/em> 
and the Gradle configuration makes sure the generated code ends up in the directory corresponding to its package. For example, if I want the parser to be in the package <em>me.tomassetti.mylanguage<\/em> it has to be generated into <em>generated-src\/antlr\/main\/me\/tomassetti\/mylanguage<\/em>.<\/p>\n<p>At this point I can simply run:<\/p>\n<pre class=\"brush:java\"># Linux\/Mac\r\n.\/gradlew generateGrammarSource\r\n \r\n# Windows\r\ngradlew generateGrammarSource<\/pre>\n<p>And I get my lexer &amp; parser generated from my grammar(s).<\/p>\n<p>Then I can also run:<\/p>\n<pre class=\"brush:java\"># Linux\/Mac\r\n.\/gradlew idea\r\n \r\n# Windows\r\ngradlew idea<\/pre>\n<p>And I have an IDEA Project ready to be opened.<\/p>\n<h3>4.2 Java Setup using\u00a0Maven<\/h3>\n<p>First of all we are going to specify in our POM that we need <code>antlr4-runtime<\/code> as a dependency. We will also use a Maven plugin to run ANTLR through Maven.<\/p>\n<p>We can also specify whether we want ANTLR to generate visitors or listeners. 
To do that we define a couple of corresponding properties.<\/p>\n<pre class=\"brush:xml\">&lt;project xmlns=\"http:\/\/maven.apache.org\/POM\/4.0.0\" xmlns:xsi=\"http:\/\/www.w3.org\/2001\/XMLSchema-instance\"\r\n  xsi:schemaLocation=\"http:\/\/maven.apache.org\/POM\/4.0.0 http:\/\/maven.apache.org\/xsd\/maven-4.0.0.xsd\"&gt;\r\n  &lt;modelVersion&gt;4.0.0&lt;\/modelVersion&gt;\r\n \r\n  [..]\r\n \r\n  &lt;properties&gt;\r\n    &lt;project.build.sourceEncoding&gt;UTF-8&lt;\/project.build.sourceEncoding&gt;\r\n    &lt;antlr4.visitor&gt;true&lt;\/antlr4.visitor&gt;\r\n    &lt;antlr4.listener&gt;true&lt;\/antlr4.listener&gt;\r\n  &lt;\/properties&gt;  \r\n \r\n  &lt;dependencies&gt;\r\n    &lt;dependency&gt;\r\n      &lt;groupId&gt;org.antlr&lt;\/groupId&gt;\r\n      &lt;artifactId&gt;antlr4-runtime&lt;\/artifactId&gt;\r\n      &lt;version&gt;4.6&lt;\/version&gt;\r\n    &lt;\/dependency&gt;\r\n   [..]\r\n  &lt;\/dependencies&gt;\r\n \r\n  &lt;build&gt;\r\n    &lt;plugins&gt;\r\n      [..]\r\n      &lt;!-- Plugin to compile the g4 files ahead of the java files\r\n           See https:\/\/github.com\/antlr\/antlr4\/blob\/master\/antlr4-maven-plugin\/src\/site\/apt\/examples\/simple.apt.vm\r\n           Except that the grammar does not need to contain the package declaration as stated in the documentation (I do not know why)\r\n           To use this plugin, type:\r\n             mvn antlr4:antlr4\r\n           In any case, Maven will invoke this plugin before the Java source is compiled\r\n        --&gt;\r\n      &lt;plugin&gt;\r\n        &lt;groupId&gt;org.antlr&lt;\/groupId&gt;\r\n        &lt;artifactId&gt;antlr4-maven-plugin&lt;\/artifactId&gt;\r\n        &lt;version&gt;4.6&lt;\/version&gt;                \r\n        &lt;executions&gt;\r\n          &lt;execution&gt;\r\n            &lt;goals&gt;\r\n              &lt;goal&gt;antlr4&lt;\/goal&gt;\r\n            &lt;\/goals&gt;            \r\n          &lt;\/execution&gt;\r\n        &lt;\/executions&gt;\r\n      
&lt;\/plugin&gt;\r\n      [..]\r\n    &lt;\/plugins&gt;\r\n  &lt;\/build&gt;\r\n&lt;\/project&gt;<\/pre>\n<p>Now you have to put the *.g4 files of your grammar under\u00a0<code>src\/main\/antlr4\/me\/tomassetti\/examples\/MarkupParser<\/code>.<\/p>\n<p>Once you have written your grammars you just run <code>mvn package<\/code> and all the magic happens: ANTLR is invoked, it generates the lexer and the parser and those are compiled together with the rest of your code.<\/p>\n<pre class=\"brush:java\"># use mvn to generate the package\r\nmvn package<\/pre>\n<p>If you have never used Maven you can look at the official <a href=\"https:\/\/github.com\/antlr\/antlr4\/blob\/master\/doc\/java-target.md\">ANTLR documentation for the Java target<\/a> or also the <a href=\"https:\/\/maven.apache.org\/\">Maven website<\/a> to get you started.<\/p>\n<p>There is a clear advantage in using Java for developing ANTLR grammars: there are plugins for several IDEs and it\u2019s the language that the main developer of the tool actually works on. So there are tools, like the <code>org.antlr.v4.gui.TestRig<\/code>, that can be easily integrated into your workflow and are useful if you want to easily visualize the AST of an input.<\/p>\n<h2 id=\"csharp-setup\">5. C# Setup<\/h2>\n<p>There is support for .NET Framework and Mono 3.5, but there is no support for .NET Core. We are going to use Visual Studio to create our ANTLR project, because there is a nice extension for Visual Studio created by the same author as the C# target, called <em>ANTLR Language Support<\/em>. You can install it\u00a0by going to Tools -&gt; Extensions and Updates. This extension will automatically generate parser, lexer and visitor\/listener when you build your project.<\/p>\n<p>Furthermore, the extension will allow you to create a new grammar file, using the well-known menu to add a new item. 
Last, but not least, you can set up the options to generate listener\/visitor right in the properties of each grammar file.<\/p>\n<p>Alternatively, if you prefer to use an editor, you need to use the usual\u00a0Java tool to generate everything. You can do that just by indicating the right language. In this example the grammar is called \u201cSpreadsheet\u201d.<\/p>\n<pre class=\"brush:java\">antlr4 -Dlanguage=CSharp Spreadsheet.g4<\/pre>\n<p>Notice that the \u2018S\u2019 in CSharp is uppercase.<\/p>\n<p>You still need the ANTLR4 runtime for your project, and you can install it with the good ol\u2019 <strong>NuGet<\/strong>.<\/p>\n<h2>Beginner<\/h2>\n<p>In this section we lay the foundation you need to use ANTLR: what lexers and parsers are, the syntax to define them in a grammar and the strategies you can use to create one. We also see the first examples to show how to use what you have learned. You can come back to this section if you don\u2019t remember how ANTLR works.<\/p>\n<h2 id=\"lexers-and-parser\">6. Lexers and Parsers<\/h2>\n<p>Before looking into\u00a0parsers, we first need to look into lexers, also known as tokenizers. They are basically the first stepping stone toward a parser, and of course ANTLR allows you to build them too. <strong>A lexer takes the individual characters and transforms them into tokens<\/strong>, the atoms that the parser uses to create the logical structure.<\/p>\n<p>Imagine this process applied to a natural language such as English. You are reading the single characters, putting them together until they make a word, and then you combine the different words to form a sentence.<\/p>\n<p>Let\u2019s look at the following example and imagine that we are trying to parse a mathematical operation.<\/p>\n<pre class=\"brush:bash\">437 + 734<\/pre>\n<p>The lexer scans the text and finds \u20184\u2019, \u20183\u2019, \u20187\u2019 and then the space \u2018 \u2019. So it knows that the first characters actually represent a number. 
Then it finds a \u2018+\u2019 symbol, so it knows that it represents an operator, and lastly it finds another number.<\/p>\n<p><a href=\"http:\/\/www.javacodegeeks.com\/wp-content\/uploads\/2017\/03\/lexer-parser-center.png\"><img decoding=\"async\" class=\"aligncenter wp-image-64417\" src=\"http:\/\/www.javacodegeeks.com\/wp-content\/uploads\/2017\/03\/lexer-parser-center.png\" width=\"860\" height=\"156\" srcset=\"https:\/\/www.javacodegeeks.com\/wp-content\/uploads\/2017\/03\/lexer-parser-center.png 1030w, https:\/\/www.javacodegeeks.com\/wp-content\/uploads\/2017\/03\/lexer-parser-center-300x54.png 300w, https:\/\/www.javacodegeeks.com\/wp-content\/uploads\/2017\/03\/lexer-parser-center-768x139.png 768w, https:\/\/www.javacodegeeks.com\/wp-content\/uploads\/2017\/03\/lexer-parser-center-1024x186.png 1024w\" sizes=\"(max-width: 860px) 100vw, 860px\" \/><\/a><\/p>\n<p>How does it know that? Because we tell it.<\/p>\n<pre class=\"brush:bash\">\/*\r\n * Parser Rules\r\n *\/\r\n \r\noperation  : NUMBER '+' NUMBER ;\r\n \r\n\/*\r\n * Lexer Rules\r\n *\/\r\n \r\nNUMBER     : [0-9]+ ;\r\n \r\nWHITESPACE : ' ' -&gt; skip ;<\/pre>\n<p>This is not a complete grammar, but we can already see that lexer rules are all uppercase, while parser rules are all lowercase. Technically the rule about case applies only to the first character of their names, but usually they are all uppercase or lowercase for clarity.<\/p>\n<p>Rules are typically written in this order: first the\u00a0parser rules and then the lexer ones, although logically they are applied in the opposite order. It\u2019s also important to remember that <strong>lexer rules are analyzed in the order that they appear<\/strong>, and they can be ambiguous.<\/p>\n<p>The typical example is the identifier: in many programming languages it can be any string of letters, but certain combinations, such as \u201cclass\u201d or \u201cfunction\u201d, are forbidden because they indicate a <em>class<\/em> or a <em>function<\/em>. 
So the order of the rules solves the ambiguity: when more than one rule matches the same text, the one defined first wins, and that\u2019s why the tokens identifying keywords such\u00a0as <em>class<\/em> or <em>function<\/em> are defined first, while the one for the identifier is put last.<\/p>\n<blockquote>\n<p>The basic syntax of a rule is easy: <strong>there is a name, a colon, the definition of the rule and a terminating semicolon<\/strong><\/p>\n<\/blockquote>\n<p>The definition of <strong>NUMBER<\/strong> contains a typical range of digits and a \u2018+\u2019 symbol to indicate that one or more matches are allowed. These are all very typical indications with which I assume you are familiar; if not, you can read more about the syntax of <a href=\"https:\/\/en.wikipedia.org\/wiki\/Regular_expression#Syntax\">regular expressions<\/a>.<\/p>\n<p>The most interesting part is at the end, the lexer rule that defines the <strong>WHITESPACE<\/strong> token. It\u2019s interesting because it shows how to tell ANTLR to ignore something. Consider how ignoring whitespace simplifies parser rules: if we couldn\u2019t say to ignore WHITESPACE we would have to\u00a0include it between every single subrule of the parser, to let users put spaces wherever they want. Like this:<\/p>\n<pre class=\"brush:bash\">operation  : WHITESPACE* NUMBER WHITESPACE* '+' WHITESPACE* NUMBER;<\/pre>\n<p>And the same typically applies to comments: they can appear everywhere and we do not want to handle them specifically in every single piece of our grammar, so we just ignore them (at least while parsing).<\/p>\n<h2 id=\"creating-a-grammar\">7. Creating a Grammar<\/h2>\n<p>Now that we have seen the basic syntax of a rule, we can take a look at the two different approaches to defining a grammar: top-down and bottom-up.<\/p>\n<h3>Top-down approach<\/h3>\n<p>This approach consists in starting from the general organization of a file written in your language.<\/p>\n<p>What are the main sections of a file? What is their order? 
What is contained in each section?<\/p>\n<p>For example a Java file can be divided into three sections:<\/p>\n<ul>\n<li>package declaration<\/li>\n<li>imports<\/li>\n<li>type definitions<\/li>\n<\/ul>\n<p>This approach works best when you already know the language or format that you are designing a grammar for. It is probably the strategy preferred by people with a good theoretical background or people who prefer to start with \u201cthe big plan\u201d.<\/p>\n<p>When using this approach you start by defining the rule representing the whole file. It will probably include other rules, to represent the main sections. You then define those rules and you move from the most general, abstract rules to the low-level, practical ones.<\/p>\n<h3>Bottom-up approach<\/h3>\n<p>The bottom-up approach consists in focusing on the small elements first: defining how the tokens are captured, how the basic expressions are defined and so on. Then we move to higher level constructs until we define the rule representing the whole file.<\/p>\n<p>I personally prefer to start from the bottom, with the basic items that are analyzed by the lexer. And then you grow naturally from there to the structure, which is dealt with by the parser. This approach lets you focus on a small piece of the grammar, build tests for it, ensure it works as expected and then move on to the next bit.<\/p>\n<p>This approach mimics the way we learn. Furthermore, there is the advantage of starting with real code that is actually quite common among many languages. In fact, most languages have things like identifiers, comments, whitespace, etc. Obviously you might have to tweak something: for example, a comment in HTML is functionally the same as a comment in C#, but it has different delimiters.<\/p>\n<p>The disadvantage of a bottom-up approach rests on the fact that the parser is the thing you actually care about. 
You weren\u2019t asked to build a lexer, you were asked to build a parser, that could provide a specific functionality. So by starting on the last part, the lexer, you might end up doing some refactoring, if you don\u2019t already know how the rest of the program will work.<\/p>\n<h2>8. Designing a Data Format<\/h2>\n<p>Designing a grammar for a new language is\u00a0difficult. You have to create a language simple and intuitive to the user, but also unambiguous to make the grammar manageable. It must be concise, clear, natural and it shouldn\u2019t get in the way of the user.<\/p>\n<p>So we are starting with something limited: a grammar for a simple chat program.<\/p>\n<p>Let\u2019s\u00a0start with a better description of our objective:<\/p>\n<ul>\n<li>there are not going to be paragraphs, and thus we can use newlines as separators between the messages<\/li>\n<li>we want to allow emoticons, mentions and links. We are not going to support HTML tags<\/li>\n<li>since our chat is going to be for annoying teenagers, we want to allow users an easy way to SHOUT and to format the color of the text.<\/li>\n<\/ul>\n<p>Finally teenagers could shout, and all in pink. What a time to be alive.<\/p>\n<h2 id=\"lexer-rules\">9. Lexer Rules<\/h2>\n<p>We start with defining lexer rules for our chat language. 
Remember that lexer rules actually go at the end of the file.<\/p>\n<pre class=\"brush:bash\">\/*\r\n * Lexer Rules\r\n *\/\r\n \r\nfragment A          : ('A'|'a') ;\r\nfragment S          : ('S'|'s') ;\r\nfragment Y          : ('Y'|'y') ;\r\nfragment H          : ('H'|'h') ;\r\nfragment O          : ('O'|'o') ;\r\nfragment U          : ('U'|'u') ;\r\nfragment T          : ('T'|'t') ;\r\n \r\nfragment LOWERCASE  : [a-z] ;\r\nfragment UPPERCASE  : [A-Z] ;\r\n \r\nSAYS                : S A Y S ;\r\n \r\nSHOUTS              : S H O U T S;\r\n \r\nWORD                : (LOWERCASE | UPPERCASE | '_')+ ;\r\n \r\nWHITESPACE          : (' ' | '\\t') ;\r\n \r\nNEWLINE             : ('\\r'? '\\n' | '\\r')+ ;\r\n \r\nTEXT                : ~[\\])]+ ;<\/pre>\n<p>In this example we use <strong>fragments<\/strong>: they are reusable building blocks\u00a0for lexer rules. You define them and then you refer to them in lexer rules. If you define them but do not include them in lexer rules they simply have no effect.<\/p>\n<p>We define a fragment for each of the letters we want to use in keywords. Why is that? Because we want to support case-insensitive keywords. Besides avoiding repetition of the two cases of each character, fragments are also used when dealing with floating point numbers, to avoid repeating the digits before and after the dot\/comma, as in the following example.<\/p>\n<pre class=\"brush:bash\">fragment DIGIT : [0-9] ;\r\nNUMBER         : DIGIT+ ([.,] DIGIT+)? ;<\/pre>\n<p>The <strong>TEXT<\/strong> token shows how to capture everything,\u00a0except for the characters that follow the tilde (\u2018~\u2019). 
We are excluding the closing square bracket \u2018]\u2019 and the closing parenthesis \u2018)\u2019; since the square bracket is also the character used to identify the end of a group of characters, we have to escape it by prefixing it with a backslash \u2018\\\u2019.<\/p>\n<p>The newlines rule is formulated that way because there are actually different ways in which operating systems indicate a newline: some use a <code>carriage return ('\\r')<\/code>, others a <code>newline ('\\n')<\/code> character, or a combination of the two.<\/p>\n<h2 id=\"parser-rules\">10. Parser Rules<\/h2>\n<p>We continue with parser rules, which are the rules with which our program will interact most directly.<\/p>\n<pre class=\"brush:bash\">\/*\r\n * Parser Rules\r\n *\/\r\n \r\nchat                : line+ EOF ;\r\n \r\nline                : name command message NEWLINE;\r\n \r\nmessage             : (emoticon | link | color | mention | WORD | WHITESPACE)+ ;\r\n \r\nname                : WORD WHITESPACE ;\r\n \r\ncommand             : (SAYS | SHOUTS) ':' WHITESPACE ;\r\n                                        \r\nemoticon            : ':' '-'? ')'\r\n                    | ':' '-'? '('\r\n                    ;\r\n \r\nlink                : '[' TEXT ']' '(' TEXT ')' ;\r\n \r\ncolor               : '\/' WORD '\/' message '\/';\r\n \r\nmention             : '@' WORD ;<\/pre>\n<p>The first interesting part is <strong>message<\/strong>, not so much for what it contains, but for the structure it represents. We are saying that a <code>message<\/code> could be any of the listed rules in any order. This is a simple way to solve the problem of dealing with whitespace without repeating it every time. Since we, as\u00a0users,\u00a0find whitespace irrelevant we see something like <code>WORD WORD mention<\/code>, but the parser actually sees <code>WORD WHITESPACE WORD WHITESPACE mention WHITESPACE<\/code>.<\/p>\n<p>Another way of dealing with whitespace, when you can\u2019t get rid of it, is more advanced: lexical modes. 
Basically it allows you to specify two lexer parts: one for the structured part, the other for simple text. This is useful for parsing things like XML or HTML. We are going to show it later.<\/p>\n<p>The <strong>command<\/strong> rule is straightforward: just notice that you cannot have a space between the two options for command and the colon, but you need one <strong>WHITESPACE <\/strong>after it. The <strong>emoticon <\/strong>rule shows another notation to indicate multiple choices: you can use the pipe character \u2018|\u2019 without the parentheses. We support only two emoticons, happy and sad, with or without the middle line.<\/p>\n<p>Something that could be considered a bug, or a poor implementation, is the <strong>link<\/strong> rule: as we already said, <strong>TEXT<\/strong> captures everything apart from certain special characters. You may want to allow only <strong>WORD<\/strong> and <strong>WHITESPACE<\/strong> inside the parentheses, or to force a correct format for a link inside the square brackets. On the other hand, this allows the user to make a mistake in writing the link without making the parser complain.<\/p>\n<blockquote>\n<p>You have to remember that the parser cannot check for semantics<\/p>\n<\/blockquote>\n<p>For instance, it cannot know if the <strong>WORD<\/strong> indicating the color actually represents a valid color. That is to say, it doesn\u2019t know that it\u2019s wrong to use \u201cdog\u201d, but it\u2019s right to use \u201cred\u201d. This must be checked by the logic of the program, which knows which colors are available. You have to find the right balance in dividing enforcement between the grammar and your own code.<\/p>\n<p>The parser should only check the syntax. 
So the rule of thumb is that when in doubt you let the parser pass the content up to your program. Then, in your program, you check the semantics and make sure that the rule actually has a proper meaning.<\/p>\n<p>Let\u2019s look at the rule <strong>color:<\/strong> it can include a <strong>message<\/strong>,\u00a0 and it itself can be part of <strong>message;<\/strong> this ambiguity will be solved by the context in which it is used.<\/p>\n<h2 id=\"mistakes-and-adjustements\">11. Mistakes and Adjustments<\/h2>\n<p>Before trying our new grammar we have to add a name for it, at the beginning of the file. The name must be the same as the name of the file, which should have the <code>.g4<\/code> extension.<\/p>\n<pre class=\"brush:bash\">grammar Chat;<\/pre>\n<p>You can find how to install everything, for your platform, in the <a href=\"http:\/\/www.antlr.org\/index.html\">official documentation<\/a>. After everything is installed, we generate the parser from the grammar, compile the generated Java code and then run the testing tool.<\/p>\n<pre class=\"brush:java\">\/\/ lines preceded by $ are commands\r\n\/\/ &gt; are input to the tool\r\n\/\/ - are output from the tool\r\n$ antlr4 Chat.g4\r\n$ javac Chat*.java\r\n\/\/ grun is the testing tool, Chat is the name of the grammar, chat the rule that we want to parse\r\n$ grun Chat chat\r\n&gt; john SAYS: hello @michael this will not work\r\n\/\/ CTRL+D on Linux, CTRL+Z on Windows\r\n&gt; CTRL+D\/CTRL+Z\r\n- line 1:0 mismatched input 'john SAYS: hello @michael this will not work\\n' expecting WORD<\/pre>\n<p>Okay, it doesn\u2019t work. Why is it expecting <strong>WORD<\/strong>? It\u2019s right there! 
Let\u2019s try to find out, using the option <code>-tokens<\/code> to make it show the tokens it recognizes.<\/p>\n<pre class=\"brush:java\">$ grun Chat chat -tokens\r\n&gt; john SAYS: hello @michael this will not work\r\n- [@0,0:44='john SAYS: hello @michael this will not work\\n',&lt;TEXT&gt;,1:0]\r\n- [@1,45:44='&lt;EOF&gt;',&lt;EOF&gt;,2:0]<\/pre>\n<p>So it only sees the <strong>TEXT<\/strong> token. But we put it at the end of the grammar, so what happened? The problem is that the lexer always tries to match the longest possible token; the order of the rules only breaks ties between matches of the same length. And all this text is a valid <strong>TEXT<\/strong> token. How do we solve this problem? There are many ways; the first, of course, is just getting rid\u00a0of that token. But for now we are going to see the second easiest.<\/p>\n<pre class=\"brush:bash\">[..]\r\n \r\nlink                : TEXT TEXT ;\r\n \r\n[..]\r\n \r\nTEXT                : ('['|'(') ~[\\])]+ (']'|')');<\/pre>\n<p>We have changed the problematic token to make it include the opening and closing parenthesis or square bracket. Note that this isn\u2019t exactly the same thing, because it would allow two pairs of parentheses or two pairs of square brackets. 
But it is a first step and we are learning here, after all.<\/p>\n<p>Let\u2019s check if it works:<\/p>\n<pre class=\"brush:xml\">$ grun Chat chat -tokens\r\n&gt; john SAYS: hello @michael this will not work\r\n- [@0,0:3='john',&lt;WORD&gt;,1:0]\r\n- [@1,4:4=' ',&lt;WHITESPACE&gt;,1:4]\r\n- [@2,5:8='SAYS',&lt;SAYS&gt;,1:5]\r\n- [@3,9:9=':',&lt;':'&gt;,1:9]\r\n- [@4,10:10=' ',&lt;WHITESPACE&gt;,1:10]\r\n- [@5,11:15='hello',&lt;WORD&gt;,1:11]\r\n- [@6,16:16=' ',&lt;WHITESPACE&gt;,1:16]\r\n- [@7,17:17='@',&lt;'@'&gt;,1:17]\r\n- [@8,18:24='michael',&lt;WORD&gt;,1:18]\r\n- [@9,25:25=' ',&lt;WHITESPACE&gt;,1:25]\r\n- [@10,26:29='this',&lt;WORD&gt;,1:26]\r\n- [@11,30:30=' ',&lt;WHITESPACE&gt;,1:30]\r\n- [@12,31:34='will',&lt;WORD&gt;,1:31]\r\n- [@13,35:35=' ',&lt;WHITESPACE&gt;,1:35]\r\n- [@14,36:38='not',&lt;WORD&gt;,1:36]\r\n- [@15,39:39=' ',&lt;WHITESPACE&gt;,1:39]\r\n- [@16,40:43='work',&lt;WORD&gt;,1:40]\r\n- [@17,44:44='\\n',&lt;NEWLINE&gt;,1:44]\r\n- [@18,45:44='&lt;EOF&gt;',&lt;EOF&gt;,2:0]<\/pre>\n<p>Using the option <code>-gui<\/code> we can also have a nice, and easier to understand, graphical representation.<\/p>\n<p><a href=\"http:\/\/www.javacodegeeks.com\/wp-content\/uploads\/2017\/03\/antlr4_parse_tree.png\"><img decoding=\"async\" class=\"aligncenter size-full wp-image-64418\" src=\"http:\/\/www.javacodegeeks.com\/wp-content\/uploads\/2017\/03\/antlr4_parse_tree.png\" alt=\"\" width=\"641\" height=\"235\" srcset=\"https:\/\/www.javacodegeeks.com\/wp-content\/uploads\/2017\/03\/antlr4_parse_tree.png 641w, https:\/\/www.javacodegeeks.com\/wp-content\/uploads\/2017\/03\/antlr4_parse_tree-300x110.png 300w\" sizes=\"(max-width: 641px) 100vw, 641px\" \/><\/a><\/p>\n<p>The dot in mid air represents whitespace.<\/p>\n<p>This works, but it isn\u2019t very smart or nice, or organized. But don\u2019t worry, later we are going to see a better way. 
One positive aspect of this solution is that it lets us show another trick.<\/p>\n<pre class=\"brush:bash\">TEXT                : ('['|'(') .*? (']'|')');<\/pre>\n<p>This is an equivalent formulation of the token <strong>TEXT<\/strong>: the \u2018.\u2019 matches any character, \u2018*\u2019 says that the preceding match can be repeated any number of times, \u2018?\u2019 indicates that the previous match is non-greedy. That is to say, the subrule matches as little as possible: everything up to what follows it, which lets the lexer match the closing parenthesis or square bracket.<\/p>\n<h2>Mid-Level<\/h2>\n<p>In this section we see how to use ANTLR in your programs, the libraries and functions you need to use, how to test your parsers, and the like. We see what a listener is and how to use it. We also build on our knowledge of the basics, by looking at more advanced concepts, such as semantic predicates. While our projects are mainly in Javascript and Python, the concepts are generally applicable to every language. You can come back to this section when you need to remember how to get your project organized.<\/p>\n<h2 id=\"setup-antlr-with-javascript\">12. Setting Up the Chat Project with Javascript<\/h2>\n<p>In the previous sections we have seen how to build a grammar for a chat program, piece by piece. Let\u2019s now copy the grammar we just created into the same folder as our Javascript files.<\/p>\n<pre class=\"brush:java\">grammar Chat;\r\n \r\n\/*\r\n * Parser Rules\r\n *\/\r\n \r\nchat                : line+ EOF ;\r\n \r\nline                : name command message NEWLINE ;\r\n \r\nmessage             : (emoticon | link | color | mention | WORD | WHITESPACE)+ ;\r\n \r\nname                : WORD WHITESPACE;\r\n \r\ncommand             : (SAYS | SHOUTS) ':' WHITESPACE ;\r\n                                        \r\nemoticon            : ':' '-'? ')'\r\n                    | ':' '-'? 
'('\r\n                    ;\r\n \r\nlink                : TEXT TEXT ;\r\n \r\ncolor               : '\/' WORD '\/' message '\/';\r\n \r\nmention             : '@' WORD ;\r\n \r\n \r\n\/*\r\n * Lexer Rules\r\n *\/\r\n \r\nfragment A          : ('A'|'a') ;\r\nfragment S          : ('S'|'s') ;\r\nfragment Y          : ('Y'|'y') ;\r\nfragment H          : ('H'|'h') ;\r\nfragment O          : ('O'|'o') ;\r\nfragment U          : ('U'|'u') ;\r\nfragment T          : ('T'|'t') ;\r\n \r\nfragment LOWERCASE  : [a-z] ;\r\nfragment UPPERCASE  : [A-Z] ;\r\n \r\nSAYS                : S A Y S ;\r\n \r\nSHOUTS              : S H O U T S ;\r\n \r\nWORD                : (LOWERCASE | UPPERCASE | '_')+ ;\r\n \r\nWHITESPACE          : (' ' | '\\t')+ ;\r\n \r\nNEWLINE             : ('\\r'? '\\n' | '\\r')+ ;\r\n \r\nTEXT                : ('['|'(') ~[\\])]+ (']'|')');<\/pre>\n<p>We can create the corresponding Javascript parser simply by specifying the correct option with the ANTLR4 Java program.<\/p>\n<pre class=\"brush:bash\">antlr4 -Dlanguage=JavaScript Chat.g4<\/pre>\n<p>Now you will find some new files in the folder, with names such as <code>ChatLexer.js,<\/code> <code>ChatParser.js<\/code> and there are also *.tokens files, none of which contains anything interesting for us, unless you want to understand the inner workings of ANTLR.<\/p>\n<p>The file you want to look at is <code>ChatListener.js<\/code>,\u00a0 you are not going to modify anything in it, but it contains methods and functions that we will override with our own listener. We are not going to modify it, because changes would be overwritten every time the grammar is regenerated.<\/p>\n<p>Looking into it you can see several enter\/exit functions, a pair for each of our parser rules. These functions will be invoked when a piece of code matching the rule will be encountered. 
This is the default implementation of the listener, which allows you to override just the functions that you need in your derived listener and leave the rest as they are.<\/p>\n<pre class=\"brush:js\">var antlr4 = require('antlr4\/index');\r\n \r\n\/\/ This class defines a complete listener for a parse tree produced by ChatParser.\r\nfunction ChatListener() {\r\n    antlr4.tree.ParseTreeListener.call(this);\r\n    return this;\r\n}\r\n \r\nChatListener.prototype = Object.create(antlr4.tree.ParseTreeListener.prototype);\r\nChatListener.prototype.constructor = ChatListener;\r\n \r\n\/\/ Enter a parse tree produced by ChatParser#chat.\r\nChatListener.prototype.enterChat = function(ctx) {\r\n};\r\n \r\n\/\/ Exit a parse tree produced by ChatParser#chat.\r\nChatListener.prototype.exitChat = function(ctx) {\r\n};\r\n \r\n[..]<\/pre>\n<p>The alternative to creating a <code>Listener<\/code> is creating a <code>Visitor<\/code>. The main differences are that you can neither control the flow of a listener nor return anything from its functions, while you can do both with a visitor. So if you need to control how the nodes of the AST are entered, or to gather information from several of them, you probably want to use a visitor. This is useful, for example, with code generation, where some information that is needed to create new source code is spread around many parts. Both the listener and the visitor\u00a0use depth-first search.<\/p>\n<p>A depth-first search means that when a node is accessed its children are accessed next, and if one of the child nodes has its own children they are accessed before continuing with the other children of the first node. 
The following image will make it simpler to understand the concept.<\/p>\n<p><a href=\"http:\/\/www.javacodegeeks.com\/wp-content\/uploads\/2017\/03\/antlr4_parse_tree_depth.png\"><img decoding=\"async\" class=\"aligncenter size-full wp-image-64419\" src=\"http:\/\/www.javacodegeeks.com\/wp-content\/uploads\/2017\/03\/antlr4_parse_tree_depth.png\" alt=\"\" width=\"641\" height=\"235\" srcset=\"https:\/\/www.javacodegeeks.com\/wp-content\/uploads\/2017\/03\/antlr4_parse_tree_depth.png 641w, https:\/\/www.javacodegeeks.com\/wp-content\/uploads\/2017\/03\/antlr4_parse_tree_depth-300x110.png 300w\" sizes=\"(max-width: 641px) 100vw, 641px\" \/><\/a><\/p>\n<p>So in the case of a listener an enter event will be fired at the first encounter with the node and an exit one will be fired after having exited all of its children. In the following image you can see an example of which functions will be fired when a listener meets a <strong>line<\/strong> node (for simplicity only the functions related to <strong>line<\/strong> are shown).<\/p>\n<p><a href=\"http:\/\/www.javacodegeeks.com\/wp-content\/uploads\/2017\/03\/antlr4_parse_tree_listener.png\"><img decoding=\"async\" class=\"aligncenter size-full wp-image-64420\" src=\"http:\/\/www.javacodegeeks.com\/wp-content\/uploads\/2017\/03\/antlr4_parse_tree_listener.png\" alt=\"\" width=\"641\" height=\"235\" srcset=\"https:\/\/www.javacodegeeks.com\/wp-content\/uploads\/2017\/03\/antlr4_parse_tree_listener.png 641w, https:\/\/www.javacodegeeks.com\/wp-content\/uploads\/2017\/03\/antlr4_parse_tree_listener-300x110.png 300w\" sizes=\"(max-width: 641px) 100vw, 641px\" \/><\/a><\/p>\n<p>With a standard visitor the behavior will be analogous except, of course, that only\u00a0a single\u00a0visit event will be fired for every single node.\u00a0In the following image you can see an example of which function will be fired when a visitor meets a <strong>line<\/strong> node (for simplicity only the function related to 
<strong>line<\/strong> is shown).<\/p>\n<p><a href=\"http:\/\/www.javacodegeeks.com\/wp-content\/uploads\/2017\/03\/antlr4_parse_tree_visitor.png\"><img decoding=\"async\" class=\"aligncenter size-full wp-image-64421\" src=\"http:\/\/www.javacodegeeks.com\/wp-content\/uploads\/2017\/03\/antlr4_parse_tree_visitor.png\" alt=\"\" width=\"641\" height=\"235\" srcset=\"https:\/\/www.javacodegeeks.com\/wp-content\/uploads\/2017\/03\/antlr4_parse_tree_visitor.png 641w, https:\/\/www.javacodegeeks.com\/wp-content\/uploads\/2017\/03\/antlr4_parse_tree_visitor-300x110.png 300w\" sizes=\"(max-width: 641px) 100vw, 641px\" \/><\/a><\/p>\n<p>Remember that <strong>this is true for the default implementation of a visitor and it\u2019s done by returning the children of each node in every function<\/strong>. If you override a method of the visitor it\u2019s your responsibility to make it continuing the journey or stop it right there.<\/p>\n<h2 id=\"antlr.js\">13. Antlr.js<\/h2>\n<p>It is finally time to see how a typical ANTLR program looks.<\/p>\n<pre class=\"brush:js\">const http = require('http');\r\nconst antlr4 = require('antlr4\/index');\r\nconst ChatLexer = require('.\/ChatLexer');\r\nconst ChatParser = require('.\/ChatParser');\r\nconst HtmlChatListener = require('.\/HtmlChatListener').HtmlChatListener;\r\n \r\nhttp.createServer((req, res) =&gt; {\r\n   \r\n   res.writeHead(200, {\r\n       'Content-Type': 'text\/html',        \r\n   });\r\n \r\n   res.write('&lt;html&gt;&lt;head&gt;&lt;meta charset=\"UTF-8\"\/&gt;&lt;\/head&gt;&lt;body&gt;');\r\n   \r\n   var input = \"john SHOUTS: hello @michael \/pink\/this will work\/ :-) \\n\";\r\n   var chars = new antlr4.InputStream(input);\r\n   var lexer = new ChatLexer.ChatLexer(chars);\r\n   var tokens  = new antlr4.CommonTokenStream(lexer);\r\n   var parser = new ChatParser.ChatParser(tokens);\r\n   parser.buildParseTrees = true;   \r\n   var tree = parser.chat();   \r\n   var htmlChat = new HtmlChatListener(res);\r\n   
antlr4.tree.ParseTreeWalker.DEFAULT.walk(htmlChat, tree);\r\n   \r\n   res.write('&lt;\/body&gt;&lt;\/html&gt;');\r\n   res.end();\r\n \r\n}).listen(1337);<\/pre>\n<p>At the beginning of the main file we import (using <em>require<\/em>)\u00a0the necessary libraries and files: <code>antlr4<\/code>\u00a0(the runtime) and our generated lexer and parser, plus the listener that we are going to see later.<\/p>\n<p>For simplicity we get the input from a string, while in a real scenario it would come from an editor.<\/p>\n<blockquote>\n<p>Lines 16-19 show the foundation of every ANTLR program: you create the stream of chars from the input, you give it to the lexer and it transforms them into tokens, which are then interpreted by the parser.<\/p>\n<\/blockquote>\n<p>It\u2019s useful to take a moment to reflect on this: the lexer works on the characters of the input, a copy of the input to be precise, while the parser works on the tokens generated by the lexer. <strong>The lexer doesn\u2019t work on the input directly, and the parser doesn\u2019t even see the characters<\/strong>.<\/p>\n<p>This is important to remember in case you need to do something advanced like manipulating the input. In this case the input is a string, but, of course, it could be any stream of content.<\/p>\n<p>Line 20 is redundant, since the option already defaults to true, but that could change in future versions of the runtimes, so you are better off specifying it.<\/p>\n<p>Then, on line 21, we set the root node of the tree as a <strong>chat<\/strong> rule. You want to invoke the parser specifying a rule, which typically is the first rule. However you can actually invoke any rule directly, like <strong>color<\/strong>.<\/p>\n<p>Once we get the AST from the parser typically we want to process it using a listener or a visitor. In this case we specify a listener. Our particular listener takes a parameter: the response object. 
We want to use it to put some text in the response to send to the user.\u00a0After setting the listener up, we finally walk the tree with our listener.<\/p>\n<h2 id=\"htmlchatlistener.js\">14. HtmlChatListener.js<\/h2>\n<p>We continue by looking at the listener of our <em>Chat<\/em> project.<\/p>\n<pre class=\"brush:js\">const antlr4 = require('antlr4\/index');\r\nconst ChatLexer = require('.\/ChatLexer');\r\nconst ChatParser = require('.\/ChatParser');\r\nvar ChatListener = require('.\/ChatListener').ChatListener;\r\n \r\nvar HtmlChatListener = function(res) {\r\n    this.Res = res;    \r\n    ChatListener.call(this); \/\/ inherit default listener\r\n    return this;\r\n};\r\n \r\n\/\/ inherit default listener\r\nHtmlChatListener.prototype = Object.create(ChatListener.prototype);\r\nHtmlChatListener.prototype.constructor = HtmlChatListener;\r\n \r\n\/\/ override default listener behavior\r\nHtmlChatListener.prototype.enterName = function(ctx) {          \r\n    this.Res.write(\"&lt;strong&gt;\");    \r\n};\r\n \r\nHtmlChatListener.prototype.exitName = function(ctx) {      \r\n    this.Res.write(ctx.WORD().getText());\r\n    this.Res.write(\"&lt;\/strong&gt; \");\r\n}; \r\n \r\nHtmlChatListener.prototype.exitEmoticon = function(ctx) {      \r\n    var emoticon = ctx.getText();        \r\n    \r\n    if(emoticon == ':-)' || emoticon == ':)')\r\n    {\r\n        this.Res.write(\"\ud83d\ude42\");        \r\n    }\r\n    \r\n    if(emoticon == ':-(' || emoticon == ':(')\r\n    {\r\n        this.Res.write(\"\ud83d\ude41\");            \r\n    }\r\n}; \r\n \r\nHtmlChatListener.prototype.enterCommand = function(ctx) {          \r\n    if(ctx.SAYS() != null)\r\n        this.Res.write(ctx.SAYS().getText() + ':' + '&lt;p&gt;');\r\n \r\n    if(ctx.SHOUTS() != null)\r\n        this.Res.write(ctx.SHOUTS().getText() + ':' + '&lt;p style=\"text-transform: uppercase\"&gt;');\r\n};\r\n \r\nHtmlChatListener.prototype.exitLine = function(ctx) {              \r\n    this.Res.write(\"&lt;\/p&gt;\");\r\n};\r\n 
\r\nexports.HtmlChatListener = HtmlChatListener;<\/pre>\n<p>After the <em>require<\/em> calls we make our <strong>HtmlChatListener<\/strong> extend <strong>ChatListener<\/strong>. The interesting stuff starts at line 17.<\/p>\n<p>The <strong>ctx<\/strong> argument is an instance of a specific context class for the node that we are entering\/exiting. So for <code>enterName<\/code> it is <code>NameContext<\/code>, for <code>exitEmoticon<\/code> it is <code>EmoticonContext<\/code>, etc. This specific context will have the proper elements for the rule, which makes it possible to easily access the respective tokens and subrules. For example, <code>NameContext<\/code> will contain methods like <strong>WORD()<\/strong> and <strong>WHITESPACE();<\/strong> <code>CommandContext <\/code>will contain methods like <strong>WHITESPACE()<\/strong>, <strong>SAYS()<\/strong> and <strong>SHOUTS().<\/strong><\/p>\n<blockquote>\n<p>These functions, <code>enter*<\/code> and <code>exit*,<\/code> are called by the walker every time the corresponding nodes are entered or exited while it\u2019s traversing the AST that represents the input. A listener allows you to execute some code, but it\u2019s important to remember that<strong> you can\u2019t stop the execution of the walker or of the functions<\/strong>.<\/p>\n<\/blockquote>\n<p>On line 18, we start by printing a <code>strong<\/code> tag because we want the name to be bold, then on exitName we take the text from the token <strong>WORD<\/strong> and close the tag. Note that we ignore the <strong>WHITESPACE<\/strong> token: nothing says that we have to show everything. In this case we could have done everything in either the enter or the exit function.<\/p>\n<p>In the function <em>exitEmoticon<\/em> we simply transform the emoticon text into an emoji character. We get the text of the whole rule because there are no tokens defined for this parser rule. 
On <em>enterCommand<\/em>, instead, there could be either of two tokens, <strong>SAYS<\/strong> or <strong>SHOUTS<\/strong>, so we check which one is defined. And then we alter the following text, by transforming it to uppercase, if it\u2019s <strong>SHOUTS<\/strong>. Note that we close the <code>p<\/code>\u00a0tag at the exit of the <strong>line<\/strong> rule, because the command, semantically speaking, alters all the text of the message.<\/p>\n<p>All we have to do now is launch node, with <code>nodejs antlr.js<\/code>, and point our browser at its address, usually <code>http:\/\/localhost:1337\/<\/code>, and we will be greeted with the following image.<\/p>\n<p><a href=\"http:\/\/www.javacodegeeks.com\/wp-content\/uploads\/2017\/03\/Istantanea_javascript_1.png\"><img decoding=\"async\" class=\"aligncenter size-full wp-image-64422\" src=\"http:\/\/www.javacodegeeks.com\/wp-content\/uploads\/2017\/03\/Istantanea_javascript_1.png\" alt=\"\" width=\"315\" height=\"167\" srcset=\"https:\/\/www.javacodegeeks.com\/wp-content\/uploads\/2017\/03\/Istantanea_javascript_1.png 315w, https:\/\/www.javacodegeeks.com\/wp-content\/uploads\/2017\/03\/Istantanea_javascript_1-300x160.png 300w\" sizes=\"(max-width: 315px) 100vw, 315px\" \/><\/a><\/p>\n<p>So all is good, we just have to add all the different listeners to handle the rest of the language. Let\u2019s start with <strong>color<\/strong> and <strong>message<\/strong>.<\/p>\n<h2 id=\"working-with-a-listener\">15. Working with\u00a0a Listener<\/h2>\n<p>We have seen how to start defining a listener. Now let\u2019s get serious and see how to evolve it into a complete, robust listener. 
Let\u2019s start by adding support for <strong>color<\/strong> and checking the results of our hard work.<\/p>\n<pre class=\"brush:js\">HtmlChatListener.prototype.enterColor = function(ctx) {     \r\n    var color = ctx.WORD().getText();         \r\n    this.Res.write('&lt;span style=\"color: ' + color + '\"&gt;');        \r\n};\r\n \r\nHtmlChatListener.prototype.exitColor = function(ctx) {          \r\n    this.Res.write(\"&lt;\/span&gt;\");    \r\n}; \r\n \r\nHtmlChatListener.prototype.exitMessage = function(ctx) {             \r\n    this.Res.write(ctx.getText());\r\n};\r\n \r\nexports.HtmlChatListener = HtmlChatListener;\r\n<a href=\"http:\/\/www.javacodegeeks.com\/wp-content\/uploads\/2017\/03\/Istantanea_javascript_2.png\"><img decoding=\"async\" class=\"aligncenter size-full wp-image-64423\" src=\"http:\/\/www.javacodegeeks.com\/wp-content\/uploads\/2017\/03\/Istantanea_javascript_2.png\" alt=\"\" width=\"570\" height=\"180\" srcset=\"https:\/\/www.javacodegeeks.com\/wp-content\/uploads\/2017\/03\/Istantanea_javascript_2.png 570w, https:\/\/www.javacodegeeks.com\/wp-content\/uploads\/2017\/03\/Istantanea_javascript_2-300x95.png 300w\" sizes=\"(max-width: 570px) 100vw, 570px\" \/>\r\n<\/a><\/pre>\n<p>Except that it doesn\u2019t work. 
Or maybe it works too much: we are writing some part of <strong>message<\/strong> twice (\u201cthis will work\u201d): first when we check the specific nodes, children of <strong>message<\/strong>, and then at the end.<\/p>\n<p>Luckily with Javascript we can dynamically alter objects, so we can take advantage of this fact to change the *Context objects themselves.<\/p>\n<pre class=\"brush:js\">HtmlChatListener.prototype.exitColor = function(ctx) {         \r\n    ctx.text += ctx.message().text;    \r\n    ctx.text += '&lt;\/span&gt;';\r\n};\r\n \r\nHtmlChatListener.prototype.exitEmoticon = function(ctx) {      \r\n    var emoticon = ctx.getText();        \r\n    \r\n    if(emoticon == ':-)' || emoticon == ':)')\r\n    {        \r\n        ctx.text = \"\ud83d\ude42\";\r\n    }\r\n    \r\n    if(emoticon == ':-(' || emoticon == ':(')\r\n    {          \r\n        ctx.text = \"\ud83d\ude41\";\r\n    }\r\n}; \r\n \r\nHtmlChatListener.prototype.exitMessage = function(ctx) {                \r\n    var text = '';\r\n \r\n    for (var index = 0; index &lt;  ctx.children.length; index++ ) {\r\n        if(ctx.children[index].text != null)\r\n            text += ctx.children[index].text;\r\n        else\r\n            text += ctx.children[index].getText();\r\n    }\r\n \r\n    if(ctx.parentCtx instanceof ChatParser.ChatParser.LineContext == false)\r\n    {\r\n        ctx.text = text;        \r\n    }\r\n    else\r\n    {\r\n        this.Res.write(text);\r\n        this.Res.write(\"&lt;\/p&gt;\");\r\n    }\r\n};<\/pre>\n<p>Only the modified parts are shown in the snippet above. Note that <code>exitColor<\/code> appends to <code>ctx.text<\/code>, so <code>enterColor<\/code> has to store the opening <code>span<\/code> tag in <code>ctx.text<\/code> rather than writing it to the response. We add a <strong>text<\/strong> field to every node that transforms its text, and then at the exit of every <strong>message <\/strong>we print the text if it\u2019s the primary message, the one that is a direct child of the <strong>line<\/strong> rule. If it\u2019s a message that is a child of <strong>color<\/strong>, we add the <strong>text <\/strong>field to the node we are exiting and let <strong>color<\/strong> print it. 
We check this on line 30, where we look at the parent node to see if it\u2019s an instance of the object <code>LineContext<\/code>. This is also further evidence of how each <strong>ctx<\/strong> argument corresponds to the proper type.<\/p>\n<p>Between lines 23 and 27 we can see another field of every node of the generated tree: <code>children<\/code>, which, as the name suggests, contains the child nodes. You can observe that if a field<strong> text<\/strong> exists we add it to the proper variable, otherwise we use the usual function to get the text of the node.<\/p>\n<h2 id=\"solving-ambiguities-with-semantic-predicates\">16. Solving Ambiguities with Semantic Predicates<\/h2>\n<p>So far we have seen how to build a parser for a chat language in Javascript. Let\u2019s continue working on this grammar but switch to Python. Remember that all code is available in the <a href=\"https:\/\/github.com\/unosviluppatore\/antlr-mega-tutorial\">repository<\/a>. Before that, we have to solve an annoying problem: the <strong>TEXT<\/strong> token. The solution we have is terrible, and furthermore, if we tried to get the text of the token we would have to trim the edges, parentheses or square brackets. So what can we do?<\/p>\n<p>We can use a particular feature of ANTLR called <em>semantic predicates.<\/em> As the name implies they are expressions that produce a boolean value. They selectively enable or disable the following rule and thus make it possible to resolve ambiguities. Another reason to use them is to support different versions of the same language, for instance a version with a new construct or an old one without it.<\/p>\n<p>Technically they are part of the larger group of <em>actions<\/em>, which allow you to embed arbitrary code into the grammar. <strong>The downside is that the grammar is no longer language independent<\/strong>, since the code in the action must be valid for the target language. 
For this reason, it\u2019s usually considered a good idea to use semantic predicates only when they can\u2019t be avoided, and leave most of the code to the visitor\/listener.<\/p>\n<pre class=\"brush:bash\">link                : '[' TEXT ']' '(' TEXT ')';\r\n \r\nTEXT                : {self._input.LA(-1) == ord('[') or self._input.LA(-1) == ord('(')}? ~[\\])]+ ;<\/pre>\n<p>We restored <strong>link<\/strong> to its original formulation, but we added a semantic predicate to the <strong>TEXT<\/strong> token, written inside curly brackets and followed by a question mark. We use <code>self._input.LA(-1)<\/code> to check the character before the current one: if this character is an open square bracket or an open parenthesis, we activate the <strong>TEXT<\/strong> token. It\u2019s important to repeat that this must be valid code in our target language, because it\u2019s going to end up in the generated lexer or parser, in our case in <code>ChatLexer.py<\/code>.<\/p>\n<p>This matters not just for the syntax itself, but also because different targets might have different fields or methods. For instance <code>LA<\/code> returns an <code>int<\/code> in Python, so we have to convert the <code>char<\/code> to an <code>int<\/code>.<\/p>\n<p>Let\u2019s look at the equivalent form in other languages.<\/p>\n<pre class=\"brush:bash\">\/\/ C#. Notice that it is .La and not .LA\r\nTEXT : {_input.La(-1) == '[' || _input.La(-1) == '('}? ~[\\])]+ ;\r\n\/\/ Java\r\nTEXT : {_input.LA(-1) == '[' || _input.LA(-1) == '('}? ~[\\])]+ ;\r\n\/\/ Javascript. LA returns a number here, so we compare character codes\r\nTEXT : {this._input.LA(-1) == '['.charCodeAt(0) || this._input.LA(-1) == '('.charCodeAt(0)}? ~[\\])]+ ;<\/pre>\n<p>If you want to test for the preceding token, you can use <code>_input.LT(-1)<\/code>, but you can only do that for parser rules. For example, you may want to enable the <strong>mention<\/strong> rule only if it is preceded by a <strong>WHITESPACE<\/strong> token.<\/p>\n<pre class=\"brush:bash\">\/\/ C#\r\nmention: {_input.Lt(-1).Type == WHITESPACE}? 
'@' WORD ;\r\n\/\/ Java\r\nmention: {_input.LT(-1).getType() == WHITESPACE}? '@' WORD ;\r\n\/\/ Python\r\nmention: {self._input.LT(-1).text == ' '}? '@' WORD ;\r\n\/\/ Javascript\r\nmention: {this._input.LT(-1).text == ' '}? '@' WORD ;<\/pre>\n<h2 id=\"starting-with-python\">17. Continuing the Chat in Python<\/h2>\n<p>Before seeing the Python example, we must modify our grammar and put the <strong>TEXT<\/strong> token before the <strong>WORD<\/strong> one. Otherwise ANTLR might assign the incorrect token in cases where the characters between parentheses or brackets are all valid for <strong>WORD<\/strong>, for instance if the input were <code>[this](link)<\/code>.<\/p>\n<p>Using ANTLR in Python is not more difficult than with any other platform, you just need to pay attention to the version of Python, 2 or 3.<\/p>\n<pre class=\"brush:bash\">antlr4 -Dlanguage=Python3 Chat.g4<\/pre>\n<p>And that\u2019s it. Once you have run the command inside the directory of your Python project, there will be a newly generated parser and lexer. You may find it interesting to look at <code>ChatLexer.py<\/code> and in particular the function <code>TEXT_sempred<\/code> (sempred stands for <strong>sem<\/strong>antic <strong>pred<\/strong>icate).<\/p>\n<pre class=\"brush:java\">def TEXT_sempred(self, localctx:RuleContext, predIndex:int):\r\n    if predIndex == 0:\r\n        return self._input.LA(-1) == ord('[') or self._input.LA(-1) == ord('(')<\/pre>\n<p>You can see our predicate right in the code. This also means that you have to check that the correct libraries, for the functions used in the predicate, are available to the lexer.<\/p>\n<h2 id=\"the-python-way\">18. The Python Way of Working with a Listener<\/h2>\n<p>The main file of a Python project is very similar to a Javascript one, <em>mutatis mutandis<\/em> of course. 
That is to say we have to adapt libraries and functions to the proper version for a different language.<\/p>\n<pre class=\"brush:java\">import sys\r\nfrom antlr4 import *\r\nfrom ChatLexer import ChatLexer\r\nfrom ChatParser import ChatParser\r\nfrom HtmlChatListener import HtmlChatListener\r\n \r\ndef main(argv):\r\n    input = FileStream(argv[1])\r\n    lexer = ChatLexer(input)\r\n    stream = CommonTokenStream(lexer)\r\n    parser = ChatParser(stream)\r\n    tree = parser.chat()\r\n \r\n    output = open(\"output.html\",\"w\")\r\n    \r\n    htmlChat = HtmlChatListener(output)\r\n    walker = ParseTreeWalker()\r\n    walker.walk(htmlChat, tree)\r\n        \r\n    output.close()      \r\n \r\nif __name__ == '__main__':\r\n    main(sys.argv)<\/pre>\n<p>We have also changed the input and output to become files, this avoid the need to launch a server in Python or the problem of using characters that are not supported in the terminal.<\/p>\n<pre class=\"brush:java\">import sys\r\nfrom antlr4 import *\r\nfrom ChatParser import ChatParser\r\nfrom ChatListener import ChatListener\r\n \r\nclass HtmlChatListener(ChatListener) :\r\n    def __init__(self, output):\r\n        self.output = output\r\n        self.output.write('&lt;html&gt;&lt;head&gt;&lt;meta charset=\"UTF-8\"\/&gt;&lt;\/head&gt;&lt;body&gt;')\r\n \r\n    def enterName(self, ctx:ChatParser.NameContext) :\r\n        self.output.write(\"&lt;strong&gt;\") \r\n \r\n    def exitName(self, ctx:ChatParser.NameContext) :\r\n        self.output.write(ctx.WORD().getText()) \r\n        self.output.write(\"&lt;\/strong&gt; \") \r\n \r\n    def enterColor(self, ctx:ChatParser.ColorContext) :\r\n        color = ctx.WORD().getText()\r\n        ctx.text = '&lt;span style=\"color: ' + color + '\"&gt;'        \r\n \r\n    def exitColor(self, ctx:ChatParser.ColorContext):         \r\n        ctx.text += ctx.message().text\r\n        ctx.text += '&lt;\/span&gt;'\r\n \r\n    def exitEmoticon(self, ctx:ChatParser.EmoticonContext) 
: \r\n        emoticon = ctx.getText()\r\n \r\n        if emoticon == ':-)' or emoticon == ':)' :\r\n            ctx.text = \"\ud83d\ude42\"\r\n    \r\n        if emoticon == ':-(' or emoticon == ':(' :\r\n            ctx.text = \"\ud83d\ude41\"\r\n \r\n    def enterLink(self, ctx:ChatParser.LinkContext):\r\n        ctx.text = '&lt;a href=\"%s\"&gt;%s&lt;\/a&gt;' % (ctx.TEXT()[1], (ctx.TEXT()[0]))\r\n \r\n    def exitMessage(self, ctx:ChatParser.MessageContext):\r\n        text = ''\r\n \r\n        for child in ctx.children:\r\n            if hasattr(child, 'text'):\r\n                text += child.text\r\n            else:\r\n                text += child.getText()\r\n        \r\n        if isinstance(ctx.parentCtx, ChatParser.LineContext) is False:\r\n            ctx.text = text\r\n        else:    \r\n            self.output.write(text)\r\n            self.output.write(\"&lt;\/p&gt;\") \r\n \r\n    def enterCommand(self, ctx:ChatParser.CommandContext):\r\n        if ctx.SAYS() is not None :\r\n            self.output.write(ctx.SAYS().getText() + ':' + '&lt;p&gt;')\r\n \r\n        if ctx.SHOUTS() is not None :\r\n            self.output.write(ctx.SHOUTS().getText() + ':' + '&lt;p style=\"text-transform: uppercase\"&gt;')    \r\n \r\n    def exitChat(self, ctx:ChatParser.ChatContext):\r\n        self.output.write(\"&lt;\/body&gt;&lt;\/html&gt;\")<\/pre>\n<p>Apart from lines 35-36, where we introduce support for links, there is nothing new. Though you might notice that Python syntax is cleaner and, while it has dynamic typing, it is not as loosely typed as Javascript. The different types of *Context objects are explicitly written out. If only Python tools were as easy to use as the language itself. But of course we cannot just fly over Python like this, so we also introduce testing.<\/p>\n<h2 id=\"testing-with-python\">19. 
Testing with Python<\/h2>\n<p>While Visual Studio Code has a very nice extension for Python, which also supports unit testing, we are going to use the command line for the sake of compatibility.<\/p>\n<pre class=\"brush:java\">python3 -m unittest discover -s . -p ChatTests.py<\/pre>\n<p>That\u2019s how you run the tests, but before that we have to write them. Actually, even before that, we have to write an <code>ErrorListener<\/code> to manage the errors that we might find. While we could simply read the text output by the default error listener, there is an advantage in using our own implementation, namely that we can control more easily what happens.<\/p>\n<pre class=\"brush:java\">import sys\r\nfrom antlr4 import *\r\nfrom ChatParser import ChatParser\r\nfrom ChatListener import ChatListener\r\nfrom antlr4.error.ErrorListener import *\r\nimport io\r\n \r\nclass ChatErrorListener(ErrorListener):\r\n \r\n    def __init__(self, output):\r\n        self.output = output        \r\n        self._symbol = ''\r\n    \r\n    def syntaxError(self, recognizer, offendingSymbol, line, column, msg, e):        \r\n        self.output.write(msg)\r\n        self._symbol = offendingSymbol.text\r\n \r\n    @property        \r\n    def symbol(self):\r\n        return self._symbol<\/pre>\n<p>Our class derives from <code>ErrorListener<\/code> and we simply have to implement <code>syntaxError<\/code>. 
Although we also add a property <strong>symbol <\/strong>to easily check which symbol might have caused an error.<\/p>\n<pre class=\"brush:java\">from antlr4 import *\r\nfrom ChatLexer import ChatLexer\r\nfrom ChatParser import ChatParser\r\nfrom HtmlChatListener import HtmlChatListener\r\nfrom ChatErrorListener import ChatErrorListener\r\nimport unittest\r\nimport io\r\n \r\nclass TestChatParser(unittest.TestCase):\r\n \r\n    def setup(self, text):        \r\n        lexer = ChatLexer(InputStream(text))        \r\n        stream = CommonTokenStream(lexer)\r\n        parser = ChatParser(stream)\r\n        \r\n        self.output = io.StringIO()\r\n        self.error = io.StringIO()\r\n \r\n        parser.removeErrorListeners()        \r\n        errorListener = ChatErrorListener(self.error)\r\n        parser.addErrorListener(errorListener)  \r\n \r\n        self.errorListener = errorListener              \r\n        \r\n        return parser\r\n        \r\n    def test_valid_name(self):\r\n        parser = self.setup(\"John \")\r\n        tree = parser.name()               \r\n    \r\n        htmlChat = HtmlChatListener(self.output)\r\n        walker = ParseTreeWalker()\r\n        walker.walk(htmlChat, tree)              \r\n \r\n        # let's check that there aren't any symbols in errorListener         \r\n        self.assertEqual(len(self.errorListener.symbol), 0)\r\n \r\n    def test_invalid_name(self):\r\n        parser = self.setup(\"Joh-\")\r\n        tree = parser.name()               \r\n    \r\n        htmlChat = HtmlChatListener(self.output)\r\n        walker = ParseTreeWalker()\r\n        walker.walk(htmlChat, tree)              \r\n \r\n        # let's check the symbol in errorListener\r\n        self.assertEqual(self.errorListener.symbol, '-')\r\n \r\nif __name__ == '__main__':\r\n    unittest.main()<\/pre>\n<p>The <code>setup<\/code> method is used to ensure that everything is properly set; on lines 19-21 we setup also our 
<code>ChatErrorListener<\/code>, but first we remove the default one, otherwise it would still output errors on the standard output. We are listening to errors in the parser, but we could also catch errors generated by the lexer. It depends on what you want to test. You may want to check both.<\/p>\n<p>The two proper test methods check for a valid and an invalid name. The checks are linked to the property <strong>symbol<\/strong> that we have previously defined: if it\u2019s empty everything is fine, otherwise it contains the symbol that caused the error. Notice that on line 28 there is a space at the end of the string, because we have defined the rule <strong>name<\/strong> to end with a <strong>WHITESPACE<\/strong> token.<\/p>\n<h2 id=\"parsing-markup\">20. Parsing Markup<\/h2>\n<p>ANTLR can parse many things, including binary data, in which case tokens are made up of non-printable characters. But a more common problem is parsing markup languages such as XML or HTML. Markup is also a useful format to adopt for your own creations, because it allows you to mix unstructured text content with structured annotations. Such documents fundamentally represent a form of smart document, containing both text and structured data. The technical term that describes them is <em>island languages<\/em>. This type is not restricted to markup, and sometimes it\u2019s a matter of perspective.<\/p>\n<p>For example, you may have to build a parser that ignores preprocessor directives. In that case, you have to find a way to distinguish proper code from directives, which obey different rules.<\/p>\n<p>In any case, the problem with parsing such languages is that there is a lot of text that we don\u2019t actually have to parse, but that we cannot ignore or discard, because the text contains useful information for the user and it is a structural part of the document. 
The solution is <em>lexical modes<\/em>, a way to parse structured content inside a larger sea of free text.<\/p>\n<h2 id=\"lexical-modes\">21. Lexical Modes<\/h2>\n<p>We are going to see how to use lexical modes, by starting with a new grammar.<\/p>\n<pre class=\"brush:java\">lexer grammar MarkupLexer;\r\n \r\nOPEN                : '[' -&gt; pushMode(BBCODE) ;\r\nTEXT                : ~('[')+ ;\r\n \r\n\/\/ Parsing content inside tags\r\nmode BBCODE;\r\n \r\nCLOSE               : ']' -&gt; popMode ;\r\nSLASH               : '\/' ;\r\nEQUALS              : '=' ;\r\nSTRING              : '\"' .*? '\"' ;\r\nID                  : LETTERS+ ;\r\nWS                  : [ \\t\\r\\n] -&gt; skip ;\r\n \r\nfragment LETTERS    : [a-zA-Z] ;<\/pre>\n<p>Looking at the first line you can notice a difference: we are defining a <code>lexer grammar<\/code>, instead of the usual <code>(combined) grammar<\/code>. <strong>You simply can\u2019t define a lexical mode together with a parser grammar<\/strong>. You can use lexical modes only in a lexer grammar, not in a combined grammar. The rest is not surprising: as you can see, we are defining a sort of <a href=\"https:\/\/en.wikipedia.org\/wiki\/BBCode\">BBCode<\/a> markup, with tags delimited by square brackets.<\/p>\n<p>On lines 3, 7 and 9 you will find basically all that you need to know about lexical modes. You define one or more tokens that can delimit the different modes and activate them.<\/p>\n<p>The default mode is already implicitly defined; if you need to define your own you simply use <code>mode<\/code> followed by a name. Other than for markup languages, <em>lexical modes<\/em> are typically used to deal with string interpolation, where a string literal can contain more than simple text, such as arbitrary expressions.<\/p>\n<p>When we used a combined grammar we could define tokens implicitly: that is what we did when we used a string like \u2018=\u2019 in a parser rule. 
Now that we are using separate lexer and parser grammars we cannot do that. That means that every single token has to be defined explicitly. So we have definitions like SLASH or EQUALS, which typically could just be used directly in a parser rule. The concept is simple: <strong>in the lexer grammar we need to define all tokens, because they cannot be defined later in the parser grammar.<\/strong><\/p>\n<h2 id=\"parser-grammars\">22. Parser Grammars<\/h2>\n<p>We look at the other side of a lexer grammar, so to speak.<\/p>\n<pre class=\"brush:bash\">parser grammar MarkupParser;\r\n \r\noptions { tokenVocab=MarkupLexer; }\r\n \r\nfile        : element* ;\r\n \r\nattribute   : ID '=' STRING ;\r\n \r\ncontent     : TEXT ;\r\n \r\nelement     : (content | tag) ;\r\n \r\ntag         : '[' ID attribute? ']' element* '[' '\/' ID ']' ;<\/pre>\n<p>On the first line we define a <code>parser grammar<\/code>. Since the tokens we need are defined in the lexer grammar, we need to use an option to tell ANTLR where it can find them. This is not necessary in combined grammars, since the tokens are defined in the same file.<\/p>\n<p>There are many other options available, described in the <a href=\"https:\/\/github.com\/antlr\/antlr4\/blob\/master\/doc\/options.md\">documentation<\/a>.<\/p>\n<p>There is almost nothing else to add, except that we define a <strong>content<\/strong> rule so that we can manage more easily the text that we find later in the program.<\/p>\n<p>I just want to say that, as you can see, we don\u2019t need to explicitly use the tokens every time (e.g. SLASH), but instead we can use the corresponding text (e.g. \u2018\/\u2019).<\/p>\n<p>ANTLR will automatically transform the text into the corresponding token, but this can happen only if the tokens are already defined. In short, it is as if we had written:<\/p>\n<pre class=\"brush:bash\">tag : OPEN ID attribute? 
CLOSE element* OPEN SLASH ID CLOSE ;<\/pre>\n<p>But we could not have used the implicit way if we hadn\u2019t already explicitly defined them in the lexer grammar. Another way to look at this is: when we define a combined grammar, ANTLR defines for us all the tokens that we have not explicitly defined ourselves. When we need to use a separate lexer and parser grammar, we have to define every token explicitly ourselves. Once we have done that, we can use them in every way we want.<\/p>\n<p>Before moving to actual Java code, let\u2019s see the AST for a sample input.<\/p>\n<p><a href=\"http:\/\/www.javacodegeeks.com\/wp-content\/uploads\/2017\/03\/antlr4_parse_tree_2.png\"><img decoding=\"async\" class=\"aligncenter wp-image-64425\" src=\"http:\/\/www.javacodegeeks.com\/wp-content\/uploads\/2017\/03\/antlr4_parse_tree_2.png\" width=\"860\" height=\"181\" srcset=\"https:\/\/www.javacodegeeks.com\/wp-content\/uploads\/2017\/03\/antlr4_parse_tree_2.png 939w, https:\/\/www.javacodegeeks.com\/wp-content\/uploads\/2017\/03\/antlr4_parse_tree_2-300x63.png 300w, https:\/\/www.javacodegeeks.com\/wp-content\/uploads\/2017\/03\/antlr4_parse_tree_2-768x162.png 768w\" sizes=\"(max-width: 860px) 100vw, 860px\" \/><\/a><\/p>\n<p>You can easily notice that the <strong>element<\/strong> rule is sort of transparent: where you would expect to find it there is always going to be a <strong>tag<\/strong> or <strong>content<\/strong>. So why did we define it? There are two advantages: we avoid repetition in our grammar and we simplify managing the results of the parsing. We avoid repetition because if we did not have the <strong>element<\/strong> rule we would have to repeat <em>(content|tag)<\/em> everywhere it is used. What if one day we add a new type of element? In addition to that, it simplifies the processing of the AST because it makes the nodes representing both <strong>tag<\/strong> and <strong>content<\/strong> extend a common ancestor.<\/p>\n<h2>Advanced<\/h2>\n<p>In this section we deepen our understanding of ANTLR. 
We will look at more complex examples and situations we may have to handle in our parsing adventures. We will learn how to perform more advanced testing, to catch more bugs and ensure a better quality for our code. We will see what a visitor is and how to use it. Finally, we will see how to deal with expressions and the complexity they bring.<\/p>\n<p>You can come back to this section when you need to deal with complex parsing problems.<\/p>\n<h2 id=\"setup-java-for-antlr\">23. The Markup Project in Java<\/h2>\n<p>You can follow the instructions in <a href=\"https:\/\/tomassetti.me\/antlr-mega-tutorial\/#java-setup\">Java Setup<\/a> or just copy the <code>antlr-java<\/code> folder of the companion repository. Once the file <code>pom.xml<\/code> is properly configured, this is how you build and execute the application.<\/p>\n<pre class=\"brush:java\">\/\/ use mvn to generate the package\r\nmvn package\r\n\/\/ every time you need to execute the application\r\njava -cp target\/markup-example-1.0-jar-with-dependencies.jar me.tomassetti.examples.MarkupParser.App<\/pre>\n<p>As you can see, it isn\u2019t any different from any typical Maven project, although it\u2019s indeed more complicated than a typical Javascript or Python project. Of course, if you use an IDE you don\u2019t need to do anything different from your typical workflow.<\/p>\n<h2 id=\"the-main-app.java\">24. 
The Main App.java<\/h2>\n<p>We are going to see how to write a typical ANTLR application in Java.<\/p>\n<pre class=\"brush:java\">package me.tomassetti.examples.MarkupParser;\r\nimport org.antlr.v4.runtime.*;\r\nimport org.antlr.v4.runtime.tree.*;\r\n \r\npublic class App \r\n{\r\n    public static void main( String[] args )\r\n    {\r\n        ANTLRInputStream inputStream = new ANTLRInputStream(\r\n            \"I would like to [b][i]emphasize[\/i][\/b] this and [u]underline [b]that[\/b][\/u] .\" +\r\n            \"Let's not forget to quote: [quote author=\\\"John\\\"]You're wrong![\/quote]\");\r\n        MarkupLexer markupLexer = new MarkupLexer(inputStream);\r\n        CommonTokenStream commonTokenStream = new CommonTokenStream(markupLexer);\r\n        MarkupParser markupParser = new MarkupParser(commonTokenStream);\r\n \r\n        MarkupParser.FileContext fileContext = markupParser.file();                \r\n        MarkupVisitor visitor = new MarkupVisitor();                \r\n        visitor.visit(fileContext);        \r\n    }\r\n}<\/pre>\n<p>At this point the main Java file should not come as a surprise; the only new development is the visitor. Of course, there are the obvious little differences in the names of the ANTLR classes and such. This time we are building a visitor, whose main advantage is the chance to control the flow of the program. While we are still dealing with text, we don\u2019t want to display it, we want to transform it from pseudo-BBCode to pseudo-Markdown.<\/p>\n<h2 id=\"transforming-code-with-antlr\">25. Transforming Code with ANTLR<\/h2>\n<p>The first issue to deal with in our translation from pseudo-BBCode to pseudo-Markdown is a design decision. Our two languages are different and frankly neither of the two original ones is that well designed.<\/p>\n<p>BBCode was created as a safety precaution, to make it possible to disallow the use of HTML but give some of its power to users. 
Markdown was created to be an easy to read and write format that could be translated into HTML. So they both mimic HTML, and you can actually use HTML in a Markdown document. Let\u2019s start to look into how messy a real conversion would be.<\/p>\n<pre class=\"brush:java\">package me.tomassetti.examples.MarkupParser;\r\n \r\nimport org.antlr.v4.runtime.*;\r\nimport org.antlr.v4.runtime.misc.*;\r\nimport org.antlr.v4.runtime.tree.*;\r\n \r\npublic class MarkupVisitor extends MarkupParserBaseVisitor&lt;String&gt;\r\n{\r\n    @Override\r\n    public String visitFile(MarkupParser.FileContext context)\r\n    {\r\n         visitChildren(context);\r\n         \r\n         System.out.println(\"\");\r\n         \r\n         return null;\r\n    }\r\n    \r\n    @Override\r\n    public String visitContent(MarkupParser.ContentContext context)\r\n    {\r\n        System.out.print(context.TEXT().getText());\r\n        \r\n        return visitChildren(context);\r\n    }\r\n}<\/pre>\n<p>The first version of our visitor prints all the text and ignores all the tags.<\/p>\n<p>You can see how to control the flow, either by calling <code>visitChildren<\/code>, or any other visit* function, and deciding what to return. We just need to override the methods that we want to change. Otherwise, the default implementation would just do what <code>visitContent<\/code> does on line 24: it visits the child nodes and allows the visitor to continue. Just like for a listener, the argument is the proper context type. If you want to stop the visitor just return null as on line 16.<\/p>\n<h2 id=\"joy-and-pain-of-transforming-code\">26. Joy and Pain of Transforming Code<\/h2>\n<p>Transforming code, even at a very simple level, comes with some complications. 
Let\u2019s start easy with some basic visitor methods.<\/p>\n<pre class=\"brush:java\">@Override\r\npublic String visitContent(MarkupParser.ContentContext context)    \r\n{          \r\n    return context.getText();        \r\n}    \r\n \r\n@Override\r\npublic String visitElement(MarkupParser.ElementContext context)\r\n{\r\n    if(context.parent instanceof MarkupParser.FileContext)\r\n    {\r\n        if(context.content() != null)            \r\n            System.out.print(visitContent(context.content()));            \r\n        if(context.tag() != null)\r\n            System.out.print(visitTag(context.tag()));\r\n    }    \r\n \r\n    return null;\r\n}<\/pre>\n<p>Before looking at the main method, let\u2019s look at the supporting ones. First, we have changed <code>visitContent<\/code> by making it return its text instead of printing it. Second, we have overridden <code>visitElement<\/code> so that it prints the text of its child, but only if it\u2019s a top-level element, and not inside a <strong>tag<\/strong>. In both cases, it achieves this by calling the proper visit* method. 
It knows which one to call because it checks if it actually has a <strong>tag<\/strong> or <strong>content<\/strong> node.<\/p>\n<pre class=\"brush:java\">@Override\r\npublic String visitTag(MarkupParser.TagContext context)    \r\n{\r\n    String text = \"\";\r\n    String startDelimiter = \"\", endDelimiter = \"\";\r\n \r\n    String id = context.ID(0).getText();\r\n    \r\n    switch(id)\r\n    {\r\n        case \"b\":\r\n            startDelimiter = endDelimiter = \"**\";                \r\n        break;\r\n        case \"u\":\r\n            startDelimiter = endDelimiter = \"*\";                \r\n        break;\r\n        case \"quote\":\r\n            String attribute = context.attribute().STRING().getText();\r\n            attribute = attribute.substring(1,attribute.length()-1);\r\n            startDelimiter = System.lineSeparator() + \"&gt; \";\r\n            endDelimiter = System.lineSeparator() + \"&gt; \" + System.lineSeparator() + \"&gt; \u2013 \"\r\n                         + attribute + System.lineSeparator();\r\n        break;\r\n    } \r\n \r\n    text += startDelimiter;\r\n \r\n    for (MarkupParser.ElementContext node: context.element())\r\n    {                \r\n        if(node.tag() != null)\r\n            text += visitTag(node.tag());\r\n        if(node.content() != null)\r\n            text += visitContent(node.content());                \r\n    }        \r\n    \r\n    text += endDelimiter;\r\n    \r\n    return text;        \r\n}<\/pre>\n<p><code>VisitTag<\/code> contains more code than every other method, because it can also contain other elements, including other tags that have to be managed themselves, and thus they cannot be simply printed. 
We save the content of the <strong>ID<\/strong> on line 7. Of course we don\u2019t need to check that the corresponding end tag matches, because the parser will ensure that, as long as the input is well formed.<\/p>\n<p>The first complication starts at lines 14-15: as often happens when transforming one language into a different one, there isn\u2019t a perfect correspondence between the two. While BBCode tries to be a smarter and safer replacement for HTML, Markdown wants to accomplish the same objective as HTML, to create a structured document. So BBCode has an underline tag, while Markdown does not.<\/p>\n<blockquote>\n<p>So we have to make a decision<\/p>\n<\/blockquote>\n<p>Do we want to discard the information, or directly print HTML, or something else? We choose something else and instead convert the underline to an italic. That might seem completely arbitrary, and indeed there is an element of choice in this decision. But the conversion forces us to lose some information, and both are used for emphasis, so we choose the closest thing in the new language.<\/p>\n<p>The following case, on lines 18-22, forces us to make another choice. We can\u2019t maintain the information about the author of the quote in a structured way, so we choose to print the information in a way that will make sense to a human reader.<\/p>\n<p>On lines 28-34 we do our \u201cmagic\u201d: we visit the children and gather their text, then we close with the <strong>endDelimiter<\/strong>. 
Finally we return the text that we have created.<\/p>\n<blockquote>\n<p>That\u2019s how the visitor works<\/p>\n<\/blockquote>\n<ol>\n<li>every top-level <strong>element<\/strong> visits each child\n<ul>\n<li>if it\u2019s a <strong>content<\/strong> node, it directly returns the text<\/li>\n<li>if it\u2019s a <strong>tag<\/strong>, it sets up the correct delimiters and then it checks its children. It repeats this same step for each child and then it returns the gathered text<\/li>\n<\/ul>\n<\/li>\n<li>it prints the returned text<\/li>\n<\/ol>\n<p>It\u2019s obviously a simple example, but it shows how you can have great freedom in managing the visitor once you have launched it. Together with the patterns that we have seen at the beginning of this section you can see all of the options: return null to stop the visit, return the result of visiting the children to continue, or return something to perform an action decided at a higher level of the tree.<\/p>\n<h2 id=\"advanced-testing\">27. Advanced Testing<\/h2>\n<p>The use of lexical modes makes it possible to handle the parsing of island languages, but it complicates testing.<\/p>\n<p>We are not going to show <code>MarkupErrorListener.java<\/code> because we did not change it; if you need it you can see it in the repository.<\/p>\n<p>You can run the tests by using the following command.<\/p>\n<pre class=\"brush:java\">mvn test<\/pre>\n<p>Now we are going to look at the test code. 
We are skipping the setup part, because that also is obvious, we just copy the process seen on the main file, but we simply add our error listener to intercept the errors.<\/p>\n<pre class=\"brush:java\">\/\/ private variables inside the class AppTest\r\nprivate MarkupErrorListener errorListener;\r\nprivate MarkupLexer markupLexer;\r\n \r\npublic void testText()\r\n{\r\n    MarkupParser parser = setup(\"anything in here\");\r\n \r\n    MarkupParser.ContentContext context = parser.content();        \r\n    \r\n    assertEquals(\"\",this.errorListener.getSymbol());\r\n}\r\n \r\npublic void testInvalidText()\r\n{\r\n    MarkupParser parser = setup(\"[anything in here\");\r\n \r\n    MarkupParser.ContentContext context = parser.content();        \r\n    \r\n    assertEquals(\"[\",this.errorListener.getSymbol());\r\n}\r\n \r\npublic void testWrongMode()\r\n{\r\n    MarkupParser parser = setup(\"author=\\\"john\\\"\");                \r\n \r\n    MarkupParser.AttributeContext context = parser.attribute(); \r\n    TokenStream ts = parser.getTokenStream();        \r\n    \r\n    assertEquals(MarkupLexer.DEFAULT_MODE, markupLexer._mode);\r\n    assertEquals(MarkupLexer.TEXT,ts.get(0).getType());        \r\n    assertEquals(\"author=\\\"john\\\"\",this.errorListener.getSymbol());\r\n}\r\n \r\npublic void testAttribute()\r\n{\r\n    MarkupParser parser = setup(\"author=\\\"john\\\"\");\r\n    \/\/ we have to manually push the correct mode\r\n    this.markupLexer.pushMode(MarkupLexer.BBCODE);\r\n \r\n    MarkupParser.AttributeContext context = parser.attribute(); \r\n    TokenStream ts = parser.getTokenStream();        \r\n    \r\n    assertEquals(MarkupLexer.ID,ts.get(0).getType());\r\n    assertEquals(MarkupLexer.EQUALS,ts.get(1).getType());\r\n    assertEquals(MarkupLexer.STRING,ts.get(2).getType()); \r\n    \r\n    assertEquals(\"\",this.errorListener.getSymbol());\r\n}\r\n \r\npublic void testInvalidAttribute()\r\n{\r\n    MarkupParser parser = 
setup(\"author=\/\\\"john\\\"\");\r\n    \/\/ we have to manually push the correct mode\r\n    this.markupLexer.pushMode(MarkupLexer.BBCODE);\r\n    \r\n    MarkupParser.AttributeContext context = parser.attribute();        \r\n    \r\n    assertEquals(\"\/\",this.errorListener.getSymbol());\r\n}<\/pre>\n<p>The first two methods are exactly as before: we simply check that there are no errors, or that there is the correct one because the input itself is erroneous. On lines 30-32 things start to get interesting: the issue is that by testing the rules one by one we don\u2019t give the parser the chance to switch automatically to the correct mode. So it always remains in the DEFAULT_MODE, which in our case makes everything look like <strong>TEXT<\/strong>. This obviously makes the correct parsing of an <strong>attribute<\/strong> impossible.<\/p>\n<p>The same lines also show how you can check the current mode that you are in, and the exact type of the tokens that are found by the parser, which we use to confirm that indeed all is wrong in this case.<\/p>\n<p>While we could use a string of text to trigger the correct mode each time, that would make testing intertwined with several pieces of code, which is a no-no. So the solution is seen on line 39: we trigger the correct mode manually. Once you have done that, you can see that our attribute is recognized correctly.<\/p>\n<h2 id=\"dealing-with-expressions\">28. Dealing with Expressions<\/h2>\n<p>So far we have written simple parser rules; now we are going to see one of the most challenging parts in analyzing a real (programming) language: expressions. While rules for statements are usually larger, they are quite simple to deal with: you just need to write a rule that encapsulates the structure with all the different optional parts. For instance a <code>for<\/code> statement can include all other kinds of statements, but we can simply include them with something like <code>statement*. 
<\/code>An expression, instead, can be combined in many different ways.<\/p>\n<p>An expression usually contains other expressions. For example the typical binary expression is composed of an expression on the left, an operator in the middle and another expression on the right. This can lead to ambiguities. Consider, for example, the expression <code>5 + 3 * 2<\/code>: for ANTLR this expression is ambiguous because there are two ways to parse it. It could parse it either as 5 + (3 * 2) or as (5 + 3) * 2.<\/p>\n<p>Until this moment we have avoided the problem simply because markup constructs surround the object on which they are applied. So there is no ambiguity in choosing which one to apply first: it\u2019s the outermost one. Imagine if this expression was written as:<\/p>\n<pre class=\"brush:xml\">&lt;add&gt;\r\n    &lt;int&gt;5&lt;\/int&gt;\r\n    &lt;mul&gt;\r\n        &lt;int&gt;3&lt;\/int&gt;\r\n        &lt;int&gt;2&lt;\/int&gt;\r\n    &lt;\/mul&gt;\r\n&lt;\/add&gt;<\/pre>\n<p>That would make it obvious to ANTLR how to parse it.<\/p>\n<p>Rules in which an expression begins with another expression, like the binary expression described above, are called <em>left-recursive rules<\/em>. You might say: just parse whatever comes first. The problem with that is semantic: the addition comes first, but we know that multiplication takes precedence over addition. Traditionally the way to solve this problem was to create a complex cascade of specific expressions like this:<\/p>\n<pre class=\"brush:bash\">expression     : addition;\r\naddition       : multiplication ('+' multiplication)* ;\r\nmultiplication : atom ('*' atom)* ;\r\natom           : NUMBER ;<\/pre>\n<p>This way ANTLR would know to search first for a number, then for multiplications and finally for additions. This is cumbersome and also counterintuitive, because the last expression is the first to be actually recognized. 
Luckily <strong>ANTLR4 can create a similar structure automatically, so we can use a much more natural syntax<\/strong>.<\/p>\n<pre class=\"brush:bash\">expression : expression '*' expression\r\n           | expression '+' expression                      \r\n           | NUMBER\r\n           ;<\/pre>\n<p>In practice ANTLR considers the order in which we defined the alternatives to decide the precedence. By writing the rule in this way we are telling ANTLR that multiplication takes precedence over addition.<\/p>\n<h2 id=\"parsing-spreadsheets\">29. Parsing Spreadsheets<\/h2>\n<p>Now we are prepared to create our last application, in C#. We are going to build the parser of an Excel-like application. In practice, we want to manage the expressions you write in the cells of a spreadsheet.<\/p>\n<pre class=\"brush:bash\">grammar Spreadsheet;\r\n \r\nexpression          : '(' expression ')'                        #parenthesisExp\r\n                    | &lt;assoc=right&gt;  expression '^' expression  #powerExp\r\n                    | expression (ASTERISK|SLASH) expression    #mulDivExp\r\n                    | expression (PLUS|MINUS) expression        #addSubExp\r\n                    | NAME '(' expression ')'                   #functionExp\r\n                    | NUMBER                                    #numericAtomExp\r\n                    | ID                                        #idAtomExp\r\n                    ;\r\n \r\nfragment LETTER     : [a-zA-Z] ;\r\nfragment DIGIT      : [0-9] ;\r\n \r\nASTERISK            : '*' ;\r\nSLASH               : '\/' ;\r\nPLUS                : '+' ;\r\nMINUS               : '-' ;\r\n \r\nID                  : LETTER DIGIT ;\r\n \r\nNAME                : LETTER+ ;\r\n \r\nNUMBER              : DIGIT+ ('.' DIGIT+)? 
;\r\n \r\nWHITESPACE          : ' ' -&gt; skip;<\/pre>\n<p>With all the knowledge you have acquired so far everything should be clear, except for possibly three things:<\/p>\n<ol>\n<li>why the parentheses are there,<\/li>\n<li>what\u2019s the stuff on the right,<\/li>\n<li>that thing on line 4.<\/li>\n<\/ol>\n<p>The parenthesized alternative comes first because its only role is to give the user a way to override the precedence of the operators, if they need to do so. This graphical representation of the AST should make it clear.<\/p>\n<p><a href=\"http:\/\/www.javacodegeeks.com\/wp-content\/uploads\/2017\/03\/antlr4_parse_tree_3.png\"><img decoding=\"async\" class=\"aligncenter size-full wp-image-64428\" src=\"http:\/\/www.javacodegeeks.com\/wp-content\/uploads\/2017\/03\/antlr4_parse_tree_3.png\" alt=\"\" width=\"812\" height=\"275\" srcset=\"https:\/\/www.javacodegeeks.com\/wp-content\/uploads\/2017\/03\/antlr4_parse_tree_3.png 812w, https:\/\/www.javacodegeeks.com\/wp-content\/uploads\/2017\/03\/antlr4_parse_tree_3-300x102.png 300w, https:\/\/www.javacodegeeks.com\/wp-content\/uploads\/2017\/03\/antlr4_parse_tree_3-768x260.png 768w\" sizes=\"(max-width: 812px) 100vw, 812px\" \/><\/a><\/p>\n<p>The things on the right are <em>labels<\/em>; they are used to make ANTLR generate specific functions for the visitor or listener. So there will be a <code>VisitFunctionExp<\/code>, a <code>VisitPowerExp<\/code>, etc. This makes it possible to avoid a giant visitor method for the <strong>expression<\/strong> rule.<\/p>\n<p>The alternative for exponentiation is listed before multiplication and addition because it takes precedence over both. It also carries an option, because there are two possible ways to group two sequential expressions of the same type. The first one is to execute the one on the left first and then the one on the right; the second one is the inverse. This property is called <em>associativity<\/em>. Usually the one that you want to use is <em>left-associativity<\/em>, which is the default option. 
Nonetheless exponentiation is <em>right-associative<\/em>, so we have to signal this to ANTLR. For instance, <code>5^3^2<\/code> must be read as 5^(3^2), not as (5^3)^2.<\/p>\n<p><strong>Another way to look at this is: if there are two expressions of the same type, which one has the precedence: the left one or the right one?<\/strong> Again, an image is worth a thousand words.<\/p>\n<p><a href=\"http:\/\/www.javacodegeeks.com\/wp-content\/uploads\/2017\/03\/antlr4_parse_tree_4.png\"><img decoding=\"async\" class=\"aligncenter size-full wp-image-64429\" src=\"http:\/\/www.javacodegeeks.com\/wp-content\/uploads\/2017\/03\/antlr4_parse_tree_4.png\" alt=\"\" width=\"565\" height=\"240\" srcset=\"https:\/\/www.javacodegeeks.com\/wp-content\/uploads\/2017\/03\/antlr4_parse_tree_4.png 565w, https:\/\/www.javacodegeeks.com\/wp-content\/uploads\/2017\/03\/antlr4_parse_tree_4-300x127.png 300w\" sizes=\"(max-width: 565px) 100vw, 565px\" \/><\/a><\/p>\n<p>We also have support for functions, alphanumeric variables that represent cells, and real numbers.<\/p>\n<h2 id=\"setup-csharp\">30. The Spreadsheet Project in C#<\/h2>\n<p>You just need to follow the <a href=\"https:\/\/tomassetti.me\/antlr-mega-tutorial\/#csharp-setup\">C# Setup<\/a>: install a NuGet package for the runtime and an ANTLR4 extension for Visual Studio. The extension will automatically generate everything whenever you build your project: parser, listener and\/or visitor.<\/p>\n<p>After you have done that, you can also add grammar files just by using the usual menu Add -&gt; New Item. Do exactly that to create a grammar called <code>Spreadsheet.g4<\/code> and put in it the grammar we have just created. 
Now let\u2019s see the main <code>Program.cs<\/code>.<\/p>\n<pre class=\"brush:java\">using System;\r\nusing Antlr4.Runtime;\r\n \r\nnamespace AntlrTutorial\r\n{\r\n    class Program\r\n    {\r\n        static void Main(string[] args)\r\n        {\r\n            string input = \"log(10 + A1 * 35 + (5.4 - 7.4))\";\r\n \r\n            AntlrInputStream inputStream = new AntlrInputStream(input);\r\n            SpreadsheetLexer spreadsheetLexer = new SpreadsheetLexer(inputStream);\r\n            CommonTokenStream commonTokenStream = new CommonTokenStream(spreadsheetLexer);\r\n            SpreadsheetParser spreadsheetParser = new SpreadsheetParser(commonTokenStream);\r\n \r\n            SpreadsheetParser.ExpressionContext expressionContext = spreadsheetParser.expression();\r\n            SpreadsheetVisitor visitor = new SpreadsheetVisitor();\r\n            \r\n            Console.WriteLine(visitor.Visit(expressionContext));\r\n        }\r\n    }\r\n}<\/pre>\n<p>There is not much to say, except that you have to pay attention to yet another slight variation in the naming of things: the casing. For instance, <code>AntlrInputStream<\/code> in the C# program was <code>ANTLRInputStream<\/code> in the Java program.<\/p>\n<p>You can also notice that, this time, we print the result of our visitor to the screen instead of writing it to a file.<\/p>\n<h2 id=\"excel-is-doomed\">31. 
Excel is Doomed<\/h2>\n<p>We are going to take a look at our visitor for the <em>Spreadsheet<\/em> project.<\/p>\n<pre class=\"brush:java\">public class SpreadsheetVisitor : SpreadsheetBaseVisitor&lt;double&gt;\r\n{\r\n    private static DataRepository data = new DataRepository();\r\n \r\n    public override double VisitNumericAtomExp(SpreadsheetParser.NumericAtomExpContext context)\r\n    {            \r\n        return double.Parse(context.NUMBER().GetText(), System.Globalization.CultureInfo.InvariantCulture);\r\n    }\r\n \r\n    public override double VisitIdAtomExp(SpreadsheetParser.IdAtomExpContext context)\r\n    {\r\n        String id = context.ID().GetText();\r\n \r\n        return data[id];\r\n    }\r\n \r\n    public override double VisitParenthesisExp(SpreadsheetParser.ParenthesisExpContext context)\r\n    {\r\n        return Visit(context.expression());\r\n    }\r\n \r\n    public override double VisitMulDivExp(SpreadsheetParser.MulDivExpContext context)\r\n    {\r\n        double left = Visit(context.expression(0));\r\n        double right = Visit(context.expression(1));\r\n        double result = 0;\r\n \r\n        if (context.ASTERISK() != null)\r\n            result = left * right;\r\n        if (context.SLASH() != null)\r\n            result = left \/ right;\r\n \r\n        return result;\r\n    }\r\n \r\n    [..]\r\n \r\n    public override double VisitFunctionExp(SpreadsheetParser.FunctionExpContext context)\r\n    {\r\n        String name = context.NAME().GetText();\r\n        double result = 0;\r\n \r\n        switch(name)\r\n        {\r\n            case \"sqrt\":\r\n                result = Math.Sqrt(Visit(context.expression()));\r\n                break;\r\n \r\n            case \"log\":\r\n                result = Math.Log10(Visit(context.expression()));\r\n                break;\r\n        }\r\n \r\n        return result;\r\n    }\r\n}<\/pre>\n<p><code>VisitNumericAtomExp<\/code> and <code>VisitIdAtomExp<\/code> return the actual numbers that are 
represented either by the literal number or the variable. In a real scenario <strong>DataRepository<\/strong> would contain methods to access the data in the proper cell, but in our example it is just a Dictionary with some keys and numbers. The other methods actually work in the same way: they visit\/call the contained expression(s). The only difference is what they do with the results.<\/p>\n<p>Some perform an operation on the result, the binary operations combine two results in the proper way, and finally <code>VisitParenthesisExp<\/code> just passes the result higher up the chain. Math is simple, when it\u2019s done by a computer.<\/p>\n<h2 id=\"testing-everything\">32. Testing Everything<\/h2>\n<p>Up until now we have only tested the parser rules, that is to say we have only tested whether we have created the correct rule to parse our input. Now we are also going to test the visitor functions. This is the ideal chance because our visitor returns values that we can check individually. On other occasions, for instance if your visitor prints something to the screen, you may want to rewrite the visitor to write to a stream. Then, at testing time, you can easily capture the output.<\/p>\n<p>We are not going to show <code>SpreadsheetErrorListener.cs<\/code> because it\u2019s the same as the previous one we have already seen; if you need it you can see it in the repository.<\/p>\n<p>To perform unit testing in Visual Studio you need to create a specific project inside the solution. You can choose different formats; we opt for the xUnit version. 
To run them there is an aptly named section \u201cTEST\u201d on the menu bar.<\/p>\n<pre class=\"brush:java\">[Fact]\r\npublic void testExpressionPow()\r\n{\r\n    setup(\"5^3^2\");\r\n \r\n    PowerExpContext context = parser.expression() as PowerExpContext;\r\n \r\n    CommonTokenStream ts = (CommonTokenStream)parser.InputStream;   \r\n \r\n    Assert.Equal(SpreadsheetLexer.NUMBER, ts.Get(0).Type);\r\n    Assert.Equal(SpreadsheetLexer.T__2, ts.Get(1).Type);\r\n    Assert.Equal(SpreadsheetLexer.NUMBER, ts.Get(2).Type);\r\n    Assert.Equal(SpreadsheetLexer.T__2, ts.Get(3).Type);\r\n    Assert.Equal(SpreadsheetLexer.NUMBER, ts.Get(4).Type); \r\n}\r\n \r\n[Fact]\r\npublic void testVisitPowerExp()\r\n{\r\n    setup(\"4^3^2\");\r\n \r\n    PowerExpContext context = parser.expression() as PowerExpContext;\r\n \r\n    SpreadsheetVisitor visitor = new SpreadsheetVisitor();\r\n    double result = visitor.VisitPowerExp(context);\r\n \r\n    Assert.Equal(double.Parse(\"262144\"), result);\r\n}\r\n \r\n[..]\r\n \r\n[Fact]\r\npublic void testWrongVisitFunctionExp()\r\n{\r\n    setup(\"logga(100)\");\r\n \r\n    FunctionExpContext context = parser.expression() as FunctionExpContext;\r\n    \r\n    SpreadsheetVisitor visitor = new SpreadsheetVisitor();\r\n    double result = visitor.VisitFunctionExp(context);\r\n \r\n    CommonTokenStream ts = (CommonTokenStream)parser.InputStream;\r\n \r\n    Assert.Equal(SpreadsheetLexer.NAME, ts.Get(0).Type);\r\n    Assert.Equal(null, errorListener.Symbol);\r\n    Assert.Equal(0, result);\r\n}\r\n \r\n[Fact]\r\npublic void testCompleteExp()\r\n{\r\n    setup(\"log(5+6*7\/8)\");\r\n \r\n    ExpressionContext context = parser.expression();\r\n \r\n    SpreadsheetVisitor visitor = new SpreadsheetVisitor();\r\n    double result = visitor.Visit(context);\r\n \r\n    Assert.Equal(\"1.01072386539177\", result.ToString(System.Globalization.CultureInfo.GetCultureInfo(\"en-US\").NumberFormat));            \r\n}<\/pre>\n<p>The first test function is 
similar to the ones we have already seen; it checks that the correct tokens are selected. On lines 11 and 13 you may be surprised to see that weird token type, <code>T__2<\/code>: this happens because we didn\u2019t explicitly create one for the \u2018^\u2019 symbol, so one was automatically created for us. If you need to, you can see all the tokens by looking at the *.tokens file generated by ANTLR.<\/p>\n<p>On line 25 we visit our test node and get the result, which we check on line 27. It\u2019s all very simple because our visitor is simple; unit testing should always be easy and made up of small parts, but it really can\u2019t be easier than this.<\/p>\n<p>The only thing to pay attention to is the format of the number. It\u2019s not a problem here, but look at line 59, where we test the result of a whole expression. There we need to make sure that the correct format is selected, because different countries use different symbols as the decimal mark.<\/p>\n<blockquote>\n<p>There are some things that depend on the cultural context<\/p>\n<\/blockquote>\n<p>If your computer was already set to the <em>American English Culture<\/em> this wouldn\u2019t be necessary, but to guarantee the correct testing results for everybody we have to specify it. Keep that in mind if you are testing things that are culture-dependent, such as grouping of digits, temperatures, etc.<\/p>\n<p>On lines 44-46 you see that when we check for the wrong function the parser actually works. That\u2019s because \u201clogga\u201d is indeed syntactically valid as a function name, but it\u2019s not semantically correct. The function \u201clogga\u201d doesn\u2019t exist, so our program doesn\u2019t know what to do with it. So when we visit it we get 0 as a result. As you recall this was our choice, since we initialize the result to 0 and we don\u2019t have a <code>default<\/code> case in <code>VisitFunctionExp. 
<\/code>So if there is no matching function the result remains 0. A possible alternative could be to throw an exception.<\/p>\n<h2>Final Remarks<\/h2>\n<p>In this section we see tips and tricks that never came up in our examples, but can be useful in your programs. We also suggest more resources you may find useful if you want to know more about ANTLR, both the practice and the theory, or if you need to deal with the most complex problems.<\/p>\n<h2 id=\"tips-and-tricks\">33. Tips and Tricks<\/h2>\n<p>Let\u2019s see a few tricks that could be useful from time to time. These were never needed in our examples, but they have been quite useful in other scenarios.<\/p>\n<h3>Catchall Rule<\/h3>\n<p>The first one is the <strong>ANY<\/strong> lexer rule. This is simply a rule in the following format.<\/p>\n<pre class=\"brush:bash\">ANY : . ;<\/pre>\n<p>This is a catchall rule that should be put at the end of your grammar. It matches any character that no other rule recognized. So creating this rule can help you during development, when your grammar still has many holes that could cause distracting error messages. It\u2019s even useful in production, where it acts as a canary in a coal mine. If it shows up in your program you know that something is wrong.<\/p>\n<h3>Channels<\/h3>\n<p>There is also something that we haven\u2019t talked about: <em>channels<\/em>. Their use case is usually handling comments. You don\u2019t really want to check for comments inside every one of your statements or expressions, so you usually throw them away with <code>-&gt; skip<\/code>. But there are some cases where you may want to preserve them, for instance if you are translating a program into another language. When this happens you use <em>channels<\/em>. 
There is already one called HIDDEN that you can use, but you can declare more of them at the top of your lexer grammar.<\/p>\n<pre class=\"brush:bash\">channels { UNIQUENAME }\r\n\/\/ and you use them this way\r\nCOMMENTS : '\/\/' ~[\\r\\n]* -&gt; channel(UNIQUENAME) ;<\/pre>\n<h3>Rule Element Labels<\/h3>\n<p>There is another use of labels other than to distinguish among different cases of the same rule. They can be used to give a specific name, usually but not always of semantic value, to a common rule or parts of a rule. The format is <code>label=rule<\/code>, to be used inside another rule.<\/p>\n<pre class=\"brush:bash\">expression : left=expression (ASTERISK|SLASH) right=expression ;<\/pre>\n<p>This way <strong>left<\/strong> and <strong>right<\/strong> would become fields in the <code>ExpressionContext<\/code> nodes. And instead of using <code>context.expression(0)<\/code>, you could refer to the same entity using <code>context.left<\/code>.<\/p>\n<h3>Problematic Tokens<\/h3>\n<p>In many real languages some symbols are reused in different ways, some of which may lead to ambiguities. A common problematic example is the angle brackets, used both for bitshift expressions and to delimit parameterized types.<\/p>\n<pre class=\"brush:bash\">\/\/ bitshift expression, it assigns to x the value of y shifted by three bits\r\nx = y &gt;&gt; 3;\r\n\/\/ parameterized types, it defines x as a list of dictionaries\r\nList&lt;Dictionary&lt;string, int&gt;&gt; x;<\/pre>\n<p>The natural way of defining the bitshift operator is as a single token made of two angle brackets, \u2018&gt;&gt;\u2019. But this might lead to confusing the end of a nested parameterized definition with the bitshift operator, for instance in the second example shown above. While a simple way of solving the problem would be using semantic predicates, an excessive number of them would slow down the parsing phase. 
The solution is to avoid defining the bitshift operator token and instead use the angle bracket twice in the parser rule, so that the parser itself can choose the best candidate for every occasion.<\/p>\n<pre class=\"brush:bash\">\/\/ from this\r\nRIGHT_SHIFT : '&gt;&gt;';\r\nexpression : ID RIGHT_SHIFT NUMBER;\r\n\/\/ to this (SHIFT is a single angle bracket)\r\nSHIFT : '&gt;';\r\nexpression : ID SHIFT SHIFT NUMBER;<\/pre>\n<h2>34. Conclusions<\/h2>\n<p>We have learned a lot today:<\/p>\n<ul>\n<li>what a lexer and a parser are<\/li>\n<li>how to create lexer and parser rules<\/li>\n<li>how to use ANTLR to generate parsers in Java, C#, Python and JavaScript<\/li>\n<li>the fundamental kinds of problems you will encounter parsing and how to solve them<\/li>\n<li>how to understand errors<\/li>\n<li>how to test your parsers<\/li>\n<\/ul>\n<p>That\u2019s all you need to know to use ANTLR on your own. And I mean that literally: you may want to know more, but now you have a solid basis to explore on your own.<\/p>\n<p>Where to look if you need more information about ANTLR:<\/p>\n<ul>\n<li>On this very website there is a <a href=\"https:\/\/tomassetti.me\/category\/language-engineering\/antlr\/\">whole category dedicated to ANTLR<\/a>.<\/li>\n<li>The <a href=\"http:\/\/www.antlr.org\/\">official ANTLR website<\/a> is a good starting point to learn about the general status of the project, the specialized development tools and related projects, like StringTemplate.<\/li>\n<li>The <a href=\"https:\/\/github.com\/antlr\/antlr4\/tree\/master\/doc\">ANTLR documentation on GitHub<\/a>; especially useful is the information on <a href=\"https:\/\/github.com\/antlr\/antlr4\/blob\/master\/doc\/targets.md\">targets and how to set it up for different languages<\/a>.<\/li>\n<li>The <a href=\"http:\/\/www.antlr.org\/api\/Java\/index.html\">ANTLR 4.6 API<\/a>; it\u2019s related to the Java version, so there might be some differences in other languages, but it\u2019s the best place to settle your doubts about the inner workings of this 
tool.<\/li>\n<li>For those very interested in the science behind ANTLR4, there is an academic paper: <a href=\"http:\/\/www.antlr.org\/papers\/allstar-techreport.pdf\"><em>Adaptive LL(*) Parsing: The Power of Dynamic Analysis<\/em><\/a><\/li>\n<li><a href=\"https:\/\/pragprog.com\/book\/tpantlr2\/the-definitive-antlr-4-reference\"><strong>The Definitive ANTLR 4 Reference<\/strong><\/a>, by the man himself, <em>Terence Parr<\/em>, the creator of ANTLR. The resource you need if you want to know everything about ANTLR and a good deal about parsing languages in general.<\/li>\n<\/ul>\n<p>Also, the book is the only place where you can find an answer to questions like this one:<\/p>\n<blockquote>\n<p>ANTLR v4 is the result of a minor detour (twenty-five years) I took in graduate<br \/>\nschool. I guess I\u2019m going to have to change my motto slightly.<\/p>\n<p><em>Why program by hand in five days what you can spend twenty-five years of your<\/em><br \/>\n<em>life automating?<\/em><\/p>\n<\/blockquote>\n<p><em>We worked quite hard to build the largest tutorial on ANTLR: the mega-tutorial! A post over 13,000 words long, or more than 30 pages, to try answering all your questions about ANTLR. Missing something? <a href=\"https:\/\/tomassetti.me\/contact-me-about-a-project\/\">Contact us<\/a> and let us know; we are here to help.<\/em><\/p>\n<div class=\"attribution\">\n<table>\n<tbody>\n<tr>\n<td><span class=\"reference\">Reference: <\/span><\/td>\n<td><a href=\"https:\/\/tomassetti.me\/antlr-mega-tutorial\/\">The ANTLR mega tutorial<\/a> from our <a href=\"http:\/\/www.javacodegeeks.com\/join-us\/jcg\/\">JCG partner<\/a> Federico Tomassetti at the <a href=\"http:\/\/tomassetti.me\/\">Federico Tomassetti<\/a> blog.<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<\/div>\n","protected":false},"excerpt":{"rendered":"<p>Parsers are powerful tools, and using ANTLR you could write all sort of parsers usable from many different languages. 
In this complete tutorial we are going to: explain the basis: what a parser is, what it can be used for see how to setup ANTLR to be used from Javascript, Python, Java and C# discuss &hellip;<\/p>\n","protected":false},"author":951,"featured_media":148,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[7],"tags":[435],"class_list":["post-64404","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-core-java","tag-antlr"],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v27.5 - https:\/\/yoast.com\/product\/yoast-seo-wordpress\/ -->\n<title>The ANTLR mega tutorial - Java Code Geeks<\/title>\n<meta name=\"description\" content=\"Parsers are powerful tools, and using ANTLR you could write all sort of parsers usable from many different languages. In this complete tutorial we are\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/www.javacodegeeks.com\/2017\/03\/antlr-mega-tutorial.html\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"The ANTLR mega tutorial - Java Code Geeks\" \/>\n<meta property=\"og:description\" content=\"Parsers are powerful tools, and using ANTLR you could write all sort of parsers usable from many different languages. 
In this complete tutorial we are\" \/>\n<meta property=\"og:url\" content=\"https:\/\/www.javacodegeeks.com\/2017\/03\/antlr-mega-tutorial.html\" \/>\n<meta property=\"og:site_name\" content=\"Java Code Geeks\" \/>\n<meta property=\"article:publisher\" content=\"https:\/\/www.facebook.com\/javacodegeeks\" \/>\n<meta property=\"article:published_time\" content=\"2017-03-09T14:00:14+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/www.javacodegeeks.com\/wp-content\/uploads\/2012\/10\/java-logo.jpg\" \/>\n\t<meta property=\"og:image:width\" content=\"150\" \/>\n\t<meta property=\"og:image:height\" content=\"150\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/jpeg\" \/>\n<meta name=\"author\" content=\"Federico Tomassetti\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:creator\" content=\"@raindancer\" \/>\n<meta name=\"twitter:site\" content=\"@javacodegeeks\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Federico Tomassetti\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. 
reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"75 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\\\/\\\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\\\/\\\/www.javacodegeeks.com\\\/2017\\\/03\\\/antlr-mega-tutorial.html#article\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/www.javacodegeeks.com\\\/2017\\\/03\\\/antlr-mega-tutorial.html\"},\"author\":{\"name\":\"Federico Tomassetti\",\"@id\":\"https:\\\/\\\/www.javacodegeeks.com\\\/#\\\/schema\\\/person\\\/2da976480eeabb37d1f96edc20d63773\"},\"headline\":\"The ANTLR mega tutorial\",\"datePublished\":\"2017-03-09T14:00:14+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\\\/\\\/www.javacodegeeks.com\\\/2017\\\/03\\\/antlr-mega-tutorial.html\"},\"wordCount\":11478,\"commentCount\":0,\"publisher\":{\"@id\":\"https:\\\/\\\/www.javacodegeeks.com\\\/#organization\"},\"image\":{\"@id\":\"https:\\\/\\\/www.javacodegeeks.com\\\/2017\\\/03\\\/antlr-mega-tutorial.html#primaryimage\"},\"thumbnailUrl\":\"https:\\\/\\\/www.javacodegeeks.com\\\/wp-content\\\/uploads\\\/2012\\\/10\\\/java-logo.jpg\",\"keywords\":[\"ANTLR\"],\"articleSection\":[\"Core Java\"],\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"CommentAction\",\"name\":\"Comment\",\"target\":[\"https:\\\/\\\/www.javacodegeeks.com\\\/2017\\\/03\\\/antlr-mega-tutorial.html#respond\"]}]},{\"@type\":\"WebPage\",\"@id\":\"https:\\\/\\\/www.javacodegeeks.com\\\/2017\\\/03\\\/antlr-mega-tutorial.html\",\"url\":\"https:\\\/\\\/www.javacodegeeks.com\\\/2017\\\/03\\\/antlr-mega-tutorial.html\",\"name\":\"The ANTLR mega tutorial - Java Code 
Geeks\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/www.javacodegeeks.com\\\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\\\/\\\/www.javacodegeeks.com\\\/2017\\\/03\\\/antlr-mega-tutorial.html#primaryimage\"},\"image\":{\"@id\":\"https:\\\/\\\/www.javacodegeeks.com\\\/2017\\\/03\\\/antlr-mega-tutorial.html#primaryimage\"},\"thumbnailUrl\":\"https:\\\/\\\/www.javacodegeeks.com\\\/wp-content\\\/uploads\\\/2012\\\/10\\\/java-logo.jpg\",\"datePublished\":\"2017-03-09T14:00:14+00:00\",\"description\":\"Parsers are powerful tools, and using ANTLR you could write all sort of parsers usable from many different languages. In this complete tutorial we are\",\"breadcrumb\":{\"@id\":\"https:\\\/\\\/www.javacodegeeks.com\\\/2017\\\/03\\\/antlr-mega-tutorial.html#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\\\/\\\/www.javacodegeeks.com\\\/2017\\\/03\\\/antlr-mega-tutorial.html\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\\\/\\\/www.javacodegeeks.com\\\/2017\\\/03\\\/antlr-mega-tutorial.html#primaryimage\",\"url\":\"https:\\\/\\\/www.javacodegeeks.com\\\/wp-content\\\/uploads\\\/2012\\\/10\\\/java-logo.jpg\",\"contentUrl\":\"https:\\\/\\\/www.javacodegeeks.com\\\/wp-content\\\/uploads\\\/2012\\\/10\\\/java-logo.jpg\",\"width\":150,\"height\":150},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\\\/\\\/www.javacodegeeks.com\\\/2017\\\/03\\\/antlr-mega-tutorial.html#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\\\/\\\/www.javacodegeeks.com\\\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Java\",\"item\":\"https:\\\/\\\/www.javacodegeeks.com\\\/category\\\/java\"},{\"@type\":\"ListItem\",\"position\":3,\"name\":\"Core Java\",\"item\":\"https:\\\/\\\/www.javacodegeeks.com\\\/category\\\/java\\\/core-java\"},{\"@type\":\"ListItem\",\"position\":4,\"name\":\"The ANTLR mega 
tutorial\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\\\/\\\/www.javacodegeeks.com\\\/#website\",\"url\":\"https:\\\/\\\/www.javacodegeeks.com\\\/\",\"name\":\"Java Code Geeks\",\"description\":\"Java Developers Resource Center\",\"publisher\":{\"@id\":\"https:\\\/\\\/www.javacodegeeks.com\\\/#organization\"},\"alternateName\":\"JCG\",\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\\\/\\\/www.javacodegeeks.com\\\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":\"Organization\",\"@id\":\"https:\\\/\\\/www.javacodegeeks.com\\\/#organization\",\"name\":\"Exelixis Media P.C.\",\"url\":\"https:\\\/\\\/www.javacodegeeks.com\\\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\\\/\\\/www.javacodegeeks.com\\\/#\\\/schema\\\/logo\\\/image\\\/\",\"url\":\"https:\\\/\\\/www.javacodegeeks.com\\\/wp-content\\\/uploads\\\/2022\\\/06\\\/exelixis-logo.png\",\"contentUrl\":\"https:\\\/\\\/www.javacodegeeks.com\\\/wp-content\\\/uploads\\\/2022\\\/06\\\/exelixis-logo.png\",\"width\":864,\"height\":246,\"caption\":\"Exelixis Media P.C.\"},\"image\":{\"@id\":\"https:\\\/\\\/www.javacodegeeks.com\\\/#\\\/schema\\\/logo\\\/image\\\/\"},\"sameAs\":[\"https:\\\/\\\/www.facebook.com\\\/javacodegeeks\",\"https:\\\/\\\/x.com\\\/javacodegeeks\"]},{\"@type\":\"Person\",\"@id\":\"https:\\\/\\\/www.javacodegeeks.com\\\/#\\\/schema\\\/person\\\/2da976480eeabb37d1f96edc20d63773\",\"name\":\"Federico 
Tomassetti\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\\\/\\\/secure.gravatar.com\\\/avatar\\\/10d3414571edf95f2255d57c9c02759daba20499f6761de9228c1cbbbd2fab6c?s=96&d=mm&r=g\",\"url\":\"https:\\\/\\\/secure.gravatar.com\\\/avatar\\\/10d3414571edf95f2255d57c9c02759daba20499f6761de9228c1cbbbd2fab6c?s=96&d=mm&r=g\",\"contentUrl\":\"https:\\\/\\\/secure.gravatar.com\\\/avatar\\\/10d3414571edf95f2255d57c9c02759daba20499f6761de9228c1cbbbd2fab6c?s=96&d=mm&r=g\",\"caption\":\"Federico Tomassetti\"},\"description\":\"Federico has a PhD in Polyglot Software Development. He is fascinated by all forms of software development with a focus on Model-Driven Development and Domain Specific Languages.\",\"sameAs\":[\"http:\\\/\\\/tomassetti.me\\\/\",\"https:\\\/\\\/fr.linkedin.com\\\/in\\\/federicotomassetti\",\"https:\\\/\\\/x.com\\\/raindancer\"],\"url\":\"https:\\\/\\\/www.javacodegeeks.com\\\/author\\\/federico-tomassetti\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"The ANTLR mega tutorial - Java Code Geeks","description":"Parsers are powerful tools, and using ANTLR you could write all sort of parsers usable from many different languages. In this complete tutorial we are","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/www.javacodegeeks.com\/2017\/03\/antlr-mega-tutorial.html","og_locale":"en_US","og_type":"article","og_title":"The ANTLR mega tutorial - Java Code Geeks","og_description":"Parsers are powerful tools, and using ANTLR you could write all sort of parsers usable from many different languages. 
In this complete tutorial we are","og_url":"https:\/\/www.javacodegeeks.com\/2017\/03\/antlr-mega-tutorial.html","og_site_name":"Java Code Geeks","article_publisher":"https:\/\/www.facebook.com\/javacodegeeks","article_published_time":"2017-03-09T14:00:14+00:00","og_image":[{"width":150,"height":150,"url":"https:\/\/www.javacodegeeks.com\/wp-content\/uploads\/2012\/10\/java-logo.jpg","type":"image\/jpeg"}],"author":"Federico Tomassetti","twitter_card":"summary_large_image","twitter_creator":"@raindancer","twitter_site":"@javacodegeeks","twitter_misc":{"Written by":"Federico Tomassetti","Est. reading time":"75 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/www.javacodegeeks.com\/2017\/03\/antlr-mega-tutorial.html#article","isPartOf":{"@id":"https:\/\/www.javacodegeeks.com\/2017\/03\/antlr-mega-tutorial.html"},"author":{"name":"Federico Tomassetti","@id":"https:\/\/www.javacodegeeks.com\/#\/schema\/person\/2da976480eeabb37d1f96edc20d63773"},"headline":"The ANTLR mega tutorial","datePublished":"2017-03-09T14:00:14+00:00","mainEntityOfPage":{"@id":"https:\/\/www.javacodegeeks.com\/2017\/03\/antlr-mega-tutorial.html"},"wordCount":11478,"commentCount":0,"publisher":{"@id":"https:\/\/www.javacodegeeks.com\/#organization"},"image":{"@id":"https:\/\/www.javacodegeeks.com\/2017\/03\/antlr-mega-tutorial.html#primaryimage"},"thumbnailUrl":"https:\/\/www.javacodegeeks.com\/wp-content\/uploads\/2012\/10\/java-logo.jpg","keywords":["ANTLR"],"articleSection":["Core Java"],"inLanguage":"en-US","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/www.javacodegeeks.com\/2017\/03\/antlr-mega-tutorial.html#respond"]}]},{"@type":"WebPage","@id":"https:\/\/www.javacodegeeks.com\/2017\/03\/antlr-mega-tutorial.html","url":"https:\/\/www.javacodegeeks.com\/2017\/03\/antlr-mega-tutorial.html","name":"The ANTLR mega tutorial - Java Code 
Geeks","isPartOf":{"@id":"https:\/\/www.javacodegeeks.com\/#website"},"primaryImageOfPage":{"@id":"https:\/\/www.javacodegeeks.com\/2017\/03\/antlr-mega-tutorial.html#primaryimage"},"image":{"@id":"https:\/\/www.javacodegeeks.com\/2017\/03\/antlr-mega-tutorial.html#primaryimage"},"thumbnailUrl":"https:\/\/www.javacodegeeks.com\/wp-content\/uploads\/2012\/10\/java-logo.jpg","datePublished":"2017-03-09T14:00:14+00:00","description":"Parsers are powerful tools, and using ANTLR you could write all sort of parsers usable from many different languages. In this complete tutorial we are","breadcrumb":{"@id":"https:\/\/www.javacodegeeks.com\/2017\/03\/antlr-mega-tutorial.html#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/www.javacodegeeks.com\/2017\/03\/antlr-mega-tutorial.html"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.javacodegeeks.com\/2017\/03\/antlr-mega-tutorial.html#primaryimage","url":"https:\/\/www.javacodegeeks.com\/wp-content\/uploads\/2012\/10\/java-logo.jpg","contentUrl":"https:\/\/www.javacodegeeks.com\/wp-content\/uploads\/2012\/10\/java-logo.jpg","width":150,"height":150},{"@type":"BreadcrumbList","@id":"https:\/\/www.javacodegeeks.com\/2017\/03\/antlr-mega-tutorial.html#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/www.javacodegeeks.com\/"},{"@type":"ListItem","position":2,"name":"Java","item":"https:\/\/www.javacodegeeks.com\/category\/java"},{"@type":"ListItem","position":3,"name":"Core Java","item":"https:\/\/www.javacodegeeks.com\/category\/java\/core-java"},{"@type":"ListItem","position":4,"name":"The ANTLR mega tutorial"}]},{"@type":"WebSite","@id":"https:\/\/www.javacodegeeks.com\/#website","url":"https:\/\/www.javacodegeeks.com\/","name":"Java Code Geeks","description":"Java Developers Resource 
Center","publisher":{"@id":"https:\/\/www.javacodegeeks.com\/#organization"},"alternateName":"JCG","potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/www.javacodegeeks.com\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Organization","@id":"https:\/\/www.javacodegeeks.com\/#organization","name":"Exelixis Media P.C.","url":"https:\/\/www.javacodegeeks.com\/","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.javacodegeeks.com\/#\/schema\/logo\/image\/","url":"https:\/\/www.javacodegeeks.com\/wp-content\/uploads\/2022\/06\/exelixis-logo.png","contentUrl":"https:\/\/www.javacodegeeks.com\/wp-content\/uploads\/2022\/06\/exelixis-logo.png","width":864,"height":246,"caption":"Exelixis Media P.C."},"image":{"@id":"https:\/\/www.javacodegeeks.com\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/www.facebook.com\/javacodegeeks","https:\/\/x.com\/javacodegeeks"]},{"@type":"Person","@id":"https:\/\/www.javacodegeeks.com\/#\/schema\/person\/2da976480eeabb37d1f96edc20d63773","name":"Federico Tomassetti","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/secure.gravatar.com\/avatar\/10d3414571edf95f2255d57c9c02759daba20499f6761de9228c1cbbbd2fab6c?s=96&d=mm&r=g","url":"https:\/\/secure.gravatar.com\/avatar\/10d3414571edf95f2255d57c9c02759daba20499f6761de9228c1cbbbd2fab6c?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/10d3414571edf95f2255d57c9c02759daba20499f6761de9228c1cbbbd2fab6c?s=96&d=mm&r=g","caption":"Federico Tomassetti"},"description":"Federico has a PhD in Polyglot Software Development. 
He is fascinated by all forms of software development with a focus on Model-Driven Development and Domain Specific Languages.","sameAs":["http:\/\/tomassetti.me\/","https:\/\/fr.linkedin.com\/in\/federicotomassetti","https:\/\/x.com\/raindancer"],"url":"https:\/\/www.javacodegeeks.com\/author\/federico-tomassetti"}]}},"_links":{"self":[{"href":"https:\/\/www.javacodegeeks.com\/wp-json\/wp\/v2\/posts\/64404","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.javacodegeeks.com\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.javacodegeeks.com\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.javacodegeeks.com\/wp-json\/wp\/v2\/users\/951"}],"replies":[{"embeddable":true,"href":"https:\/\/www.javacodegeeks.com\/wp-json\/wp\/v2\/comments?post=64404"}],"version-history":[{"count":0,"href":"https:\/\/www.javacodegeeks.com\/wp-json\/wp\/v2\/posts\/64404\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.javacodegeeks.com\/wp-json\/wp\/v2\/media\/148"}],"wp:attachment":[{"href":"https:\/\/www.javacodegeeks.com\/wp-json\/wp\/v2\/media?parent=64404"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.javacodegeeks.com\/wp-json\/wp\/v2\/categories?post=64404"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.javacodegeeks.com\/wp-json\/wp\/v2\/tags?post=64404"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}