{"id":649,"date":"2011-10-04T21:25:00","date_gmt":"2011-10-04T21:25:00","guid":{"rendered":"http:\/\/www.javacodegeeks.com\/2012\/10\/scala-tutorial-regular-expressions-matching.html"},"modified":"2012-10-21T20:32:43","modified_gmt":"2012-10-21T20:32:43","slug":"scala-tutorial-regular-expressions","status":"publish","type":"post","link":"https:\/\/www.javacodegeeks.com\/2011\/10\/scala-tutorial-regular-expressions.html","title":{"rendered":"Scala Tutorial &#8211; regular expressions, matching"},"content":{"rendered":"<div dir=\"ltr\" style=\"text-align: left\">\n<h2>         Preface<\/h2>\n<p>This is part 5 of tutorials for first-time programmers getting into Scala. Other posts are on this blog, and you can get links to those and other resources on <a href=\"http:\/\/icl-f11.utcompling.com\/links\">the links page of the Computational Linguistics course<\/a> I\u2019m creating these for.&nbsp;Additionally you can find this and other tutorial series on the JCG&nbsp;<a href=\"http:\/\/www.javacodegeeks.com\/p\/java-tutorials.html\">Java Tutorials<\/a>&nbsp;page.<\/p>\n<p>This post is the first of two about regular expressions (regexes), which are essential for a wide range of programming tasks, and for computational linguistics tasks in particular. This tutorial explains how to use them with Scala, assuming that the reader is already familiar with regular expression syntax. It shows how to create regular expressions in Scala and use them with Scala\u2019s powerful pattern matching capabilities, in particular for variable assignment and cases in match expressions.<\/p>\n<h2>         Creating regular expressions<\/h2>\n<p>Scala provides a very simple way to create regexes: just define a regex as a string and then call the <strong>r<\/strong> method on it. 
The following defines a regular expression that characterizes the string language <img decoding=\"async\" alt=\"a^mb^n\" class=\"latex\" src=\"http:\/\/s0.wp.com\/latex.php?latex=a%5Emb%5En&amp;bg=ffffff&amp;fg=9d9d9d&amp;s=0\" \/> (one or more <em>a<\/em>\u2019s followed by one or more <em>b<\/em>\u2019s, where the number of <em>b<\/em>\u2019s need not equal the number of <em>a<\/em>\u2019s).<\/p>\n<pre class=\"brush: scala;\">scala&gt; val AmBn = \"a+b+\".r\r\nAmBn: scala.util.matching.Regex = a+b+\r\n<\/pre>\n<p>To use meta-characters, like <em>\\s<\/em>, <em>\\w<\/em>, and <em>\\d<\/em>, you must either escape the backslashes or use triple-quoted strings, which are often referred to as raw strings. The following are two equivalent ways to write a regex that matches a sequence of word characters followed by a sequence of digits.<\/p>\n<pre class=\"brush: scala;\">scala&gt; val WordDigit1 = \"\\\\w+\\\\d+\".r\r\nWordDigit1: scala.util.matching.Regex = \\w+\\d+\r\n \r\nscala&gt; val WordDigit2 = \"\"\"\\w+\\d+\"\"\".r\r\nWordDigit2: scala.util.matching.Regex = \\w+\\d+\r\n<\/pre>\n<p>Whether escaping or using raw strings is preferable depends on the context. For example, with the above, I\u2019d go with the raw string. 
However, for using a regex to split a string on whitespace characters, escaping is somewhat preferable.<\/p>\n<pre class=\"brush: scala;\">scala&gt; val adder = \"We're as similar as two dissimilar things in a pod.\\n\\t-Blackadder\"\r\nadder: java.lang.String =\r\nWe're as similar as two dissimilar things in a pod.\r\n-Blackadder\r\n \r\nscala&gt; adder.split(\"\\\\s+\")\r\nres2: Array[java.lang.String] = Array(We're, as, similar, as, two, dissimilar, things, in, a, pod., -Blackadder)\r\n \r\nscala&gt; adder.split(\"\"\"\\s+\"\"\")\r\nres3: Array[java.lang.String] = Array(We're, as, similar, as, two, dissimilar, things, in, a, pod., -Blackadder)\r\n<\/pre>\n<p>A note on naming: the convention in Scala is to use variable names with the first letter uppercased for Regex objects. This makes them consistent with the use of pattern matching in match statements, as shown below.<\/p>\n<h2>         Matching with regexes<\/h2>\n<p>We saw above that using the <strong>r<\/strong> method on a String returns a value that is a Regex object (more on the <strong>scala.util.matching<\/strong> part below). How do you actually do useful things with these Regex objects? There are a number of ways. The prettiest, and perhaps most common for the non-computational linguist, is to use them in tandem with Scala\u2019s standard pattern matching capabilities. Let\u2019s consider the task of parsing names and turning them into useful data structures that we can do various useful things with.<\/p>\n<pre class=\"brush: scala;\">scala&gt; val Name = \"\"\"(Mr|Mrs|Ms)\\. ([A-Z][a-z]+) ([A-Z][a-z]+)\"\"\".r\r\nName: scala.util.matching.Regex = (Mr|Mrs|Ms)\\. ([A-Z][a-z]+) ([A-Z][a-z]+)\r\n \r\nscala&gt; val Name(title, first, last) = \"Mr. James Stevens\"\r\ntitle: String = Mr\r\nfirst: String = James\r\nlast: String = Stevens\r\n \r\nscala&gt; val Name(title, first, last) = \"Ms. 
Sally Kenton\"\r\ntitle: String = Ms\r\nfirst: String = Sally\r\nlast: String = Kenton\r\n<\/pre>\n<p>Notice the similarity with pattern matching on types like Array and List.<\/p>\n<pre class=\"brush: scala;\">scala&gt; val Array(title, first, last) = \"Mr. James Stevens\".split(\" \")\r\ntitle: java.lang.String = Mr.\r\nfirst: java.lang.String = James\r\nlast: java.lang.String = Stevens\r\n \r\nscala&gt; val List(title, first, last) = \"Mr. James Stevens\".split(\" \").toList\r\ntitle: java.lang.String = Mr.\r\nfirst: java.lang.String = James\r\nlast: java.lang.String = Stevens\r\n<\/pre>\n<p>Of course, notice that here the \u201c.\u201d was captured, while the regex excised it. A more substantive difference with the regular expression is that it only accepts strings with the right form and will reject others, unlike simple splitting and matching to Array.<\/p>\n<pre class=\"brush: scala;\">scala&gt; val Array(title, first, last) = \"221B Baker Street\".split(\" \")\r\ntitle: java.lang.String = 221B\r\nfirst: java.lang.String = Baker\r\nlast: java.lang.String = Street\r\n \r\nscala&gt; val Name(title, first, last) = \"221B Baker Street\"\r\nscala.MatchError: 221B Baker Street (of class java.lang.String)\r\nat .&lt;init&gt;(&lt;console&gt;:12)\r\nat .&lt;clinit&gt;(&lt;console&gt;)\r\nat .&lt;init&gt;(&lt;console&gt;:11)\r\nat .&lt;clinit&gt;(&lt;console&gt;)\r\nat $export(&lt;console&gt;)\r\nat sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)\r\nat sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)\r\nat sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)\r\nat java.lang.reflect.Method.invoke(Method.java:597)\r\nat scala.tools.nsc.interpreter.IMain$ReadEvalPrint.call(IMain.scala:592)\r\nat scala.tools.nsc.interpreter.IMain$Request$$anonfun$10.apply(IMain.scala:828)\r\nat scala.tools.nsc.interpreter.Line$$anonfun$1.apply$mcV$sp(Line.scala:43)\r\nat 
scala.tools.nsc.io.package$$anon$2.run(package.scala:31)\r\nat java.lang.Thread.run(Thread.java:680)\r\n<\/pre>\n<p>That\u2019s a lot of complaining, of course, but in practice you would generally either (a) be absolutely sure that your strings are in the correct format, (b) check for such exceptions, or (c) use the regex as one option of many in a match expression.<\/p>\n<p>For now, let\u2019s assume the input is appropriate. This means we can easily convert a list of names as strings into a list of tuples using map and a match expression.<div style=\"display:inline-block; margin: 15px 0;\"> <div id=\"adngin-JavaCodeGeeks_incontent_video-0\" style=\"display:inline-block;\"><\/div> <\/div><\/p>\n<pre class=\"brush: scala;\">scala&gt; val names = List(\"Mr. James Stevens\", \"Ms. Sally Kenton\", \"Mrs. Jane Doe\", \"Mr. John Doe\", \"Mr. James Smith\")\r\nnames: List[java.lang.String] = List(Mr. James Stevens, Ms. Sally Kenton, Mrs. Jane Doe, Mr. John Doe, Mr. James Smith)\r\n \r\nscala&gt; names.map(x =&gt; x match { case Name(title, first, last) =&gt; (title, first, last) })\r\nres11: List[(String, String, String)] = List((Mr,James,Stevens), (Ms,Sally,Kenton), (Mrs,Jane,Doe), (Mr,John,Doe), (Mr,James,Smith))\r\n<\/pre>\n<p>Note the crucial use of groups in the <em>Name<\/em> regex: the number of groups equals the number of variables being initialized in the match. The first group is needed for the alternatives <em>Mr, Mrs,<\/em> and <em>Ms<\/em>. Without the other groups, we get an error. (From here on, I\u2019ll shorten the MatchError output.)<\/p>\n<pre class=\"brush: scala;\">scala&gt; val NameOneGroup = \"\"\"(Mr|Mrs|Ms)\\. [A-Z][a-z]+ [A-Z][a-z]+\"\"\".r\r\nNameOneGroup: scala.util.matching.Regex = (Mr|Mrs|Ms)\\. [A-Z][a-z]+ [A-Z][a-z]+\r\n \r\nscala&gt; val NameOneGroup(title, first, last) = \"Mr. James Stevens\"\r\nscala.MatchError: Mr. 
James Stevens (of class java.lang.String)\r\n<\/pre>\n<p>Of course, we can still match to the first group.<\/p>\n<pre class=\"brush: scala;\">scala&gt; val NameOneGroup(title) = \"Mr. James Stevens\"\r\ntitle: String = Mr\r\n<\/pre>\n<p>What if we go in the other direction, creating more groups so that we can, for example, share the \u201cM\u201d in the various titles? Here\u2019s an attempt.<\/p>\n<pre class=\"brush: scala;\">scala&gt; val NameShareM = \"\"\"(M(r|rs|s))\\. ([A-Z][a-z]+) ([A-Z][a-z]+)\"\"\".r\r\nNameShareM: scala.util.matching.Regex = (M(r|rs|s))\\. ([A-Z][a-z]+) ([A-Z][a-z]+)\r\n \r\nscala&gt; val NameShareM(title, first, last) = \"Mr. James Stevens\"\r\nscala.MatchError: Mr. James Stevens (of class java.lang.String)\r\n<\/pre>\n<p>What happened is that a new group was created, so there are now four groups to match.<\/p>\n<pre class=\"brush: scala;\">scala&gt; val NameShareM(title, titleEnding, first, last) = \"Mr. James Stevens\"\r\ntitle: String = Mr\r\ntitleEnding: String = r\r\nfirst: String = James\r\nlast: String = Stevens\r\n \r\nscala&gt; val NameShareM(title, titleEnding, first, last) = \"Mrs. Sally Kenton\"\r\ntitle: String = Mrs\r\ntitleEnding: String = rs\r\nfirst: String = Sally\r\nlast: String = Kenton\r\n<\/pre>\n<p>So nested groups are captured too, in the order of their opening parentheses. To stop the <strong>(r|rs|s)<\/strong> part from creating a match group while still being able to use it to group alternatives in a disjunction, use the <strong>?:<\/strong> operator, which makes the group non-capturing.<\/p>\n<pre class=\"brush: scala;\">scala&gt; val NameShareMThreeGroups = \"\"\"(M(?:r|rs|s))\\. ([A-Z][a-z]+) ([A-Z][a-z]+)\"\"\".r\r\nNameShareMThreeGroups: scala.util.matching.Regex = (M(?:r|rs|s))\\. ([A-Z][a-z]+) ([A-Z][a-z]+)\r\n \r\nscala&gt; val NameShareMThreeGroups(title, first, last) = \"Mr. 
James Stevens\"\r\ntitle: String = Mr\r\nfirst: String = James\r\nlast: String = Stevens\r\n<\/pre>\n<p>By this point, sharing the <strong>M<\/strong> hasn\u2019t saved anything over <strong>(Mr|Mrs|Ms)<\/strong>, but there are plenty of situations where this is quite useful.<\/p>\n<p>We can also use regex backreferences. Say we want to match names in which the first and last names differ only in their initial letter, like \u201c<em>Mr. John Bohn<\/em>\u201d, \u201c<em>Mr. Joe Doe<\/em>\u201d, and \u201c<em>Mrs. Jill Hill<\/em>\u201d.<\/p>\n<pre class=\"brush: scala;\">scala&gt; val RhymeName = \"\"\"(Mr|Mrs|Ms)\\. ([A-Z])([a-z]+) ([A-Z])\\3\"\"\".r\r\nRhymeName: scala.util.matching.Regex = (Mr|Mrs|Ms)\\. ([A-Z])([a-z]+) ([A-Z])\\3\r\n \r\nscala&gt; val RhymeName(title, firstInitial, firstRest, lastInitial) = \"Mr. John Bohn\"\r\ntitle: String = Mr\r\nfirstInitial: String = J\r\nfirstRest: String = ohn\r\nlastInitial: String = B\r\n<\/pre>\n<p>Then we could piece things together to get the names we wanted.<\/p>\n<pre class=\"brush: scala;\">scala&gt; val first = firstInitial+firstRest\r\nfirst: java.lang.String = John\r\n \r\nscala&gt; val last = lastInitial+firstRest\r\nlast: java.lang.String = Bohn\r\n<\/pre>\n<p>But we can do better by using an embedded group and just throwing its match result away with the underscore <strong>_<\/strong>.<\/p>\n<pre class=\"brush: scala;\">scala&gt; val RhymeName2 = \"\"\"(Mr|Mrs|Ms)\\. ([A-Z]([a-z]+)) ([A-Z]\\3)\"\"\".r\r\nRhymeName2: scala.util.matching.Regex = (Mr|Mrs|Ms)\\. ([A-Z]([a-z]+)) ([A-Z]\\3)\r\n \r\nscala&gt; val RhymeName2(title, first, _, last) = \"Mr. 
John Bohn\"\r\ntitle: String = Mr\r\nfirst: String = John\r\nlast: String = Bohn\r\n<\/pre>\n<p><em>Note<\/em>: we can\u2019t use the <strong>?:<\/strong> operator with <strong>([a-z]+)<\/strong> to stop the capture, because the backreference <strong>\\3<\/strong> needs exactly that captured string later.<\/p>\n<p>Using regexes for assignment via pattern matching requires a full string match.<\/p>\n<pre class=\"brush: scala;\">scala&gt; val Name(title, first, last) = \"Mr. James Stevens\"\r\ntitle: String = Mr\r\nfirst: String = James\r\nlast: String = Stevens\r\n \r\nscala&gt; val Name(title, first, last) = \"Mr. James Stevens walked to the door.\"\r\nscala.MatchError: Mr. James Stevens walked to the door. (of class java.lang.String)\r\n<\/pre>\n<p>This is a crucial aspect of using them in match expressions. Consider an application that needs to be able to parse telephone numbers in different formats, like <em>(123) 555-5555<\/em> and <em>123-555-5555<\/em>. Here are regexes for these two patterns and their use to parse these numbers.<\/p>\n<pre class=\"brush: scala;\">scala&gt; val Phone1 = \"\"\"\\((\\d{3})\\)\\s*(\\d{3})-(\\d{4})\"\"\".r\r\nPhone1: scala.util.matching.Regex = \\((\\d{3})\\)\\s*(\\d{3})-(\\d{4})\r\n \r\nscala&gt; val Phone2 = \"\"\"(\\d{3})-(\\d{3})-(\\d{4})\"\"\".r\r\nPhone2: scala.util.matching.Regex = (\\d{3})-(\\d{3})-(\\d{4})\r\n \r\nscala&gt; val Phone1(area, first3, last4) = \"(123) 555-5555\"\r\narea: String = 123\r\nfirst3: String = 555\r\nlast4: String = 5555\r\n \r\nscala&gt; val Phone2(area, first3, last4) = \"123-555-5555\"\r\narea: String = 123\r\nfirst3: String = 555\r\nlast4: String = 5555\r\n<\/pre>\n<p>We could of course use a single regular expression, but we\u2019ll go with these two so that they can be used as separate case statements in a match expression that is part of a function that takes a string representation of a phone number and returns a tuple of three strings (thus normalizing the numbers).<\/p>\n<pre class=\"brush: 
scala;\">def normalizePhoneNumber (number: String) = number match {\r\n  case Phone1(x,y,z) =&gt; (x,y,z)\r\n  case Phone2(x,y,z) =&gt; (x,y,z)\r\n}\r\n<\/pre>\n<p>The action being taken for each match is just to package the separate values up in a <strong>Tuple3<\/strong> \u2014 more interesting things could be done if one were looking for country codes, dealing with multiple countries, etc. The point here is to see how the regular expressions are used for the cases to capture values and assign them to local variables, each time appropriate for the form of the string that is brought in. (We\u2019ll see in a later tutorial how to protect such a method from inputs that are not phone numbers and such.)<\/p>\n<p>Now that we have that function, we can easily apply it to a list of strings representing phone numbers and filter out just those in a specific area, for example.<\/p>\n<pre class=\"brush: scala;\">scala&gt; val numbers = List(\"(123) 555-5555\", \"123-555-5555\", \"(321) 555-0000\")\r\nnumbers: List[java.lang.String] = List((123) 555-5555, 123-555-5555, (321) 555-0000)\r\n \r\nscala&gt; numbers.map(normalizePhoneNumber)\r\nres16: List[(String, String, String)] = List((123,555,5555), (123,555,5555), (321,555,0000))\r\n \r\nscala&gt; numbers.map(normalizePhoneNumber).filter(n =&gt; n._1==\"123\")\r\nres17: List[(String, String, String)] = List((123,555,5555), (123,555,5555))\r\n<\/pre>\n<h2>         Building Regexes from Strings<\/h2>\n<p>Sometimes one wants to build up a regex from smaller component parts, for example, defining what a noun phrase is and then searching for a sequence of noun phrases. To do this, we first must see the longer form of creating a regex.<\/p>\n<pre class=\"brush: scala;\">scala&gt; val AmBn = new scala.util.matching.Regex(\"a+b+\")\r\nAmBn: scala.util.matching.Regex = a+b+\r\n<\/pre>\n<p>This is the first time in these tutorials that we are explicitly creating an object using the reserved word <strong>new<\/strong>. 
We\u2019ll be covering objects in more detail later, but what you need to know now is that Scala has a great deal of functionality that is not available by default. Mostly, we\u2019ve been working with things like Strings, Ints, Doubles, Lists, and so on \u2014 and for the most part it has appeared to you as though they are \u201cjust\u201d Strings, Ints, Doubles, and Lists. However, that is not the case: actually they are fully specified as:<\/p>\n<ul>\n<li>java.lang.String<\/li>\n<li>scala.Int<\/li>\n<li>scala.Double<\/li>\n<li>scala.List<\/li>\n<\/ul>\n<p>And, in the case of the last one, <strong>scala.List<\/strong> is a type that is actually backed by a concrete implementation in <strong>scala.collection.immutable.List<\/strong>. So, when you just see \u201cList\u201d, Scala is actually hiding some detail; most importantly, it makes it possible to use extremely common types with very little fuss.<\/p>\n<p>What <strong>scala.util.matching.Regex<\/strong> is telling you is that the Regex class is part of the <strong>scala.util.matching<\/strong> package (and that <strong>scala.util.matching<\/strong> is a subpackage of <strong>scala.util<\/strong>, which itself is a subpackage of the <strong>scala<\/strong> package). Fortunately, you don\u2019t need to type out <strong>scala.util.matching<\/strong> every time you want to use Regex: just use an <strong>import<\/strong> statement, and then use Regex without the extra package specification.<\/p>\n<pre class=\"brush: scala;\">scala&gt; import scala.util.matching.Regex\r\nimport scala.util.matching.Regex\r\n \r\nscala&gt; val AmBn = new Regex(\"a+b+\")\r\nAmBn: scala.util.matching.Regex = a+b+\r\n<\/pre>\n<p>The other thing to explain is the <strong>new<\/strong> part. Again, we\u2019ll cover this in more detail later, but for now think about it the following way. 
The Regex class is like a factory for producing regex objects, and the way you request (order) one of those objects is to say \u201c<strong>new Regex(\u2026)<\/strong>\u201d, where the <strong>\u2026<\/strong> indicates the string that should be used to define the properties of that object. You\u2019ve actually been doing this quite a lot already when creating Lists, Ints, and Doubles, but again, for those core types, Scala has provided special syntax to simplify their creation and use.<\/p>\n<p>Okay, but why would one want to use <strong>new Regex(\"a+b+\")<\/strong> when <strong>\"a+b+\".r<\/strong> can be used to do the same? Here\u2019s why: the latter is most convenient when you have a single complete string literal, whereas the former reads naturally when the pattern is built up from several String variables. As an example, say you want a regex that matches strings of the form \u201c<em>the\/a dog\/cat\/mouse\/bird chased\/ate the\/a dog\/cat\/mouse\/bird<\/em>\u201d such as \u201c<em>the dog chased the cat<\/em>\u201d and \u201c<em>a cat chased the bird<\/em>.\u201d The following might be the first attempt.<\/p>\n<pre class=\"brush: scala;\">scala&gt; val Transitive = \"(a|the) (dog|cat|mouse|bird) (chased|ate) (a|the) (dog|cat|mouse|bird)\".r\r\nTransitive: scala.util.matching.Regex = (a|the) (dog|cat|mouse|bird) (chased|ate) (a|the) (dog|cat|mouse|bird)\r\n<\/pre>\n<p>This works, but we can also build it without repeating the same expression twice by using a variable that contains a String defining a regular expression (but which is <strong>not<\/strong> a Regex object itself) and building the regex with that.<\/p>\n<pre class=\"brush: scala;\">scala&gt; val nounPhrase = \"(a|the) (dog|cat|mouse|bird)\"\r\nnounPhrase: java.lang.String = (a|the) (dog|cat|mouse|bird)\r\n \r\nscala&gt; val Transitive = new Regex(nounPhrase + \" (chased|ate) \" + nounPhrase)\r\nTransitive: scala.util.matching.Regex = (a|the) (dog|cat|mouse|bird) (chased|ate) (a|the) (dog|cat|mouse|bird)\r\n<\/pre>\n<p>The next tutorial 
will show how to use the <strong>scala.util.matching<\/strong> package API to do more extensive matching with regular expressions, such as finding multiple matches and performing substitutions.<\/p>\n<p><strong><i>Reference: <\/i><\/strong><a href=\"http:\/\/bcomposes.wordpress.com\/2011\/09\/04\/first-steps-in-scala-for-beginning-programmers-part-5\/\">First steps in Scala for beginning programmers, Part 5<\/a> from our <a href=\"http:\/\/www.javacodegeeks.com\/p\/jcg.html\">JCG partner<\/a> Jason Baldridge at the <a href=\"http:\/\/bcomposes.wordpress.com\/\">Bcomposes<\/a> blog.<\/p>\n<p><strong><i>Related Articles :<\/i><\/strong><\/p>\n<ul>\n<li><a href=\"http:\/\/www.javacodegeeks.com\/2011\/09\/scala-tutorial-scala-repl-expressions.html\">Scala Tutorial &#8211; Scala REPL, expressions, variables, basic types, simple functions, saving and running programs, comments<\/a><\/li>\n<li><a href=\"http:\/\/www.javacodegeeks.com\/2011\/09\/scala-tutorial-tuples-lists-methods-on.html\">Scala Tutorial &#8211; Tuples, Lists, methods on Lists and Strings<\/a><\/li>\n<li><a href=\"http:\/\/www.javacodegeeks.com\/2011\/09\/scala-tutorial-conditional-execution.html\">Scala Tutorial &#8211; conditional execution with if-else blocks and matching<\/a><\/li>\n<li><a href=\"http:\/\/www.javacodegeeks.com\/2011\/10\/scala-tutorial-iteration-for.html\">Scala Tutorial &#8211; iteration, for expressions, yield, map, filter, count<\/a><\/li>\n<li><a href=\"http:\/\/www.javacodegeeks.com\/2011\/10\/scala-tutorial-regular-expressions_05.html\">Scala Tutorial &#8211; regular expressions, matching and substitutions with the scala.util.matching API<\/a><\/li>\n<li><a href=\"http:\/\/www.javacodegeeks.com\/2011\/10\/scala-tutorial-maps-sets-groupby.html\">Scala Tutorial &#8211; Maps, Sets, groupBy, Options, flatten, flatMap<\/a><\/li>\n<li><a href=\"http:\/\/www.javacodegeeks.com\/2011\/10\/scala-tutorial-scalaiosource-accessing.html\">Scala Tutorial &#8211; scala.io.Source, accessing 
files, flatMap, mutable Maps<\/a><\/li>\n<li><a href=\"http:\/\/www.javacodegeeks.com\/2011\/10\/scala-tutorial-objects-classes.html\">Scala Tutorial &#8211; objects, classes, inheritance, traits, Lists with multiple related types, apply<\/a><\/li>\n<li><a href=\"http:\/\/www.javacodegeeks.com\/2011\/10\/scala-tutorial-scripting-compiling-main.html\">Scala Tutorial &#8211; scripting, compiling, main methods, return values of functions<\/a><\/li>\n<li><a href=\"http:\/\/www.javacodegeeks.com\/2011\/11\/scala-tutorial-sbt-scalabha-packages.html\">Scala Tutorial &#8211; SBT, scalabha, packages, build systems<\/a><\/li>\n<li><a href=\"http:\/\/www.javacodegeeks.com\/2011\/11\/scala-tutorial-code-blocks-coding-style.html\">Scala Tutorial &#8211; code blocks, coding style, closures, scala documentation project<\/a><\/li>\n<li><a href=\"http:\/\/www.javacodegeeks.com\/2011\/09\/fun-with-function-composition-in-scala.html\">Fun with function composition in Scala<\/a><\/li>\n<li><a href=\"http:\/\/www.javacodegeeks.com\/2011\/08\/how-scala-changed-way-i-think-about-my.html\">How Scala changed the way I think about my Java Code<\/a><\/li>\n<li><a href=\"http:\/\/www.javacodegeeks.com\/2011\/09\/testing-with-scala.html\">Testing with Scala<\/a><\/li>\n<li><a href=\"http:\/\/www.javacodegeeks.com\/2010\/12\/things-every-programmer-should-know.html\">Things Every Programmer Should Know<\/a><\/li>\n<\/ul>\n<\/div>\n","protected":false},"excerpt":{"rendered":"<p>Preface This is part 5 of tutorials for first-time programmers getting into Scala. Other posts are on this blog, and you can get links to those and other resources on the links page of the Computational Linguistics course I\u2019m creating these for.&nbsp;Additionally you can find this and other tutorial series on the JCG&nbsp;Java Tutorials&nbsp;page. 
This &hellip;<\/p>\n","protected":false},"author":67,"featured_media":227,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[20],"tags":[235],"class_list":["post-649","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-scala","tag-scala-tutorial"],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v27.5 - https:\/\/yoast.com\/product\/yoast-seo-wordpress\/ -->\n<title>Scala Tutorial - regular expressions, matching - Java Code Geeks<\/title>\n<meta name=\"description\" content=\"PrefaceThis is part 5 of tutorials for first-time programmers getting into Scala. Other posts are on this blog, and you can get links to those and other\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/www.javacodegeeks.com\/2011\/10\/scala-tutorial-regular-expressions.html\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Scala Tutorial - regular expressions, matching - Java Code Geeks\" \/>\n<meta property=\"og:description\" content=\"PrefaceThis is part 5 of tutorials for first-time programmers getting into Scala. 
Other posts are on this blog, and you can get links to those and other\" \/>\n<meta property=\"og:url\" content=\"https:\/\/www.javacodegeeks.com\/2011\/10\/scala-tutorial-regular-expressions.html\" \/>\n<meta property=\"og:site_name\" content=\"Java Code Geeks\" \/>\n<meta property=\"article:publisher\" content=\"https:\/\/www.facebook.com\/javacodegeeks\" \/>\n<meta property=\"article:published_time\" content=\"2011-10-04T21:25:00+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2012-10-21T20:32:43+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/www.javacodegeeks.com\/wp-content\/uploads\/2012\/10\/scala-logo.jpg\" \/>\n\t<meta property=\"og:image:width\" content=\"150\" \/>\n\t<meta property=\"og:image:height\" content=\"150\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/jpeg\" \/>\n<meta name=\"author\" content=\"Jason Baldridge\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:creator\" content=\"@http:\/\/twitter.com\/jasonbaldridge\" \/>\n<meta name=\"twitter:site\" content=\"@javacodegeeks\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Jason Baldridge\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. 
reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"14 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\\\/\\\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\\\/\\\/www.javacodegeeks.com\\\/2011\\\/10\\\/scala-tutorial-regular-expressions.html#article\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/www.javacodegeeks.com\\\/2011\\\/10\\\/scala-tutorial-regular-expressions.html\"},\"author\":{\"name\":\"Jason Baldridge\",\"@id\":\"https:\\\/\\\/www.javacodegeeks.com\\\/#\\\/schema\\\/person\\\/95ef2670a4040b0f7101c48dba2795c0\"},\"headline\":\"Scala Tutorial &#8211; regular expressions, matching\",\"datePublished\":\"2011-10-04T21:25:00+00:00\",\"dateModified\":\"2012-10-21T20:32:43+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\\\/\\\/www.javacodegeeks.com\\\/2011\\\/10\\\/scala-tutorial-regular-expressions.html\"},\"wordCount\":1756,\"commentCount\":1,\"publisher\":{\"@id\":\"https:\\\/\\\/www.javacodegeeks.com\\\/#organization\"},\"image\":{\"@id\":\"https:\\\/\\\/www.javacodegeeks.com\\\/2011\\\/10\\\/scala-tutorial-regular-expressions.html#primaryimage\"},\"thumbnailUrl\":\"https:\\\/\\\/www.javacodegeeks.com\\\/wp-content\\\/uploads\\\/2012\\\/10\\\/scala-logo.jpg\",\"keywords\":[\"Scala Tutorial\"],\"articleSection\":[\"Scala\"],\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"CommentAction\",\"name\":\"Comment\",\"target\":[\"https:\\\/\\\/www.javacodegeeks.com\\\/2011\\\/10\\\/scala-tutorial-regular-expressions.html#respond\"]}]},{\"@type\":\"WebPage\",\"@id\":\"https:\\\/\\\/www.javacodegeeks.com\\\/2011\\\/10\\\/scala-tutorial-regular-expressions.html\",\"url\":\"https:\\\/\\\/www.javacodegeeks.com\\\/2011\\\/10\\\/scala-tutorial-regular-expressions.html\",\"name\":\"Scala Tutorial - regular expressions, matching - Java Code 
Geeks\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/www.javacodegeeks.com\\\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\\\/\\\/www.javacodegeeks.com\\\/2011\\\/10\\\/scala-tutorial-regular-expressions.html#primaryimage\"},\"image\":{\"@id\":\"https:\\\/\\\/www.javacodegeeks.com\\\/2011\\\/10\\\/scala-tutorial-regular-expressions.html#primaryimage\"},\"thumbnailUrl\":\"https:\\\/\\\/www.javacodegeeks.com\\\/wp-content\\\/uploads\\\/2012\\\/10\\\/scala-logo.jpg\",\"datePublished\":\"2011-10-04T21:25:00+00:00\",\"dateModified\":\"2012-10-21T20:32:43+00:00\",\"description\":\"PrefaceThis is part 5 of tutorials for first-time programmers getting into Scala. Other posts are on this blog, and you can get links to those and other\",\"breadcrumb\":{\"@id\":\"https:\\\/\\\/www.javacodegeeks.com\\\/2011\\\/10\\\/scala-tutorial-regular-expressions.html#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\\\/\\\/www.javacodegeeks.com\\\/2011\\\/10\\\/scala-tutorial-regular-expressions.html\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\\\/\\\/www.javacodegeeks.com\\\/2011\\\/10\\\/scala-tutorial-regular-expressions.html#primaryimage\",\"url\":\"https:\\\/\\\/www.javacodegeeks.com\\\/wp-content\\\/uploads\\\/2012\\\/10\\\/scala-logo.jpg\",\"contentUrl\":\"https:\\\/\\\/www.javacodegeeks.com\\\/wp-content\\\/uploads\\\/2012\\\/10\\\/scala-logo.jpg\",\"width\":150,\"height\":150},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\\\/\\\/www.javacodegeeks.com\\\/2011\\\/10\\\/scala-tutorial-regular-expressions.html#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\\\/\\\/www.javacodegeeks.com\\\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"JVM 
Languages\",\"item\":\"https:\\\/\\\/www.javacodegeeks.com\\\/category\\\/jvm-languages\"},{\"@type\":\"ListItem\",\"position\":3,\"name\":\"Scala\",\"item\":\"https:\\\/\\\/www.javacodegeeks.com\\\/category\\\/jvm-languages\\\/scala\"},{\"@type\":\"ListItem\",\"position\":4,\"name\":\"Scala Tutorial &#8211; regular expressions, matching\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\\\/\\\/www.javacodegeeks.com\\\/#website\",\"url\":\"https:\\\/\\\/www.javacodegeeks.com\\\/\",\"name\":\"Java Code Geeks\",\"description\":\"Java Developers Resource Center\",\"publisher\":{\"@id\":\"https:\\\/\\\/www.javacodegeeks.com\\\/#organization\"},\"alternateName\":\"JCG\",\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\\\/\\\/www.javacodegeeks.com\\\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":\"Organization\",\"@id\":\"https:\\\/\\\/www.javacodegeeks.com\\\/#organization\",\"name\":\"Exelixis Media P.C.\",\"url\":\"https:\\\/\\\/www.javacodegeeks.com\\\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\\\/\\\/www.javacodegeeks.com\\\/#\\\/schema\\\/logo\\\/image\\\/\",\"url\":\"https:\\\/\\\/www.javacodegeeks.com\\\/wp-content\\\/uploads\\\/2022\\\/06\\\/exelixis-logo.png\",\"contentUrl\":\"https:\\\/\\\/www.javacodegeeks.com\\\/wp-content\\\/uploads\\\/2022\\\/06\\\/exelixis-logo.png\",\"width\":864,\"height\":246,\"caption\":\"Exelixis Media P.C.\"},\"image\":{\"@id\":\"https:\\\/\\\/www.javacodegeeks.com\\\/#\\\/schema\\\/logo\\\/image\\\/\"},\"sameAs\":[\"https:\\\/\\\/www.facebook.com\\\/javacodegeeks\",\"https:\\\/\\\/x.com\\\/javacodegeeks\"]},{\"@type\":\"Person\",\"@id\":\"https:\\\/\\\/www.javacodegeeks.com\\\/#\\\/schema\\\/person\\\/95ef2670a4040b0f7101c48dba2795c0\",\"name\":\"Jason 
Baldridge\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\\\/\\\/secure.gravatar.com\\\/avatar\\\/b755d282f869512990e0ce9118c71ccd859fad42163f8e5d62d180ea42ea9720?s=96&d=mm&r=g\",\"url\":\"https:\\\/\\\/secure.gravatar.com\\\/avatar\\\/b755d282f869512990e0ce9118c71ccd859fad42163f8e5d62d180ea42ea9720?s=96&d=mm&r=g\",\"contentUrl\":\"https:\\\/\\\/secure.gravatar.com\\\/avatar\\\/b755d282f869512990e0ce9118c71ccd859fad42163f8e5d62d180ea42ea9720?s=96&d=mm&r=g\",\"caption\":\"Jason Baldridge\"},\"sameAs\":[\"http:\\\/\\\/bcomposes.wordpress.com\\\/\",\"http:\\\/\\\/www.linkedin.com\\\/pub\\\/jason-baldridge\\\/5\\\/629\\\/9b2\",\"https:\\\/\\\/x.com\\\/http:\\\/\\\/twitter.com\\\/jasonbaldridge\"],\"url\":\"https:\\\/\\\/www.javacodegeeks.com\\\/author\\\/jason-baldridge\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"Scala Tutorial - regular expressions, matching - Java Code Geeks","description":"PrefaceThis is part 5 of tutorials for first-time programmers getting into Scala. Other posts are on this blog, and you can get links to those and other","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/www.javacodegeeks.com\/2011\/10\/scala-tutorial-regular-expressions.html","og_locale":"en_US","og_type":"article","og_title":"Scala Tutorial - regular expressions, matching - Java Code Geeks","og_description":"PrefaceThis is part 5 of tutorials for first-time programmers getting into Scala. 
Other posts are on this blog, and you can get links to those and other","og_url":"https:\/\/www.javacodegeeks.com\/2011\/10\/scala-tutorial-regular-expressions.html","og_site_name":"Java Code Geeks","article_publisher":"https:\/\/www.facebook.com\/javacodegeeks","article_published_time":"2011-10-04T21:25:00+00:00","article_modified_time":"2012-10-21T20:32:43+00:00","og_image":[{"width":150,"height":150,"url":"https:\/\/www.javacodegeeks.com\/wp-content\/uploads\/2012\/10\/scala-logo.jpg","type":"image\/jpeg"}],"author":"Jason Baldridge","twitter_card":"summary_large_image","twitter_creator":"@http:\/\/twitter.com\/jasonbaldridge","twitter_site":"@javacodegeeks","twitter_misc":{"Written by":"Jason Baldridge","Est. reading time":"14 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/www.javacodegeeks.com\/2011\/10\/scala-tutorial-regular-expressions.html#article","isPartOf":{"@id":"https:\/\/www.javacodegeeks.com\/2011\/10\/scala-tutorial-regular-expressions.html"},"author":{"name":"Jason Baldridge","@id":"https:\/\/www.javacodegeeks.com\/#\/schema\/person\/95ef2670a4040b0f7101c48dba2795c0"},"headline":"Scala Tutorial &#8211; regular expressions, matching","datePublished":"2011-10-04T21:25:00+00:00","dateModified":"2012-10-21T20:32:43+00:00","mainEntityOfPage":{"@id":"https:\/\/www.javacodegeeks.com\/2011\/10\/scala-tutorial-regular-expressions.html"},"wordCount":1756,"commentCount":1,"publisher":{"@id":"https:\/\/www.javacodegeeks.com\/#organization"},"image":{"@id":"https:\/\/www.javacodegeeks.com\/2011\/10\/scala-tutorial-regular-expressions.html#primaryimage"},"thumbnailUrl":"https:\/\/www.javacodegeeks.com\/wp-content\/uploads\/2012\/10\/scala-logo.jpg","keywords":["Scala 
Tutorial"],"articleSection":["Scala"],"inLanguage":"en-US","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/www.javacodegeeks.com\/2011\/10\/scala-tutorial-regular-expressions.html#respond"]}]},{"@type":"WebPage","@id":"https:\/\/www.javacodegeeks.com\/2011\/10\/scala-tutorial-regular-expressions.html","url":"https:\/\/www.javacodegeeks.com\/2011\/10\/scala-tutorial-regular-expressions.html","name":"Scala Tutorial - regular expressions, matching - Java Code Geeks","isPartOf":{"@id":"https:\/\/www.javacodegeeks.com\/#website"},"primaryImageOfPage":{"@id":"https:\/\/www.javacodegeeks.com\/2011\/10\/scala-tutorial-regular-expressions.html#primaryimage"},"image":{"@id":"https:\/\/www.javacodegeeks.com\/2011\/10\/scala-tutorial-regular-expressions.html#primaryimage"},"thumbnailUrl":"https:\/\/www.javacodegeeks.com\/wp-content\/uploads\/2012\/10\/scala-logo.jpg","datePublished":"2011-10-04T21:25:00+00:00","dateModified":"2012-10-21T20:32:43+00:00","description":"PrefaceThis is part 5 of tutorials for first-time programmers getting into Scala. 
Other posts are on this blog, and you can get links to those and other","breadcrumb":{"@id":"https:\/\/www.javacodegeeks.com\/2011\/10\/scala-tutorial-regular-expressions.html#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/www.javacodegeeks.com\/2011\/10\/scala-tutorial-regular-expressions.html"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.javacodegeeks.com\/2011\/10\/scala-tutorial-regular-expressions.html#primaryimage","url":"https:\/\/www.javacodegeeks.com\/wp-content\/uploads\/2012\/10\/scala-logo.jpg","contentUrl":"https:\/\/www.javacodegeeks.com\/wp-content\/uploads\/2012\/10\/scala-logo.jpg","width":150,"height":150},{"@type":"BreadcrumbList","@id":"https:\/\/www.javacodegeeks.com\/2011\/10\/scala-tutorial-regular-expressions.html#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/www.javacodegeeks.com\/"},{"@type":"ListItem","position":2,"name":"JVM Languages","item":"https:\/\/www.javacodegeeks.com\/category\/jvm-languages"},{"@type":"ListItem","position":3,"name":"Scala","item":"https:\/\/www.javacodegeeks.com\/category\/jvm-languages\/scala"},{"@type":"ListItem","position":4,"name":"Scala Tutorial &#8211; regular expressions, matching"}]},{"@type":"WebSite","@id":"https:\/\/www.javacodegeeks.com\/#website","url":"https:\/\/www.javacodegeeks.com\/","name":"Java Code Geeks","description":"Java Developers Resource Center","publisher":{"@id":"https:\/\/www.javacodegeeks.com\/#organization"},"alternateName":"JCG","potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/www.javacodegeeks.com\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Organization","@id":"https:\/\/www.javacodegeeks.com\/#organization","name":"Exelixis Media 
P.C.","url":"https:\/\/www.javacodegeeks.com\/","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.javacodegeeks.com\/#\/schema\/logo\/image\/","url":"https:\/\/www.javacodegeeks.com\/wp-content\/uploads\/2022\/06\/exelixis-logo.png","contentUrl":"https:\/\/www.javacodegeeks.com\/wp-content\/uploads\/2022\/06\/exelixis-logo.png","width":864,"height":246,"caption":"Exelixis Media P.C."},"image":{"@id":"https:\/\/www.javacodegeeks.com\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/www.facebook.com\/javacodegeeks","https:\/\/x.com\/javacodegeeks"]},{"@type":"Person","@id":"https:\/\/www.javacodegeeks.com\/#\/schema\/person\/95ef2670a4040b0f7101c48dba2795c0","name":"Jason Baldridge","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/secure.gravatar.com\/avatar\/b755d282f869512990e0ce9118c71ccd859fad42163f8e5d62d180ea42ea9720?s=96&d=mm&r=g","url":"https:\/\/secure.gravatar.com\/avatar\/b755d282f869512990e0ce9118c71ccd859fad42163f8e5d62d180ea42ea9720?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/b755d282f869512990e0ce9118c71ccd859fad42163f8e5d62d180ea42ea9720?s=96&d=mm&r=g","caption":"Jason 
Baldridge"},"sameAs":["http:\/\/bcomposes.wordpress.com\/","http:\/\/www.linkedin.com\/pub\/jason-baldridge\/5\/629\/9b2","https:\/\/x.com\/http:\/\/twitter.com\/jasonbaldridge"],"url":"https:\/\/www.javacodegeeks.com\/author\/jason-baldridge"}]}},"_links":{"self":[{"href":"https:\/\/www.javacodegeeks.com\/wp-json\/wp\/v2\/posts\/649","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.javacodegeeks.com\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.javacodegeeks.com\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.javacodegeeks.com\/wp-json\/wp\/v2\/users\/67"}],"replies":[{"embeddable":true,"href":"https:\/\/www.javacodegeeks.com\/wp-json\/wp\/v2\/comments?post=649"}],"version-history":[{"count":0,"href":"https:\/\/www.javacodegeeks.com\/wp-json\/wp\/v2\/posts\/649\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.javacodegeeks.com\/wp-json\/wp\/v2\/media\/227"}],"wp:attachment":[{"href":"https:\/\/www.javacodegeeks.com\/wp-json\/wp\/v2\/media?parent=649"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.javacodegeeks.com\/wp-json\/wp\/v2\/categories?post=649"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.javacodegeeks.com\/wp-json\/wp\/v2\/tags?post=649"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}