You can use Sparse to parse text blocks from files and other sources.
First, add Sparse as a dependency in your project's build.sbt:
libraryDependencies += "eu.matthiasbraun" %% "sparse" % "1.0"
Then you can import its methods and objects:
import eu.matthiasbraun.sparse.Parser._
Let's say the file you want to parse is this:
(unrelated text before first block)
start
first line in first block
second line in first block
end
(unrelated text before second block)
start
first line in second block
second line in second block
end
(more unrelated text)
Assuming that you're interested in the blocks that start with "start" and end with "end", here's how you parse them:
First of all, you load that file using one of the methods in scala.io.Source (the snippet below assumes import scala.io.Source.fromFile and import java.io.File):
val yourFile = fromFile(new File("parse/this/file"))
In the case of our example file above, we know exactly what the start and the end of a block look like. So we can do the following to parse the two blocks from the file:
val blocksMaybe = parse(yourFile, from("start"), to("end"))
And this is how you print the blocks:
blocksMaybe match {
  case Success(blocks) => blocks.foreach { println }
  case Failure(exception) => println(exception)
}
We got a Try back from parse which contains, if the parsing was successful, the blocks from the parsed file. Success and Failure are the two cases of scala.util.Try.
The first block we got back is this:
start
first line in first block
second line in first block
end
Probably, the second block won't surprise you, but here it is for completeness' sake:
start
first line in second block
second line in second block
end
Otherwise, if something went wrong, the Try holds the first exception that occurred during parsing.
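To make the Success and Failure handling concrete without depending on Sparse itself, here is a minimal, self-contained sketch. The names TrySketch, parseBlocks, and report are hypothetical and only stand in for a parsing step that may throw; they are not part of Sparse.

```scala
import scala.util.{Failure, Success, Try}

object TrySketch {
  /** Hypothetical stand-in for a parsing step: fails on empty input,
    * otherwise keeps the non-empty lines. */
  def parseBlocks(lines: Seq[String]): Try[Seq[String]] = Try {
    if (lines.isEmpty) throw new IllegalArgumentException("nothing to parse")
    lines.filter(_.nonEmpty)
  }

  /** Turns the result into a printable string, whichever case we got. */
  def report(result: Try[Seq[String]]): String = result match {
    case Success(blocks)    => blocks.mkString("\n")
    case Failure(exception) => s"Parsing failed: ${exception.getMessage}"
  }
}
```

If you don't care about the exception, something like parseBlocks(lines).getOrElse(Seq.empty) gives you an empty result instead of a Failure.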
Should you be interested only in what's inside the blocks, and not in the lines that mark their beginning and their end, you might like to call parse like this:
parse(yourFile, after("start"), until("end"))
The first block returned by that call is a bit different from the one we got using from and to:
first line in first block
second line in first block
So far you've seen from, to, after, and until, which mark the start and end points of your blocks.
There is another one, before, that you can use if you're interested in the line that precedes the matching line.
The resulting blocks of
parse(yourFile, before("start"), until("end"))
are
(unrelated text before first block)
start
first line in first block
second line in first block
and
(unrelated text before second block)
start
first line in second block
second line in second block
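One way to picture the five markers is as a line match plus an offset: from and to point at the matching line itself, after is one line later, before and until one line earlier. The following is a rough sketch of that idea under those assumptions. Marker, blocks, and the marker functions here are hypothetical stand-ins written for this illustration; they are not Sparse's implementation.

```scala
object OffsetSketch {
  /** Hypothetical marker: a line predicate plus an offset relative to the match. */
  final case class Marker(matches: String => Boolean, offset: Int)

  // Start markers
  def from(text: String): Marker   = Marker(_ == text, 0)
  def after(text: String): Marker  = Marker(_ == text, 1)
  def before(text: String): Marker = Marker(_ == text, -1)
  // End markers
  def to(text: String): Marker     = Marker(_ == text, 0)
  def until(text: String): Marker  = Marker(_ == text, -1)

  /** Collects all blocks delimited by the `start` and `end` markers. */
  def blocks(lines: Vector[String], start: Marker, end: Marker): Vector[Vector[String]] = {
    @annotation.tailrec
    def loop(searchFrom: Int, acc: Vector[Vector[String]]): Vector[Vector[String]] = {
      val startMatch = lines.indexWhere(start.matches, searchFrom)
      if (startMatch < 0) acc
      else {
        val endMatch = lines.indexWhere(end.matches, startMatch + 1)
        if (endMatch < 0) acc
        else {
          val first = math.max(0, startMatch + start.offset) // first line of the block
          val last  = endMatch + end.offset                   // last line, inclusive
          loop(endMatch + 1, acc :+ lines.slice(first, last + 1))
        }
      }
    }
    loop(0, Vector.empty)
  }
}
```

Run against the example file, blocks(lines, before("start"), until("end")) yields blocks that begin with the unrelated line preceding each "start" and stop just before "end".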
If the starts and the ends of your blocks vary you can define predicates to match them.
Let's change our example file a bit, to make parsing slightly more challenging:
blockStartPrefix: firstBlockHeader
first line in first block
second line in first block
end
blockStartPrefix: secondBlockHeader
first line in second block
second line in second block
end
Now, because the start of a block is different for each block, we can't match it verbatim as we did in the previous example. But notice that the block beginnings all share the common prefix blockStartPrefix. Let's match that:
parse(yourFile, from(_.startsWith("blockStartPrefix")), to("end"))
Defining predicates is of course not limited to from. Imagine that block ends vary like so:
end of block 1 ###
end of block 2 ###
In this case, we use to(_.endsWith("###")) in order to match the end of a block.
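The predicates themselves are ordinary functions from a line to a Boolean, so you can try them out on their own before handing them to a marker. Here is a small sketch that does not use Sparse at all. PredicateSketch and matchingIndices are hypothetical names chosen for this illustration.

```scala
object PredicateSketch {
  // Matches lines such as "blockStartPrefix: firstBlockHeader"
  val isBlockStart: String => Boolean = _.startsWith("blockStartPrefix")

  // Matches lines such as "end of block 1 ###"
  val isBlockEnd: String => Boolean = _.endsWith("###")

  /** Zero-based indices of all lines that satisfy the predicate. */
  def matchingIndices(lines: Seq[String], predicate: String => Boolean): Seq[Int] =
    lines.zipWithIndex.collect { case (line, index) if predicate(line) => index }
}
```

Checking which lines a predicate hits this way is a cheap sanity test when a parse call returns fewer blocks than you expected.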
If the patterns are more complicated than that, you can always resort to regular expressions:
from(_.matches(yourRegexPattern))
Keep in mind that String.matches only returns true if the pattern matches the whole line.
Maybe you need to consider the line number as well to determine if a line should be the beginning or the end of a block. Sparse lets you account for that, too:
val start = from((line, lineNr) => line.startsWith("start") && lineNr > 4)
parse(yourFile, start, to("end"))
This way, the line not only has to begin with the string "start" but also needs to come after the fourth line in the file. If it's clear in your code that the first placeholder stands for the line and the second placeholder for the line number (or if you're feeling especially succinct today), you can shorten the above example to this:
val start = from(_.startsWith("start") && _ > 4)
If you're not content with the predefined block markers (i.e., from, to, after, until, and before), you can roll your own:
/** The block begins two lines after the `predicate` matches. */
object twoLinesAfter extends MarkerFactory {
  override def apply(predicate: ((String, Int) => Boolean)) =
    BlockMarker(predicate, offset = +2)
}
Your custom marker is used like all the predefined ones shown above:
val blocksMaybe = parse(yourFile, twoLinesAfter("start"), to("end"))
If you're wondering why you could pass a simple string instead of the ((String, Int) => Boolean) predicate to twoLinesAfter, have a look at the MarkerFactory in Parser.scala.
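If you'd rather not open the source right away: one way a factory can accept either a string or a predicate is plain method overloading. The sketch below is only an assumption made for illustration, using the hypothetical names Marker and Factory. Sparse's actual MarkerFactory may well work differently (for instance, via implicit conversions).

```scala
object FactorySketch {
  type LinePredicate = (String, Int) => Boolean

  /** Hypothetical marker: a predicate on (line, line number) plus an offset. */
  final case class Marker(predicate: LinePredicate, offset: Int)

  /** Hypothetical factory: subclasses define only the predicate-based apply. */
  trait Factory {
    def apply(predicate: LinePredicate): Marker

    // Convenience overload: a plain string matches lines equal to it,
    // regardless of the line number.
    def apply(text: String): Marker =
      apply((line: String, _: Int) => line == text)
  }

  /** The block begins two lines after the predicate matches. */
  object twoLinesAfter extends Factory {
    override def apply(predicate: LinePredicate): Marker =
      Marker(predicate, offset = 2)
  }
}
```

With that in place, twoLinesAfter("start") and twoLinesAfter((line: String, lineNr: Int) => line.startsWith("start") && lineNr > 4) both produce a Marker with an offset of 2.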
Sparse depends on:
- Scala 2.10 for Try
- Scala-ARM 1.4 for reading files