What's the best way to match strings in a file to a case class in Scala?

Question

We have a file containing data that we want to map to a case class. I know enough to brute-force it, but I'm looking for an idiomatic way to do it in Scala.

Given the file:

#record
name:John Doe
age: 34

#record
name: Smith Holy
age: 33 

# some comment

#record
# another comment
name: Martin Fowler
age: 99

(A field value that spans two lines is INVALID; e.g. name:John\n Smith should produce an error.)

And the case class:

case class Record(name:String, age:Int)

I want to return a Seq type such as Stream:

val records: Stream[Record]
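To keep the result lazy, the lines can be grouped into one block per #record marker without reading the whole file up front; each block can then be parsed into a Record on its own. This is a minimal sketch assuming Scala 2.13+, where LazyList replaces the now-deprecated Stream; LazyBlocks and blocks are hypothetical names, not part of any library.

```scala
object LazyBlocks {
  private def isMarker(line: String): Boolean = line.trim == "#record"

  // Lazily splits the lines into one List per "#record" block (marker excluded).
  // Lines before the first marker are skipped. Requires Scala 2.13+ for LazyList.
  def blocks(lines: Iterator[String]): LazyList[List[String]] = {
    val it = lines.buffered
    LazyList.unfold[List[String], Unit](()) { _ =>
      // Skip anything before the next marker (only non-empty before the first one).
      while (it.hasNext && !isMarker(it.head)) it.next()
      if (!it.hasNext) None
      else {
        it.next() // consume the "#record" marker itself
        val body = List.newBuilder[String]
        while (it.hasNext && !isMarker(it.head)) body += it.next()
        Some((body.result(), ()))
      }
    }
  }
}
```

You can feed it Source.fromFile(path).getLines(), but the Source has to stay open until the LazyList has been consumed, so don't return the LazyList from inside a block that closes the file.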

Ideas I'm working with but haven't implemented yet:

Remove all newlines and treat the whole file as one long string, then match that string against the regex "((?!name).)+((?!age).)+age:([\s\d]+)" and create a new instance of my case class for each match. So far, though, my regex-fu is weak and I can't get it to match around the comments.
Recursive idea: iterate through the lines to find the first one that matches #record, then recursively call the function to match name, then age. Tail-recursively return Some(new Record(cumulativeMap.get(name), cumulativeMap.get(age))), or None when the next record is hit after name (i.e. age was never encountered).
A better idea?
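The recursive idea maps naturally onto a tail-recursive fold over the lines that carries the fields of the block currently being read. This is a sketch assuming Scala 2.13+ (for toIntOption); RecursiveParser, parse and build are hypothetical names, and returning Either instead of Option is a design choice so that bad input can say why it failed.

```scala
import scala.annotation.tailrec

object RecursiveParser {
  final case class Record(name: String, age: Int)

  // A "key: value" line; the whole (trimmed) line must match.
  private val Field = """(\w+):\s*(.*?)\s*""".r

  // Turns the fields collected for one "#record" block into a Record or an error.
  private def build(fields: Map[String, String]): Either[String, Record] =
    for {
      name   <- fields.get("name").toRight("missing name")
      ageStr <- fields.get("age").toRight(s"missing age for $name")
      age    <- ageStr.toIntOption.toRight(s"age is not a number: $ageStr")
    } yield Record(name, age)

  def parse(lines: List[String]): List[Either[String, Record]] = {
    // current: fields of the block being read, or None outside a (valid) block.
    @tailrec
    def loop(rest: List[String],
             current: Option[Map[String, String]],
             acc: List[Either[String, Record]]): List[Either[String, Record]] = {
      // acc plus the result of the block being read, if there is one.
      def flushed: List[Either[String, Record]] =
        current.fold(acc)(fields => build(fields) :: acc)

      rest match {
        case Nil => flushed.reverse
        case line :: tail =>
          line.trim match {
            case "#record" =>
              loop(tail, Some(Map.empty), flushed)
            case t if t.isEmpty || t.startsWith("#") => // blank line or comment
              loop(tail, current, acc)
            case Field(key, value) => // ignored when outside a valid block
              loop(tail, current.map(_ + (key -> value)), acc)
            case other => // e.g. the " Smith" continuation of a two-line value
              loop(tail, None, Left(s"invalid line: '$other'") :: acc)
          }
      }
    }
    loop(lines, None, Nil)
  }
}
```

Because an invalid line discards the block being built, name:John followed by " Smith" yields a single Left instead of a half-filled Record.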

Thanks for reading! The actual file is more complicated than the example above, but the same rules apply. For the curious: I'm trying to parse a custom M3U playlist file format.

Nicolas Rinaudo · Accepted Answer · 2016-11-11T21:01:45

I'd use kantan.regex for a fairly trivial regex-based solution.

Without shapeless-based automatic derivation, you can write the following:

import kantan.regex._
import kantan.regex.implicits._

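case class Record(name: String, age: Int)
// Decodes capture groups 1 and 2, in order, as the arguments of Record.apply.
implicit val decoder: MatchDecoder[Record] = MatchDecoder.ordered(Record.apply _)
// input: the whole file contents as a String
input.evalRegex[Record](rx"(?:name:\s*([^\n]+))\n(?:age:\s*([0-9]+))").toList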

This yields:

List(Success(Record(John Doe,34)), Success(Record(Smith Holy,33)), Success(Record(Martin Fowler,99)))
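If you'd rather avoid the extra dependency, the standard library's scala.util.matching.Regex can do a similar job. This is a sketch, not part of kantan.regex, assuming Unix line endings; StdlibRegex is a hypothetical name. The (?m) flag anchors ^ and $ at line boundaries, and the (?:#[^\n]*\n)* group allows comment lines between the name and age lines.

```scala
object StdlibRegex {
  final case class Record(name: String, age: Int)

  // (?m): ^ and $ match at line boundaries.
  // name line, then zero or more "#..." comment lines, then the age line.
  private val RecordPattern =
    """(?m)^name:[ \t]*(\S[^\n]*?)[ \t]*\n(?:#[^\n]*\n)*age:[ \t]*(\d+)[ \t]*$""".r

  def parse(input: String): List[Record] =
    RecordPattern
      .findAllMatchIn(input)
      .map(m => Record(m.group(1), m.group(2).toInt))
      .toList
}
```

One limitation of any regex-only approach: a malformed record, such as a value split over two lines, simply fails to match and is skipped silently rather than reported as an error.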

Note that this solution requires you to hand-write the decoder, but decoders can often be derived automatically. If you don't mind a shapeless dependency, you can simply write:

import kantan.regex._
import kantan.regex.implicits._
import kantan.regex.generic._

case class Record(name:String, age:Int) 
input.evalRegex[Record](rx"(?:name:\s*([^\n]+))\n(?:age:\s*([0-9]+))").toList

You get exactly the same result.

Disclaimer: I'm the library's author.
