2
votes

In the code snippet below I am reading a file that contains one JSON object per line, with a structure similar to this one:

{ "c7254865-87b5-4d34-a7bd-6ba6c9dbab14": "72119c87-7fce-4e17-9770-fcfab04328f5"}
{ "36c18403-1707-48c4-8f19-3b2e705007d4": "72119c87-7fce-4e17-9770-fcfab04328f5"}
{ "34a71a88-ae2d-4304-a1db-01c54fc6e4d8": "72119c87-7fce-4e17-9770-fcfab04328f5"}

Each line contains a key-value pair that should then be added to a map in Scala. This is the Scala code I used for this purpose:

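import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FSDataInputStream, Path}
import scala.collection.mutable
import scala.io.Source
import scala.util.parsing.json.JSON

val fs = org.apache.hadoop.fs.FileSystem.get(new Configuration())

def readFile(location: String): mutable.HashMap[String, String] = {
  val path: Path = new Path(location)
  val dataInputStream: FSDataInputStream = fs.open(path)
  val m = new mutable.HashMap[String, String]()
  // Parse each line as a JSON object and merge its entries into the map
  for (line <- Source.fromInputStream(dataInputStream).getLines) {
    val parsed: Option[Any] = JSON.parseFull(line)
    m ++= parsed.get.asInstanceOf[Map[String, String]]
  }
  m
}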

There must surely be a more elegant way to do this in Scala. In particular, it should be possible to get rid of the mutable map and read the lines directly into a map via a stream. How can I do that?

2 Answers

3
votes
val r: Map[String, String] = Source.fromInputStream(dataInputStream).getLines
    .map(line => JSON.parseFull(line).get)
    .flatMap { case m: Map[String, String] => m.map { case (k, v) => k -> v } }
    .toMap

Keep in mind that JSON (you mean scala.util.parsing.json.JSON, right?) has itself been marked @deprecated since Scala 2.11.

EDIT: as suggested by @SergGr and @Dima, this can be simplified further to

val r: Map[String, String] = Source.fromInputStream(dataInputStream).getLines
    .flatMap(line => JSON.parseFull(line))
    .collect { case m: Map[String, String] => m }
    .flatten.toMap

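This last version also handles unexpected JSON better (e.g., if a line contains an array): flatMap drops lines that fail to parse, and collect skips parsed values that are not maps instead of throwing a MatchError. Because of type erasure, though, the pattern Map[String, String] matches any Map, so non-string values are not filtered out.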
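
To illustrate the handling of unexpected input without needing the parser library on the classpath, here is a minimal sketch. It rests on assumptions: the `parsed` list holds hand-written stand-ins for what JSON.parseFull returns per line (an object as a Map, an array as a List, None for invalid input), and `stringPairs` is a hypothetical helper that is not part of the versions above. It checks each entry individually, so it does not rely on the erased type arguments of Map[String, String].

```scala
import scala.util.Try

// Stand-ins for JSON.parseFull results (assumption, see above).
val parsed: List[Option[Any]] = List(
  Some(Map("k1" -> "v1")),   // an object line
  Some(List(1.0, 2.0)),      // an array line
  None,                      // a line that is not valid JSON
  Some(Map("k2" -> 42.0))    // an object whose value is not a string
)

// Hypothetical helper: keep only String -> String entries of a parsed object,
// and return nothing for any other kind of value.
def stringPairs(value: Any): List[(String, String)] = value match {
  case m: Map[_, _] => m.toList.collect { case (k: String, v: String) => k -> v }
  case _            => Nil
}

// flatten drops the None; stringPairs skips the array and the numeric entry.
val r: Map[String, String] = parsed.flatten.flatMap(stringPairs).toMap

// A pattern-matching function with a single case throws a MatchError
// as soon as it meets the array.
val strict = Try(parsed.flatten.map { case m: Map[_, _] => m })
```

Here only the `k1` entry ends up in `r`, while `strict` is a Failure wrapping the MatchError.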

0
votes
val json = scala.io.Source.fromString("""
 { "c7254865-87b5-4d34-a7bd-6ba6c9dbab14": "72119c87-7fce-4e17-9770-fcfab04328f5"}
 { "36c18403-1707-48c4-8f19-3b2e705007d4": "72119c87-7fce-4e17-9770-fcfab04328f5"}
 { "34a71a88-ae2d-4304-a1db-01c54fc6e4d8": "72119c87-7fce-4e17-9770-fcfab04328f5"}
 """)

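Read the source line by line, drop the blank lines, split each line on the colon into a key and a value, and then convert to a Map. This returns a scala.collection.immutable.Map[String,String]. Note that this is plain string splitting, not JSON parsing: the braces and quotes stay in the keys and values, and it only works because the UUIDs contain no colon.

scala> json.getLines.map(_.trim).filter(_.nonEmpty).map(_.split(":")).map(x => x(0) -> x(1)).toMap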

res35: scala.collection.immutable.Map[String,String] = Map(
 { "c7254865-87b5-4d34-a7bd-6ba6c9dbab14" -> " "72119c87-7fce-4e17-9770-fcfab04328f5"}",
 { "36c18403-1707-48c4-8f19-3b2e705007d4" -> " "72119c87-7fce-4e17-9770-fcfab04328f5"}",
 { "34a71a88-ae2d-4304-a1db-01c54fc6e4d8" -> " "72119c87-7fce-4e17-9770-fcfab04328f5"}")
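
If the braces and quotes should not end up in the result, a regular expression can pull out the bare key and value. This is only a sketch under stated assumptions: every line holds exactly one string-to-string pair, no key or value contains an escaped quote, and the names `Pair`, `input` and `pairs` are made up for illustration. It is not a substitute for a real JSON parser.

```scala
import scala.io.Source

// Matches one whole line of the form { "key": "value" } and captures both parts.
val Pair = """\s*\{\s*"([^"]*)"\s*:\s*"([^"]*)"\s*\}\s*""".r

val input = """
 { "c7254865-87b5-4d34-a7bd-6ba6c9dbab14": "72119c87-7fce-4e17-9770-fcfab04328f5"}
 { "36c18403-1707-48c4-8f19-3b2e705007d4": "72119c87-7fce-4e17-9770-fcfab04328f5"}
 """

// collect keeps only the lines the pattern matches, so blank lines are skipped.
val pairs: Map[String, String] = Source.fromString(input).getLines()
  .collect { case Pair(k, v) => k -> v }
  .toMap
```

Because the extractor pattern has to match the entire line, anything that does not look like a single pair is silently ignored rather than causing an exception.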