4 votes

Whenever I find some Scala / Spark code online, I want to directly paste it into spark-shell to try it out. (I am using spark-shell with Spark 1.6 on both CentOS and Mac OS.)

Generally, this approach works well, but I always have problems when lines start with a dot / period (indicating a continuing method call). If I move the dot to the previous line, it works.

Example: Here is some code I found online:

val paramMap = ParamMap(lr.maxIter -> 20)
  .put(lr.maxIter, 30) 
  .put(lr.regParam -> 0.1, lr.threshold -> 0.55)

So when I paste this directly into spark-shell, I see this error:

scala> val paramMap = ParamMap(lr.maxIter -> 20)
paramMap: org.apache.spark.ml.param.ParamMap = 
{
    logreg_d63b85553548-maxIter: 20
}

scala>   .put(lr.maxIter, 30) 
<console>:1: error: illegal start of definition
         .put(lr.maxIter, 30) 
         ^

scala>   .put(lr.regParam -> 0.1, lr.threshold -> 0.55)
<console>:1: error: illegal start of definition
         .put(lr.regParam -> 0.1, lr.threshold -> 0.55)
         ^

However, when I instead move the dot to the end of the previous line, everything works:

scala> val paramMap = ParamMap(lr.maxIter -> 20).
     | put(lr.maxIter, 30).
     | put(lr.regParam -> 0.1, lr.threshold -> 0.55)
paramMap: org.apache.spark.ml.param.ParamMap = 
{
    logreg_d63b85553548-maxIter: 30,
    logreg_d63b85553548-regParam: 0.1,
    logreg_d63b85553548-threshold: 0.55
}

Is there a way to configure spark-shell so that it will accept lines that start with a dot (or equivalently, so that it will continue lines even if they don't end in a dot)?


5 Answers

7 votes

The REPL does accept lines starting with a dot, but only when there is no leading whitespace:

scala> "3"
res0: String = 3

scala> .toInt
res1: Int = 3

scala> "3"
res2: String = 3

scala>   .toInt
<console>:1: error: illegal start of definition
  .toInt
  ^

PS: Arguably the REPL should ignore leading whitespace when a dot is detected; a JIRA ticket was filed for that concern.

4 votes

Use the :paste command:

scala> :paste
// Entering paste mode (ctrl-D to finish)

if (true)
  print("that was true")
else
  print("false")

// Exiting paste mode, now interpreting.

that was true

3 votes

You can also wrap your expression in curly braces:

val paramMap = { ParamMap(lr.maxIter -> 20)
  .put(lr.maxIter, 30) 
  .put(lr.regParam -> 0.1, lr.threshold -> 0.55)
}

This is because the REPL is "greedy" and consumes the first full statement you type in, so attempting to paste blocks of code into it can fail.

For more details see http://alvinalexander.com/scala/scala-repl-how-to-paste-load-blocks-of-source-code

There is also the nice :paste -raw feature.

See http://docs.scala-lang.org/overviews/repl/overview.html
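As a sketch of what -raw adds: it compiles the pasted text as a raw source file rather than wrapping it in the REPL's synthetic object, so things like package declarations become legal. (The package and object names below are made up for illustration.)

```scala
scala> :paste -raw
// Entering paste mode (ctrl-D to finish)

// A package declaration would be rejected by a normal :paste,
// but compiles fine in raw mode.
package mypkg
object Util {
  def twice(x: Int): Int = x * 2
}

// Exiting paste mode, now interpreting.

scala> mypkg.Util.twice(21)
res0: Int = 42
```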

0 votes

The Spark shell has a built-in mechanism for pasting in multi-line Scala code or writing it line by line: wrap your code in parentheses (). Moving your dots to the end of the line is then not needed.

In your example, start with val paramMap = (. From there you can type each line by hand or paste in your multi-line linear-regression hyperparameter code. Then add a closing parenthesis ) once the code is finished to encapsulate it. When using this method, don't use tabs for indentation; use two spaces instead.

Full code example:

scala> val paramMap = (ParamMap(lr.maxIter -> 20)
 |   .put(lr.maxIter, 30)
 |   .put(lr.regParam -> 0.1, lr.threshold -> 0.55)
 | )

0 votes

You can also put the periods at the end of the preceding line; then it works fine, although this may break style conventions:

val paramMap = ParamMap(lr.maxIter -> 20).
  put(lr.maxIter, 30).
  put(lr.regParam -> 0.1, lr.threshold -> 0.55)