I am trying to figure out how to create a build.sbt file for my own Scalding-based project.

The Scalding source tree has no build.sbt file. Instead, it has a project/Build.scala build definition.

What would be the right way to integrate my own sbt project with Scalding, so that I could also import it into Eclipse later with the sbteclipse plugin?

Update:

For the following code:

import cascading.tuple.Fields
import com.twitter.scalding._

class Scan(args: Args) extends Job(args) {
  val output = TextLine("tmp/out.txt")

  val wordsList = List(
    ("john"),
    ("liza"),
    ("nina"),
    ("x"))

  val orderedPipe =
    IterableSource[(String)](wordsList, ('word))
      .debug
      .write(output)
}

With this build.sbt:

name := "Scan"

version := "1.0"

libraryDependencies := Seq("com.twitter" %% "scalding" % "0.11.1")

I get errors:

$ sbt
[info] Loading global plugins from /home/test/.sbt/0.13/plugins
[info] Set current project to Scan (in build file:/home/test/Cascading/Scala/Scan/)
> compile
[info] Updating {file:/home/test/Cascading/Scala/Scan/}scan...
[info] Resolving org.fusesource.jansi#jansi;1.4 ...
[info] downloading http://repo1.maven.org/maven2/com/twitter/scalding_2.10/0.11.1/scalding_2.10-0.11.1.jar ...
[info]  [SUCCESSFUL ] com.twitter#scalding_2.10;0.11.1!scalding_2.10.jar (641ms)
[info] Done updating.
[info] Compiling 1 Scala source to /home/test/Cascading/Scala/Scan/target/scala-2.10/classes...
[error] /home/test/Cascading/Scala/Scan/src/main/scala/Scan.scala:1: not found: object cascading
[error] import cascading.tuple.Fields
[error]        ^
[error] /home/test/Cascading/Scala/Scan/src/main/scala/Scan.scala:2: object twitter is not a member of package com
[error] import com.twitter.scalding._
[error]            ^
[error] /home/test/Cascading/Scala/Scan/src/main/scala/Scan.scala:5: not found: type Job
[error] class Scan(args: Args) extends Job(args) {
[error]                                ^
[error] /home/test/Cascading/Scala/Scan/src/main/scala/Scan.scala:5: not found: type Args
[error] class Scan(args: Args) extends Job(args) {
[error]                  ^
[error] /home/test/Cascading/Scala/Scan/src/main/scala/Scan.scala:5: too many arguments for constructor Object: ()Object
[error] class Scan(args: Args) extends Job(args) {
[error]           ^
[error] /home/test/Cascading/Scala/Scan/src/main/scala/Scan.scala:6: not found: value TextLine
[error]   val output = TextLine("tmp/out.txt")
[error]                ^
[error] /home/test/Cascading/Scala/Scan/src/main/scala/Scan.scala:15: not found: value IterableSource
[error]     IterableSource[(String)](wordsList, ('word))
[error]     ^
[error] 7 errors found
[error] (compile:compile) Compilation failed

Update 2:

After cloning their repository with git clone git@github.com:twitter/scalding.git and running sbt publishLocal, I still get the same compilation errors.

But adding the two lines you suggested to build.sbt allowed me to compile my code. So the following build.sbt really works, thanks!

name := "BlockScan"

version := "1.0"

libraryDependencies := Seq("com.twitter" %% "scalding" % "0.11.1")

lazy val scaldingCore = ProjectRef(uri("https://github.com/twitter/scalding.git"), "scalding-core")

lazy val myProject = project in file(".") dependsOn scaldingCore

However, 'sbt eclipse' creates an Eclipse project which does not compile in Eclipse and reports these errors:

Project 'Scan' is missing required Java project: 'scalding-core'
More than one scala library found in the build path (/home/test/usr/eclipse-scala-3.0.3/configuration/org.eclipse.osgi/bundles/290/1/.cp/lib/scala-library.jar, /home/test/wks/Cascading/Scala/scalding/target/scala-2.9.3/scalding-assembly-0.10.0.jar).At least one has an incompatible version. Please update the project build path so it contains only compatible scala libraries. 

scalacheck_2.9.3-1.10.0.jar is cross-compiled with an incompatible version of Scala (2.9.3). 

specs_2.9.3-1.6.9.jar is cross-compiled with an incompatible version of Scala (2.9.3). 

You want to use Scalding, not modify it, correct? - joescii

1 Answer

The scalding artifact that sbt pulls from the remote repository doesn't seem to provide the classes you need (the jar resolves, yet com.twitter.scalding is still not found), so you'll have to declare a source dependency on the project's GitHub repository.

lazy val scaldingCore = ProjectRef(uri("https://github.com/twitter/scalding.git"), "scalding-core")

lazy val myProject = project in file(".") dependsOn scaldingCore

With the above build definition, sbt will git clone the referenced repository and load its build.

➜  scalding  xsbt
[info] Loading global plugins from /Users/jacek/.sbt/0.13/plugins
Cloning into '/Users/jacek/.sbt/0.13/staging/e1da2accb95841ffb1df/scalding'...
[info] Loading project definition from /Users/jacek/.sbt/0.13/staging/e1da2accb95841ffb1df/scalding/project
[info] Updating {file:/Users/jacek/.sbt/0.13/staging/e1da2accb95841ffb1df/scalding/project/}scalding-build...
[info] Resolving org.fusesource.jansi#jansi;1.4 ...
[info] downloading http://repo.scala-sbt.org/scalasbt/sbt-plugin-releases/com.eed3si9n/sbt-assembly/scala_2.10/sbt_0.13/0.10.2/jars/sbt-assembly.jar ...
[info]  [SUCCESSFUL ] com.eed3si9n#sbt-assembly;0.10.2!sbt-assembly.jar (3600ms)
[info] Done updating.
[info] Compiling 3 Scala sources to /Users/jacek/.sbt/0.13/staging/e1da2accb95841ffb1df/scalding/project/target/scala-2.10/sbt-0.13/classes...
[warn] there were 8 deprecation warning(s); re-run with -deprecation for details
[warn] there were 2 feature warning(s); re-run with -feature for details
[warn] two warnings found
[info] Set current project to myProject (in build file:/Users/jacek/sandbox/scalding/)
> projects
[info] In file:/Users/jacek/sandbox/scalding/
[info]   * myProject
[info] In https://github.com/twitter/scalding.git
[info]     maple
[info]     scalding
[info]     scalding-args
[info]     scalding-avro
[info]     scalding-commons
[info]     scalding-core
[info]     scalding-date
[info]     scalding-hadoop-test
[info]     scalding-jdbc
[info]     scalding-json
[info]     scalding-parquet
[info]     scalding-repl

This build setup should give you access to the Scalding classes.

> console
[info] Starting scala interpreter...
[info]
Welcome to Scala version 2.10.4 (Java HotSpot(TM) 64-Bit Server VM, Java 1.7.0_60).
Type in expressions to have them evaluated.
Type :help for more information.

scala> import com.twitter.scalding._
import com.twitter.scalding._

And the Scan class, placed in the src/main/scala directory, compiles fine.

> show sources
[info] ArrayBuffer(/Users/jacek/sandbox/scalding/src/main/scala/Scan.scala)
[success] Total time: 0 s, completed Jul 15, 2014 12:21:14 AM
> compile
[info] Updating {file:/Users/jacek/sandbox/scalding/}myProject...
[info] Resolving org.fusesource.jansi#jansi;1.4 ...
[info] Done updating.
[info] Compiling 1 Scala source to /Users/jacek/sandbox/scalding/target/scala-2.10/classes...
[success] Total time: 4 s, completed Jul 15, 2014 12:21:20 AM
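
Putting the pieces above together, a complete build.sbt for the source-dependency approach might look like the following sketch. The project name and version are taken from the question. The scalaVersion line is an assumption added here (2.10.4 is the version the console session above reports); pinning it may help keep sbt and Eclipse on the same Scala library, but that is not verified in this thread.

```scala
// build.sbt (sbt 0.13 style: keep a blank line between expressions)
name := "Scan"

version := "1.0"

// Assumption: pin the Scala version explicitly to match the console output above.
scalaVersion := "2.10.4"

// Source dependency on the scalding-core subproject of the GitHub repository.
// A "#<branch-tag-or-commit>" fragment can be appended to the URI to pin a revision.
lazy val scaldingCore = ProjectRef(uri("https://github.com/twitter/scalding.git"), "scalding-core")

// The project in the current directory depends on scalding-core.
lazy val myProject = project in file(".") dependsOn scaldingCore
```

With a source dependency, sbt builds Scalding itself the first time, so expect the initial compile to take noticeably longer than later ones.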

You could also clone their repository with git clone git@github.com:twitter/scalding.git and run sbt publishLocal, so that you can declare a binary dependency in build.sbt as follows:

libraryDependencies := Seq("com.twitter" %% "scalding" % "0.11.1")
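
One more variation, offered as an untested assumption rather than something confirmed in this thread: since the classes become visible through the scalding-core subproject, depending on that module's binary (if it is published for your Scalding and Scala versions) instead of the aggregate scalding artifact may work as well. The Conjars resolver below is included on the assumption that the Cascading artifacts Scalding depends on are hosted there.

```scala
// build.sbt fragment
// Assumption: a scalding-core binary exists for this version and Scala version.
// Assumption: Cascading dependencies are resolved from the Conjars repository.
resolvers += "Conjars" at "http://conjars.org/repo"

// += appends to the existing dependencies instead of replacing them, as := would.
libraryDependencies += "com.twitter" %% "scalding-core" % "0.11.1"
```

If resolution fails, fall back to one of the two approaches described in this answer.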

With the dependency in place (whichever way you choose), execute sbt eclipse and be done with it!
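
The eclipse command comes from the sbteclipse plugin. If it is not already installed globally (the logs above show global plugins being loaded from ~/.sbt/0.13/plugins), a minimal sketch of enabling it for this project only is shown below. The version number is an assumption; use whichever release matches your sbt version.

```scala
// project/plugins.sbt
// Assumption: 2.5.0 is an sbteclipse release compatible with sbt 0.13.
addSbtPlugin("com.typesafe.sbteclipse" % "sbteclipse-plugin" % "2.5.0")
```

After adding the plugin, reload sbt and run eclipse again to regenerate the .project and .classpath files.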