Goal:
Build a single jar containing both Scala and Python files, supply this jar to PySpark, and be able to call both the Scala and the Python code from it. The main execution will happen in the Python files, which will use the Scala libraries internally via Py4J.
How do I include Python files/packages in the jar alongside the Scala files using SBT?
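For context, the Scala side would expose helpers roughly like the sketch below (the object and method names here are made up, not my real code); the Python code would look the object up through the Py4J gateway that PySpark exposes (e.g. spark._jvm.com.my_org.child_project.Helpers) and pass in the JVM-side SparkSession.
package com.my_org.child_project

import org.apache.spark.sql.SparkSession

// Hypothetical helper that the Python layer reaches through Py4J; it only
// needs a JVM-side SparkSession and plain JVM-friendly argument types.
object Helpers {
  def rowCount(spark: SparkSession, table: String): Long =
    spark.table(table).count()
}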
Project structure (open to change to whatever works)
parent_project
|-- child_project
|   |-- src
|   |   |-- main
|   |       |-- scala
|   |       |   |-- com.my_org.child_project
|   |       |       |-- s_file_1.scala
|   |       |       |-- s_file_2.scala
|   |       |-- python
|   |           |-- foo
|   |               |-- p_file_1.py
|   |               |-- p_file_2.py
|   |-- build.sbt          -- for child project
|-- build.sbt              -- for parent project
Sample build.sbt (for child project)
name := "child_project"
version := "1.0.0"
scalaVersion := "2.11.1"
val sparkVersion = "2.4.4"
lazy val dependencies = new {}
libraryDependencies ++= Seq()
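The dependency section above is trimmed; as a rough sketch of what it would typically contain, assuming spark-core and spark-sql are the modules sparkVersion is meant for (both marked "provided" so the assembly does not bundle Spark itself):
libraryDependencies ++= Seq(
  // Assumed Spark modules; "provided" keeps them out of the fat jar because
  // the pyspark / cluster runtime already supplies them.
  "org.apache.spark" %% "spark-core" % sparkVersion % "provided",
  "org.apache.spark" %% "spark-sql"  % sparkVersion % "provided"
)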
Sample build.sbt (for parent project)
lazy val child_project = project.in(file("child_project"))
  .dependsOn(parent % "provided->provided;compile->compile;test->test;runtime->runtime")
  .settings(
    name := "child_project",
    organization := "com.my_org",
    // Resolve the python directory relative to the child project's base directory
    unmanagedSourceDirectories in Compile += baseDirectory.value / "src" / "main" / "python",
    includeFilter in (Compile, unmanagedSources) := "*.scala" || "*.java" || "*.py",
    assemblySettings
  )
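One alternative I am considering (a minimal sketch, assuming the .py files only need to be copied into the jar verbatim rather than compiled) is registering the python directory as an unmanaged resource directory of the child project instead of a source directory:
// Sketch: treat the Python files as resources so sbt / sbt-assembly copies
// them into the jar as-is instead of passing them to the Scala compiler.
unmanagedResourceDirectories in Compile += baseDirectory.value / "src" / "main" / "python"
With that setting the files should end up in the jar under foo/, mirroring the layout under src/main/python.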
SBT Version = 0.13.16
SBT command for building the jar
sbt "project child_project" assembly
Specific questions:
- Is it possible to package both Python and Scala code in a single jar?
- Is it possible to supply this jar to PySpark and access both the Python and the Scala files from it?
- Are there any suggestions, workarounds, or better options for achieving this goal?