Container is yet another collection abstraction.
It is implemented as a simple type class, so you can easily extend it to whatever purpose you need.
Container's initial purpose was to make it possible to write logic that runs both on plain Lists and on Spark RDDs, and to test distributed logic without the burden of spinning up a SparkContext.
After using it for a while, it also turned out to be useful for comparing different computation engines, such as Scalding and Flink: it was easy to just implement the fitting type class for each.
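Conceptually, the abstraction boils down to a higher-kinded type class that hides the concrete collection behind a small set of operations. The sketch below is illustrative only (the real trait in containers-core may declare more, or differently named, operations), but it shows the general shape and how a List instance falls out naturally:

// Illustrative sketch only: the actual Container trait in containers-core
// may differ in operation names and signatures.
trait Container[C[_]] {
  def map[A, B](c: C[A])(f: A => B): C[B]
}

object Container {
  // A List instance simply delegates to the standard library.
  implicit val listContainer: Container[List] = new Container[List] {
    def map[A, B](c: List[A])(f: A => B): List[B] = c.map(f)
  }
}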
If you are using SBT, add the following to your build.sbt:
libraryDependencies += "com.github.NoamShaish" %% "containers-core" % "0.1.0"
If you would like to use the RDD container, add the following to your build.sbt:
libraryDependencies += "com.github.NoamShaish" %% "containers-core" % "0.1.0"
import containers.Container
import containers.Container.ops._
scala> def mapIt[C[_]: Container](c: C[Int]) = c.map(_ + 1)
mapIt: [C[_]](c: C[Int])(implicit evidence$1: containers.Container[C])C[Int]
scala> mapIt(List(1, 2, 3, 4))
res0: List[Int] = List(2, 3, 4, 5)
Not really impressive, but the same "logic" can run on an RDD:
scala> mapIt(sc.parallelize(List(1, 2, 3, 4))).collect
res1: Array[Int] = Array(2, 3, 4, 5)
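This is also what makes testing simple: logic written against the Container constraint can be exercised on a plain List in a unit test, with no SparkContext involved. A minimal sketch, assuming the imports shown above:

// Hypothetical helper for illustration; any Container-constrained function works the same way.
def increment[C[_]: Container](c: C[Int]): C[Int] = c.map(_ + 1)

// In a test, run the logic on an in-memory List and assert on the result.
assert(increment(List(1, 2, 3, 4)) == List(2, 3, 4, 5))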
In containers-example you can find more impressive examples, such as:
- PI Estimation
- Logistic Regression Based Classification
Both can run on either a List or an RDD with exactly the same code; a rough sketch of the PI estimation idea follows below.
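To give a flavour of the PI estimation example, a container-generic Monte Carlo estimate could look roughly like the sketch below. Only map is demonstrated in this README, so the reduce-style aggregation used here is an assumption; see containers-example for the operations the real API provides.

import scala.util.Random

// Hypothetical sketch: assumes the Container syntax also exposes a
// reduce-like aggregation (only map is shown above).
def estimatePi[C[_]: Container](points: C[(Double, Double)], n: Long): Double = {
  val hits = points
    .map { case (x, y) => if (x * x + y * y <= 1.0) 1L else 0L }
    .reduce(_ + _) // assumed aggregation over the container
  4.0 * hits / n
}

// val pts = List.fill(100000)((Random.nextDouble(), Random.nextDouble()))
// estimatePi(pts, pts.size) works on a List; the same call works on an RDD
// once the RDD Container instance is in scope.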
To release a new version, run:
sbt "release with-defaults"