intentmedia / mario   0.1.0

MIT License GitHub

Functional, Typesafe, Declarative Data Pipelines

Scala versions: 2.11 2.10

Mario

image

Mario is a Scala library focused on defining complex data pipelines in a functional, typesafe, and efficient way. See the launch blog post for more details on the motivation behind the library.

image Defining pipelines

Pipelines are very easy to build, using only the pipe function. You can construct pipelines with and without depedencies. Pipelines can be non-linear, but must be acyclic. The lack of cycles is enforced by the library, so it is impossible to define a cyclic dependency in Mario.

Execution of pipelines is done concurrently, guaranteeing that each step is executed just once.

image Usage

Import

import com.intentmedia.mario.Pipeline._

Example

Here is a simple 3 step pipeline:

// independent step
val step1 = pipe(0 until 100)
// unary dependent step
val step2 = pipe((a: Range) => a.size until 200, step1)
// binary dependent step
val step3 = pipe((a: Range, b: Range) => a ++ b, step1, step2)

step3.run().size
// 200 (step1 will be only executed once)

Independent pipelines can be executed using runWith:

val result = step3.runWith(pipe(println("foo")), pipe(println("bar")))
result.run().size
// 200 (will also print "foo" and "bar")

Pipelines can be composed using for comprehensions:

for {
  step1 <- pipe(0 until 100)
  step2 <- pipe((a: Range) => a.size until 200, step1)
  step3 <- pipe((a: Range, b: Range) => a ++ b, step1, step2)
} yield step3.run().size

image Installation

Add the following to your sbt build:

libraryDependencies += "com.intentmedia.mario" %% "mario" % "0.1.0"

image Roadmap

  • Fault tolerance
  • Implicit caching (in Spark)
  • Web UI