Type-safe columns for Spark DataFrames!
Spark | Maven Central | Codecov |
---|---|---|
2.4.x | Deprecated | |
3.0.x | | |
3.1.x | | |
3.2.x | | |
3.3.x | | |
3.4.x | | |
Doric offers type safety in DataFrame column expressions at minimal cost, without compromising performance. In particular, doric allows you to:
- Get rid of malformed column expressions at compile time
- Avoid implicit type castings
- Run DataFrames only when it is safe to do so
- Get all errors at once
- Modularize your business logic
You'll get all these goodies:
- Without resorting to Datasets and sacrificing performance, i.e. sticking to DataFrames
- With a minimal learning curve: almost no change in your code with respect to conventional column expressions
- Without fully committing to a strong static typing discipline throughout all your code
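To make the points above concrete, here is a minimal sketch of what a typed column expression might look like. It assumes Spark 3.2.x, Scala 2.12, and that `colInt`, `lit`, and the `withColumn` syntax are brought into scope by `import doric._`, as we understand the doric documentation; the object name `DoricSketch`, the helper `addDoubled`, and the column names are hypothetical. Check the calls against the doric version you actually use.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import doric._ // assumed entry point for doric's typed column API

object DoricSketch {

  // Hypothetical helper: adds a column holding twice the Int column "value".
  // colInt records the expected Spark type in the Scala type of the expression.
  def addDoubled(df: DataFrame): DataFrame =
    df.withColumn("doubled", colInt("value") + colInt("value"))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("doric-sketch")
      .getOrCreate()
    import spark.implicits._

    // The result is still a plain DataFrame, not a Dataset.
    val df = List(1, 2, 3).toDF("value")
    addDoubled(df).show()

    // Expected to be rejected by the compiler (Int column times Boolean literal):
    // df.withColumn("other", colInt("value") * lit(true))

    spark.stop()
  }
}
```

The expression reads almost like its `org.apache.spark.sql.functions` counterpart; the main difference is that the column reference states its type up front.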
Please check out this notebook for usage examples and rationale (also available through the Binder link).
You can also check our documentation page.
Fetch the JAR from Maven:
Sbt
libraryDependencies += "org.hablapps" %% "doric_3-2" % "0.0.8"
Maven
<dependency>
  <groupId>org.hablapps</groupId>
  <artifactId>doric_3-2_2.12</artifactId>
  <version>0.0.8</version>
</dependency>
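For context, a fuller `build.sbt` might look like the sketch below. The Scala and Spark patch versions and the `Provided` scope are assumptions chosen for illustration; pick the ones that match your cluster. Note that sbt's `%%` operator appends the Scala binary version to the artifact name, which is why the Maven coordinates above spell out `doric_3-2_2.12`.

```scala
// build.sbt (sketch; version numbers other than doric's are assumptions)
scalaVersion := "2.12.15"

libraryDependencies ++= Seq(
  // Spark itself, assumed to be supplied by the runtime environment
  "org.apache.spark" %% "spark-sql" % "3.2.1" % Provided,
  // doric build for Spark 3.2.x; %% resolves this to doric_3-2_2.12
  "org.hablapps" %% "doric_3-2" % "0.0.8"
)
```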
Doric depends on Spark internals and has been tested against the Spark versions listed in the table above.
Doric is intended to offer a type-safe version of the whole Spark Column API. Please check the list of open issues and help us achieve that goal!
Please read the contribution guide 📋