datamindedbe / lighthouse   0.3.1

Apache License 2.0

Lighthouse is a library for data lakes built on top of Apache Spark. It provides high-level APIs in Scala to streamline data pipelines and apply best practices.

Scala versions: 2.11

Lighthouse

Caution

This library has not been actively maintained for a while, so it was archived on 6 September 2024.

Principles

  • Configuration as code
  • Idempotent execution
  • Utilities that make it easier to build and test Apache Spark-based applications

Start using Lighthouse

In your build.sbt, add this:

libraryDependencies += "be.dataminded" %% "lighthouse" % "<version>"
libraryDependencies += "be.dataminded" %% "lighthouse-testing" % "<version>" % Test
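
As a concrete illustration, a minimal build.sbt might look like the sketch below. It assumes version 0.3.1 (the release shown at the top of this page) for both artifacts and Scala 2.11.12 as the 2.11 patch release. Both values are assumptions to adjust for your project, and the project name is hypothetical.

```scala
// build.sbt (sketch; the name and version numbers below are assumptions)
name := "my-data-pipeline" // hypothetical project name

// Lighthouse is published for Scala 2.11, so the project must use a 2.11.x compiler.
scalaVersion := "2.11.12"

libraryDependencies ++= Seq(
  // %% appends the Scala binary version, resolving to lighthouse_2.11
  "be.dataminded" %% "lighthouse"         % "0.3.1",
  // Testing utilities, only on the test classpath
  "be.dataminded" %% "lighthouse-testing" % "0.3.1" % Test
)
```

Because `%%` appends the Scala binary version to the artifact name, these coordinates correspond to the `lighthouse_2.11` and `lighthouse-testing_2.11` artifact IDs used in the Maven snippet.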

If you are using Maven, add this to your pom.xml:

<dependency>
    <groupId>be.dataminded</groupId>
    <artifactId>lighthouse_2.11</artifactId>
    <version>[version]</version>
</dependency>
<dependency>
    <groupId>be.dataminded</groupId>
    <artifactId>lighthouse-testing_2.11</artifactId>
    <version>[version]</version>
    <scope>test</scope>
</dependency>

Online Documentation

This README contains only basic instructions. A more complete tutorial is available at https://datamindedbe.github.io/lighthouse/tutorial/