This project contains basic runnable tools that help with common tasks in a Spark-based project.
The main tools available:
- FormatConverter: converts any supported file format into a different file format, with partitioning support.
- SimpleSqlProcessor: applies a given SQL query to the input files, which are mapped into tables.
- StreamingFormatConverter: converts any supported data stream format into a different data stream format, with partitioning support.
- SimpleFileStreamingSqlProcessor: applies a given SQL query to the input file streams, which are mapped into file output streams.
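To make the idea behind the first two tools more concrete, here is a minimal sketch written against the plain Apache Spark API. It is not the implementation or the configuration interface of FormatConverter or SimpleSqlProcessor. The object name, the file paths, the table name and the `event_date` column are all hypothetical and chosen only for illustration.

```scala
import java.nio.file.Files
import org.apache.spark.sql.{DataFrame, SparkSession}

// Hypothetical helper object, not part of spark-tools.
object SparkToolsSketch {

  // Read a CSV file with a header and write it as Parquet,
  // creating one output directory per distinct value of `partitionCol`.
  def convert(spark: SparkSession, in: String, out: String, partitionCol: String): Unit =
    spark.read.format("csv").option("header", "true").load(in)
      .write.format("parquet").partitionBy(partitionCol).mode("overwrite").save(out)

  // Map a Parquet input to a temporary table and run a SQL query against it.
  def query(spark: SparkSession, in: String, table: String, sql: String): DataFrame = {
    spark.read.format("parquet").load(in).createOrReplaceTempView(table)
    spark.sql(sql)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("spark-tools-sketch")
      .master("local[*]")
      .getOrCreate()

    // Create a small sample input in a temporary directory (illustrative data only).
    val dir = Files.createTempDirectory("spark-tools-sketch")
    val csv = dir.resolve("events.csv")
    Files.write(csv, "id,event_date\n1,2020-01-01\n2,2020-01-01\n3,2020-01-02\n".getBytes("UTF-8"))

    val parquet = dir.resolve("events.parquet").toString
    convert(spark, csv.toString, parquet, "event_date")

    val totals = query(spark, parquet, "events",
      "SELECT event_date, COUNT(*) AS total FROM events GROUP BY event_date")
    totals.show()

    spark.stop()
  }
}
```

The streaming tools follow the same pattern conceptually, with `readStream` and `writeStream` in place of `read` and `write`. How the tools themselves are configured is covered in their own documentation.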
This project also aims to create and encourage a friendly yet professional environment for developers to help each other, so please do not be shy and join through Gitter, Twitter, issue reports or pull requests.
- Java 8 or higher
- Scala 2.11 or 2.12
- Apache Spark 2.4.X
Spark Tools is published to Maven Central and Spark Packages, where the latest artifacts can be found.
- Group id / organization: org.tupol
- Artifact id / name: spark-tools
- Latest version: 0.4.1
Usage with SBT: add a dependency on the latest version of the tools to your sbt build definition file:
libraryDependencies += "org.tupol" %% "spark-tools" % "0.4.1"
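For context, a fuller build definition might look like the sketch below. The `%%` operator appends the Scala binary version to the artifact name, so the same line resolves to `spark-tools_2.11` or `spark-tools_2.12` depending on `scalaVersion`. The project name, the explicit `spark-sql` dependency and its `Provided` scope are assumptions for illustration, not requirements stated by this project.

```scala
// build.sbt (illustrative sketch)
name := "my-spark-app" // hypothetical project name

// Either of the Scala versions this project is compiled with
scalaVersion := "2.12.12"
crossScalaVersions := Seq("2.11.12", "2.12.12")

libraryDependencies ++= Seq(
  // %% resolves to spark-tools_2.11 or spark-tools_2.12, matching scalaVersion
  "org.tupol" %% "spark-tools" % "0.4.1",
  // Assumption: Spark is supplied by the cluster at runtime, hence the Provided scope
  "org.apache.spark" %% "spark-sql" % "2.4.6" % Provided
)
```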
Include this package in your Spark applications using spark-shell or spark-submit.
With Scala 2.11:
$SPARK_HOME/bin/spark-shell --packages org.tupol:spark-tools_2.11:0.4.1
Or with Scala 2.12:
$SPARK_HOME/bin/spark-shell --packages org.tupol:spark-tools_2.12:0.4.1
0.4.1
- Added StreamingFormatConverter
- Added FileStreamingSqlProcessor, SimpleFileStreamingSqlProcessor
- Bumped spark-utils dependency to 0.4.2
- The project compiles with both Scala 2.11.12 and 2.12.12
- Updated Apache Spark to 2.4.6
- Updated delta.io to 0.6.1
- Updated the spark-xml library to 0.10.0
- Removed the com.databricks:spark-avro dependency, as Avro support is now built into Apache Spark
- Updated the spark-utils dependency to the latest available snapshot
For previous versions please consult the release notes.
This code is open source software licensed under the MIT License.