An sbt plugin for publishing artifacts to the Hadoop Distributed File System (HDFS).
- sbt (0.13.5+ or 1.0.0+)
- An installation of Hadoop (>= 2.6.x) to target
Add the following line to project/plugins.sbt. See the Using plugins section of the sbt documentation for more information.
addSbtPlugin("com.tapad.sbt" % "sbt-hadoop" % "0.2.1")
Assuming the HADOOP_HOME environment variable is set and points to a local installation of the Hadoop binaries, add the following configuration to your build definition:
hadoopClasspath := hadoopClasspathFromExecutable.value
hadoopHdfsArtifactPath := new HdfsPath("/path/to/desired/hdfs/target/artifact.jar")
enablePlugins(HadoopPlugin)
In order to publish artifacts to HDFS, sbt-hadoop needs valid configuration information about your Hadoop installation.
sbt-hadoop expects the typical Hadoop configuration files (core-site.xml and hdfs-site.xml) to be present on your local file system.
These configuration files can be discovered by sbt-hadoop in one of two ways:
- By allowing sbt-hadoop to invoke a locally available hadoop binary
- By statically adding their locations to hadoopClasspath
Invoking the local binary allows sbt-hadoop to retrieve enough information to set the hadoopClasspath
automatically.
To use a local hadoop
binary for hadoopClasspath
inference, add the following assignment to your build definition:
hadoopClasspath := hadoopClasspathFromExecutable.value
sbt-hadoop will attempt to find a local hadoop binary via the HADOOP_HOME environment variable.
This can be manually overridden by setting a value for the hadoopExecutable
setting key:
hadoopExecutable := Some(file("/usr/bin/hadoop"))
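As the Some(...) above suggests, the setting holds an Option[File], so the path can also be computed rather than hard-coded. A minimal sketch, assuming you want to resolve the binary from an environment variable (HADOOP_PREFIX is just an illustrative name):
// Sketch: derive the hadoop executable from an environment variable instead of
// hard-coding an absolute path. When the variable is unset this yields None,
// and sbt-hadoop's own HADOOP_HOME lookup applies.
hadoopExecutable := sys.env.get("HADOOP_PREFIX").map(prefix => file(prefix) / "bin" / "hadoop")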
If a local hadoop
binary is not available, the hadoopClasspath
must be set statically.
Assuming configuration files are located on your local file system at /usr/local/hadoop-2.7.3/etc
, you can configure your build as follows:
hadoopClasspath := HadoopUtils.classpathFromDirectory(file("/usr/local/hadoop-2.7.3/etc"))
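If the configuration directory varies between machines, the same helper can be driven by an environment variable. A minimal sketch, assuming HADOOP_CONF_DIR points at the directory containing core-site.xml and hdfs-site.xml:
// Sketch: resolve the Hadoop configuration directory from HADOOP_CONF_DIR,
// falling back to a fixed local installation path when the variable is unset.
hadoopClasspath := HadoopUtils.classpathFromDirectory(
  file(sys.env.getOrElse("HADOOP_CONF_DIR", "/usr/local/hadoop-2.7.3/etc"))
)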
hadoopHdfsArtifactPath
must be set before attempting to publish an artifact to HDFS.
It represents the target destination on HDFS where your artifact will be published.
The value of hadoopHdfsArtifactPath must be an instance of org.apache.hadoop.fs.Path. A type alias, HdfsPath, is auto-imported for your convenience:
hadoopHdfsArtifactPath := new HdfsPath("/user/name/foo/bar/artifact-0.1.0.jar")
By default, sbt-hadoop is configured to upload the resulting artifact of the packageBin
task to HDFS.
You must still configure your hadoopClasspath, set your hadoopHdfsArtifactPath, and manually enable the HadoopPlugin.
hadoopClasspath := hadoopClasspathFromExecutable.value
hadoopHdfsArtifactPath := new HdfsPath(s"/tmp/${name.value}-${version.value}.jar")
enablePlugins(HadoopPlugin)
Once your build is properly configured, an invocation of hadoop:publish
will build and subsequently publish your binary artifact to HDFS.
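For example, from an interactive sbt session (mirroring the configuration above):
$ sbt
> hadoop:publish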
For more information, refer to the Packaging documentation provided in the sbt reference manual.
To use sbt-hadoop in conjunction with sbt-assembly, add the following to your project/plugins.sbt
and build.sbt
files, respectively:
addSbtPlugin("com.eed3sign" % "sbt-assembly" % "0.14.4")
addSbtPlugin("com.tapad.sbt" % "sbt-hadoop" % "0.2.1")
hadoopLocalArtifactPath := (assemblyOutputPath in assembly).value
hadoopHdfsArtifactPath := new HdfsPath("/tmp", (assemblyJarName in assembly).value)
publish in Hadoop := (publish in Hadoop).dependsOn(assembly).value
Lastly, be sure to enable sbt-hadoop in your build.sbt
file:
enablePlugins(HadoopPlugin)
sbt-assembly will be enabled automatically.
Once the build definition is configured properly, an invocation of hadoop:publish
will build and subsequently publish a fat jar to HDFS.
For more information, refer to the documentation provided by sbt-assembly and the scripted integration test found at plugin/src/sbt-test/sbt-hadoop/assembly.
- root (.): The sbt plugin and underlying interface used to publish artifacts to HDFS.
The main features and functionality of sbt-hadoop are tested using sbt's scripted-plugin. scripted tests exist in the src/sbt-test directory of the root project.
To run these tests, issue scripted
from an sbt session:
$ sbt
> scripted
To selectively run a single scripted test suite, issue scripted <name of plugin>/<name of test project>, e.g. scripted sbt-hadoop/simple.
Please note that publishLocal will be invoked when running scripted. scripted tests take longer to run than unit tests and will log a large amount of output to stdout. Also note that any output written to stderr during the execution of a scripted test will result in ERROR level log entries. These log entries will not affect the resulting status of the actual test.
sbt-hadoop
uses https://github.com/sbt/sbt-release. Simply invoke release
from the root project to release all artifacts.
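For example, from an sbt session started at the root of the repository (sbt-release will typically prompt for the release and next development versions as part of its standard flow):
$ sbt
> release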