MLeap [Inactive]

Note: This repository is no longer being maintained. Please see https://github.com/combust-ml/mleap for the active project.

Introduction

Easily put your Spark ML Pipelines into action with MLeap. Train your feature and regression/classification pipelines with Spark, then convert them to MLeap pipelines and deploy them anywhere. Take your pipelines to an API server, Hadoop, or even back to Spark to execute on a DataFrame.

MLeap allows for easy serialization of your estimator and transformer pipelines, so you can save them for reuse later. Executing an MLeap pipeline does not require a SparkContext or DataFrame, so there is very little overhead for real-time, one-off predictions. You don't have to worry about Spark dependencies for executing your models; just add the lightweight MLeap runtime library instead.

MLeap makes deploying your Spark ML pipelines easy with 3 core functions:

Release: Deploy your entire ML pipeline without a SparkContext or any dependency on Spark libraries.
Reuse: Export your ML pipeline to easy-to-read JSON files so you can reuse pipelines.
Recycle: Export your training pipelines to easy-to-read JSON files so you can easily modify your training pipelines.

Setup

Link with Maven or SBT

MLeap is cross-compiled for Scala 2.10 and 2.11. If you use a POM file for dependency management and run Scala 2.11, replace 2.10 with 2.11 wherever you see it. If you use SBT, the %% operator selects the artifact for your Scala version automatically.

SBT

libraryDependencies += "com.truecar.mleap" %% "mleap-runtime" % "0.1.3"
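In SBT, `%%` appends the Scala binary version to the artifact name, so with `scalaVersion` set to a 2.11 release the two dependency forms below resolve to the same artifact. The `scalaVersion` and `crossScalaVersions` values are example settings chosen for illustration, not requirements stated by MLeap.

```scala
// build.sbt (example settings; use the Scala releases your project actually targets)
scalaVersion := "2.11.8"
crossScalaVersions := Seq("2.10.6", "2.11.8")

// With scalaVersion 2.11.x, %% resolves this to mleap-runtime_2.11 ...
libraryDependencies += "com.truecar.mleap" %% "mleap-runtime" % "0.1.3"

// ... which is equivalent to spelling out the suffix with a single %:
// libraryDependencies += "com.truecar.mleap" % "mleap-runtime_2.11" % "0.1.3"
```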

Maven

<dependency>
    <groupId>com.truecar.mleap</groupId>
    <artifactId>mleap-runtime_2.10</artifactId>
    <version>0.1.3</version>
</dependency>

For Spark Integration

SBT

libraryDependencies += "com.truecar.mleap" %% "mleap-spark" % "0.1.3"

Maven

<dependency>
    <groupId>com.truecar.mleap</groupId>
    <artifactId>mleap-spark_2.10</artifactId>
    <version>0.1.3</version>
</dependency>

Spark Packages

MLeap is now a Spark Package. The package includes mleap-spark and mleap-serialization, so it provides MLeap's full functionality. Here is how you can run a Spark shell with MLeap loaded:

$ bin/spark-shell --packages com.truecar.mleap:mleap-package_2.10:0.1.3

Modules

MLeap is split into 4 modules:

mleap-core - Core execution building blocks, including runtimes for executing linear regressions, random forest models, and logistic regressions, assembling feature vectors, string indexing, one-hot encoding, etc. It also provides a core linear algebra system used by all of these tasks.
mleap-runtime - Provides LeapFrame, which is essentially a lightweight DataFrame without any dependencies on the Spark libraries. LeapFrames support 3 data types: double, string, and vector. Also provides MLeap Transformers for executing ML pipelines on LeapFrames. Spark ML pipelines are converted into the MLeap pipelines provided by this library.
mleap-spark - Provides Spark/MLeap integration. SparkLeapFrame is an implementation of LeapFrame backed by a Spark RDD, so you can execute MLeap transformers on a Spark cluster. Provides conversion from Spark Transformers to MLeap Transformers and from MLeap Estimators to Spark Estimators. This lets you use MLeap intuitively without worrying about how Spark is used under the hood: MLeap Estimator -> Spark Estimator -> Spark Transformer -> MLeap Transformer.
mleap-serialization - Provides serialization for MLeap and Spark models to a common JSON/Protobuf format.
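To make the LeapFrame idea concrete, here is a minimal sketch of a frame that holds only the three value types listed above and lets a transformer append a column computed row by row. Every name in it (`LeapValue`, `SimpleLeapFrame`, `withColumn`, and so on) is hypothetical and chosen for illustration; this is not the MLeap runtime API.

```scala
// Hypothetical sketch of a LeapFrame-like structure: NOT MLeap's actual API.
// It models the three value types described above: double, string, and vector.
sealed trait LeapValue
final case class DoubleValue(value: Double) extends LeapValue
final case class StringValue(value: String) extends LeapValue
final case class VectorValue(values: Vector[Double]) extends LeapValue

// Column names plus rows of values, with no Spark dependency.
final case class SimpleLeapFrame(columns: Vector[String], rows: Vector[Vector[LeapValue]]) {
  // Position of a column, failing loudly if it does not exist.
  def indexOf(name: String): Int = {
    val i = columns.indexOf(name)
    require(i >= 0, s"unknown column: $name")
    i
  }

  // Append a new column computed from an existing one, row by row.
  def withColumn(input: String, output: String)(f: LeapValue => LeapValue): SimpleLeapFrame = {
    val i = indexOf(input)
    SimpleLeapFrame(columns :+ output, rows.map(row => row :+ f(row(i))))
  }

  // Read one column back out.
  def column(name: String): Vector[LeapValue] = {
    val i = indexOf(name)
    rows.map(row => row(i))
  }
}

// Usage: a toy "transformer" that doubles a numeric column.
val frame = SimpleLeapFrame(
  Vector("price"),
  Vector(Vector(DoubleValue(1.5)), Vector(DoubleValue(4.0)))
)
val doubled = frame.withColumn("price", "price_x2") {
  case DoubleValue(v) => DoubleValue(v * 2)
  case other          => throw new IllegalArgumentException(s"expected a double, got $other")
}
```

Restricting values to a small sealed set of types keeps such a frame cheap to build for a single request, which is the property the runtime relies on for one-off predictions.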

Example

Please see the mleap-demo project for an example of building and using a pipeline with MLeap.

Supported Transformers

Currently, MLeap supports only a select set of Spark estimators/transformers, as a proof of concept.

Feature

StringIndexer
Tokenizer
HashingTF
VectorAssembler
StandardScaler
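Two of these feature transformers can be sketched in a few lines of plain Scala. The sketch below is illustrative only; it is neither MLeap's nor Spark's implementation, and the object names are hypothetical. StringIndexer assigns indices by descending label frequency, so the most frequent label gets 0.0 (tie-breaking by label name is an assumption of this sketch). VectorAssembler concatenates numeric and vector inputs, in order, into a single feature vector.

```scala
// StringIndexer-style sketch (hypothetical names, not MLeap or Spark code).
object StringIndexerSketch {
  // Most frequent label -> 0.0, next -> 1.0, ...; ties broken alphabetically (assumption).
  def fit(labels: Seq[String]): Map[String, Double] =
    labels
      .groupBy(identity)
      .map { case (label, occurrences) => (label, occurrences.size) }
      .toSeq
      .sortBy { case (label, count) => (-count, label) }
      .zipWithIndex
      .map { case ((label, _), index) => (label, index.toDouble) }
      .toMap

  // Unseen labels fail instead of being silently mapped to some index.
  def transform(model: Map[String, Double], label: String): Double =
    model.getOrElse(label, throw new NoSuchElementException(s"unseen label: $label"))
}

// VectorAssembler-style sketch: Left holds a single double, Right holds a vector.
object VectorAssemblerSketch {
  def assemble(parts: Seq[Either[Double, Vector[Double]]]): Vector[Double] =
    parts.toVector.flatMap {
      case Left(d)  => Vector(d)
      case Right(v) => v
    }
}

// Usage: index a categorical column, then assemble it with a numeric vector.
val indexer = StringIndexerSketch.fit(Seq("b", "a", "b", "c", "a", "b"))
val features = VectorAssemblerSketch.assemble(
  Seq(Left(StringIndexerSketch.transform(indexer, "a")), Right(Vector(0.5, 2.0)))
)
```

Here "b" occurs three times, "a" twice, and "c" once, so the fitted indices are 0.0, 1.0, and 2.0 respectively, and the assembled vector for label "a" is (1.0, 0.5, 2.0).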

Regression

LinearRegression
RandomForestRegressor
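Both supported regression models score a feature vector in a simple way. This plain-Scala sketch (not MLeap's classes; the object, type, and function names are hypothetical) shows linear regression as a dot product plus an intercept, and random forest regression as the mean of equally weighted tree predictions, with trees stubbed out as functions.

```scala
// Hypothetical plain-Scala stand-ins for regression scoring, not MLeap's classes.
object RegressionSketch {
  // Linear regression: dot(coefficients, features) + intercept.
  def linearPredict(coefficients: Vector[Double], intercept: Double, features: Vector[Double]): Double = {
    require(coefficients.length == features.length, "dimension mismatch")
    coefficients.zip(features).map { case (w, x) => w * x }.sum + intercept
  }

  // A regression tree is modeled here as a function from features to a prediction.
  type Tree = Vector[Double] => Double

  // Random forest regression: the mean of the individual tree predictions.
  def forestPredict(trees: Seq[Tree], features: Vector[Double]): Double = {
    require(trees.nonEmpty, "a forest needs at least one tree")
    trees.map(tree => tree(features)).sum / trees.size
  }
}

// Two hypothetical depth-1 trees (stumps) that split on feature 0.
val stumps: Seq[RegressionSketch.Tree] = Seq(
  f => if (f(0) <= 1.0) 10.0 else 20.0,
  f => if (f(0) <= 3.0) 12.0 else 30.0
)

// 2.0 * 3.0 + (-1.0) * 4.0 + 0.5
val linear = RegressionSketch.linearPredict(Vector(2.0, -1.0), 0.5, Vector(3.0, 4.0))
// (20.0 + 12.0) / 2
val forest = RegressionSketch.forestPredict(stumps, Vector(2.0))
```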

Classification

RandomForestClassifier
SupportVectorMachine (must use the estimator provided with mleap-spark)
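The two classifiers can be sketched in the same style. This plain-Scala sketch is illustrative, not MLeap's or Spark's code, and all names are hypothetical. It combines trees by majority vote, breaking ties toward the smaller label as an assumption. It scores a linear SVM by comparing the margin w · x + b against a threshold. The threshold of 0.0 is also an assumption of the sketch.

```scala
// Hypothetical plain-Scala stand-ins for classification scoring.
object ClassificationSketch {
  // Each tree is modeled as a function returning a class label (0.0, 1.0, ...).
  type Tree = Vector[Double] => Double

  // Random forest by majority vote over the trees' predicted labels.
  def forestPredict(trees: Seq[Tree], features: Vector[Double]): Double = {
    require(trees.nonEmpty, "a forest needs at least one tree")
    val votes: Map[Double, Int] =
      trees.map(tree => tree(features)).groupBy(identity).map { case (label, vs) => (label, vs.size) }
    val maxVotes = votes.values.max
    // Among the labels with the most votes, pick the smallest (tie-breaking is an assumption).
    votes.collect { case (label, count) if count == maxVotes => label }.reduce((a, b) => math.min(a, b))
  }

  // Linear SVM: 1.0 when the margin w . x + b exceeds the threshold, else 0.0.
  def svmPredict(weights: Vector[Double], intercept: Double, features: Vector[Double], threshold: Double = 0.0): Double = {
    require(weights.length == features.length, "dimension mismatch")
    val margin = weights.zip(features).map { case (w, x) => w * x }.sum + intercept
    if (margin > threshold) 1.0 else 0.0
  }
}

// Three hypothetical trees: they vote 0.0, 1.0, 1.0 for the features below.
val trees: Seq[ClassificationSketch.Tree] = Seq(
  f => if (f(0) > 0.5) 1.0 else 0.0,
  f => if (f(1) > 0.5) 1.0 else 0.0,
  _ => 1.0
)
val rfLabel = ClassificationSketch.forestPredict(trees, Vector(0.2, 0.9))
// Margin: 1.0 * 1.0 + (-2.0) * 1.0 + 0.5 = -0.5, which is not above 0.0.
val svmLabel = ClassificationSketch.svmPredict(Vector(1.0, -2.0), 0.5, Vector(1.0, 1.0))
```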

Miscellaneous

Pipeline
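A pipeline applies its stages in order, with each stage reading the columns produced by earlier ones. Because execution needs no SparkContext, scoring a single request is just a sequence of function calls. The sketch below models a row as a map from column names to doubles and a transformer as a function on rows. These types and names are hypothetical simplifications, not MLeap's API.

```scala
// Hypothetical sketch of pipeline execution on a single row, not MLeap's API.
object PipelineSketch {
  type Row = Map[String, Double]
  type Transformer = Row => Row

  // Apply each stage in order; the output of one stage is the input to the next.
  def run(stages: Seq[Transformer], row: Row): Row =
    stages.foldLeft(row)((current, stage) => stage(current))
}

// A feature stage followed by a model stage; the second reads the first's output column.
val scale: PipelineSketch.Transformer = r => r + ("price_scaled" -> r("price") / 100.0)
val predict: PipelineSketch.Transformer = r => r + ("prediction" -> (2.0 * r("price_scaled") + 1.0))

val result = PipelineSketch.run(Seq(scale, predict), Map("price" -> 250.0))
```

Stage order matters: running `predict` before `scale` would fail, because the `price_scaled` column would not exist yet.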

Future of MLeap

Provide Python/R bindings
Unify linear algebra and core ML models library with Spark
Deploy outside of the JVM to embedded systems
Full support for all Spark transformers

Contributing

There are a few ways to contribute to MLeap.

Write documentation. As a look through the source code will show, there is very little of it.
Contribute an Estimator/Transformer from Spark.
Use MLeap at your company and tell us what you think.
Make a feature request or report a bug on GitHub.
Make a pull request for an existing feature request or bug report.
Join the discussion of how to get MLeap into Spark as a dependency.

Contact Information

Hollin Wilkins
Mikhail Semeniuk
Ram Sriharsha

License

See the LICENSE and NOTICE files in this repository.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
