Page 5 of 204 results

databrickslabs/automl-toolkit 0.7.2

Toolkit for Apache Spark ML for Feature clean-up, feature Importance calculation suite, Information Gain selection, Distributed SMOTE, Model selection and training, Hyper parameter optimization and selection, Model interpretability.

Scala versions: 2.11

apache-spark feature-engineering spark scala ml pyspark machinelearning

191 7
pishen/sbt-lighter 1.2.0

SBT plugin for Apache Spark on AWS EMR

emr sbt spark

57 6
yaooqinn/itachi 0.3.0

A library that brings useful functions from various modern database management systems to Apache Spark

Scala versions: 2.12

postgres spark hive presto trino

56 2
potix2/spark-google-spreadsheets 0.6.3

Google Spreadsheets datasource for SparkSQL and DataFrames

Scala versions: 2.11

sparksql scala data-frame spark spreadsheet

57 5
uosdmlab/spark-nkp 0.3.3

Natural Korean Processor for Apache Spark

Scala versions: 2.11

nlp apache-spark text-mining korean-nlp spark natural-language-processing spark-mllib

53 2
cerndb/sparkplugins 0.3

Code and examples of how to write and deploy Apache Spark Plugins. Spark plugins allow running custom code on the executors as they are initialized. This also allows extending the Spark metrics system with user-provided monitoring probes.

Scala versions: 2.13 2.12

kubernetes monitoring scala spark

85 1
starlake-ai/starlake 1.3.0

Declarative text based tool for data analysts and engineers to extract, load, transform and orchestrate their data pipelines.

Scala versions: 2.13 2.12

data-engineering etl bigquery spark data-pipeline synapse data-integration redshift snowflake hdfs

57 1389 27
hydrospheredata/spark-ml-serving 0.3.3

Spark ML Lib serving library

Scala versions: 2.11

inference scoring serving spark

48 2
hablapps/sparkoptics 0.1.1

Optics for Spark DataFrames

Scala versions: 2.12 2.11

dataframes dataframe optics spark-sql spark scala

47 4
locationtech-labs/geopyspark 0.3.0

GeoTrellis for PySpark

Scala versions: 2.11

tile-server geotrellis big-data geospatial spark python

179 9
tharwaninitin/etlflow 1.7.3

EtlFlow is an ecosystem of functional libraries in Scala based on ZIO for running complex Auditable workflows which can interact with Google Cloud Platform, AWS, Kubernetes, Databases, SFTP servers, On-Prem Systems and more.

Scala versions: 3.x 2.13 2.12

Scala.js versions: 1.x

dataproc gcs etl bigquery scala redis aws s3 gcp etl-framework etl-pipeline spark zio

44 5
benfradet/struct-type-encoder 0.6.0

Deriving Spark DataFrame schemas from case classes

Scala versions: 2.12

spark sparksql

44 6
coxautomotivedatasolutions/spark-distcp 0.2.5

A re-implementation of Hadoop DistCP in Apache Spark

Scala versions: 2.13

apache-spark data-engineering distcp hadoop spark

44 3
absaoss/hyperdrive 4.7.0

Extensible streaming ingestion pipeline on top of Apache Spark

Scala versions: 2.12 2.11

apache-spark streaming spark-structured-streaming framework pipeline kafka streaming-etl spark ingestion

44 11
univalence/zio-spark 0.12.0

A functional wrapper around Spark to make it work with ZIO

Scala versions: 3.x 2.13 2.12 2.11

scala spark zio zio-spark

42 9
g-research/spark-dgraph-connector 0.2.0

A connector for Apache Spark and PySpark to Dgraph databases.

Scala versions: 2.12

dgraph gr-oss pyspark spark

43 5
zuinnote/spark-hadoopoffice-ds 1.7.0

A Spark datasource for the HadoopOffice library

Scala versions: 2.13 2.12 2.11

read xlsx xls excel spark datasource write hadoopoffice

39 1
xskipper-io/xskipper 1.3.0

An Extensible Data Skipping Framework

Scala versions: 2.12

data-skipping indexing scala spark

42 5
heartsavior/spark-sql-kafka-offset-committer 0.2.0

Kafka offset committer for structured streaming query

Scala versions: 2.12 2.11

kafka spark structured-streaming

37 3
tupol/spark-utils 0.6.2

Basic framework utilities to quickly start writing production ready Apache Spark applications

Scala versions: 2.12

apache-spark convenience data-source framework data-sink spark scala spark-applications spark-streaming

36 1
