A set of utilities for working with Ionic encryption in Spark.
Main components include:
- transformers (for working with dataframes)
- caching and key re-use
- mocks and other testing tools
Currently Scala only, though Python and Java support is planned.
See more in "/examples".
The core feature of this library is a Spark transformer that makes it easy to encrypt or decrypt columns.
import io.github.turtlemonvh.ionicsparkutils.KeyServicesCache
import io.github.turtlemonvh.ionicsparkutils.{Transformers => IonicTransformers}
import com.ionic.sdk.agent.Agent
import com.ionic.sdk.device.profile.persistor.DeviceProfiles
import com.ionic.sdk.key.KeyServices
def agentFactory(): KeyServices = {
  // Load profile JSON from whatever secure storage you have available.
  // Each cloud provider has secret store interfaces that work well here.
  // `profileJson` is a placeholder for the loaded profile JSON.
  val threadLocalAgent = new Agent(new DeviceProfiles(profileJson))
  // Wrap in a cache layer so that a single key is used for each transform operation
  new KeyServicesCache(threadLocalAgent)
}
// A new column will be added named "ionic_enc_mycolumn"
// You probably want to call `.drop` and `.withColumnRenamed` on the
// resulting dataset to clean things up.
val encryptedDF = mydataset
  .transform(IonicTransformers.Encrypt(
    encryptCols = List("mycolumn"),
    decryptCols = List(),
    agentFactory = agentFactory
  ))
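The cleanup mentioned in the comments above can be sketched as a small helper. This is a minimal sketch, not part of this library: the `EncryptedColumnCleanup` object and `replaceWithEncrypted` function are hypothetical names, and it assumes the transformer output is a `DataFrame` whose encrypted columns follow the `ionic_enc_<column>` naming shown for `mycolumn`. It uses only the standard Spark `drop` and `withColumnRenamed` methods.

```scala
import org.apache.spark.sql.DataFrame

// Hypothetical helper, not provided by ionic-spark-utils.
object EncryptedColumnCleanup {
  // Assumption: the transformer writes each encrypted value to a new
  // column named "ionic_enc_<column>", as described for "mycolumn".
  def replaceWithEncrypted(df: DataFrame, columns: Seq[String]): DataFrame =
    columns.foldLeft(df) { (acc, name) =>
      acc
        .drop(name)                                  // remove the plaintext column
        .withColumnRenamed(s"ionic_enc_$name", name) // give the encrypted column the original name
    }
}
```

With the dataset from the example above, usage would look like `val cleanedDF = EncryptedColumnCleanup.replaceWithEncrypted(encryptedDF, Seq("mycolumn"))`. Note that the renamed column keeps the position of the `ionic_enc_` column, so column order may differ from the input.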
The library works for basic operations. Its Spark API is likely to change in future releases.
# Start a shell
$ sbt
# Compile the code
> compile
# Run the tests
> test
# Get a list of all tests
> show test:definedTests
# Run a subset of tests
> testOnly io.github.turtlemonvh.ionicsparkutils.TestAgentTest
# Reload after changes to build.sbt and friends
> reload
JUnit tests are sometimes skipped by sbt. Running `clean` before `test`
seems to consistently fix this behavior. Test results are written to target/test-reports/*.xml.
- Project bootstrapped via: https://github.com/holdenk/sparkProjectTemplate.g8