Spark Connector for Workday is a SOAP web services wrapper around the Workday API published here.
This library requires Spark 2.x.
For Spark 1.x support, please check the spark1.x branch.
You can link against this library in your program in the following ways:
<dependency>
<groupId>com.springml</groupId>
<artifactId>spark-workday_2.11</artifactId>
<version>1.1.0</version>
</dependency>
libraryDependencies += "com.springml" % "spark-workday_2.11" % "1.1.0"
This package can be added to Spark using the --packages
command line option. For example, to include it when starting the spark shell:
$ bin/spark-shell --packages com.springml:spark-workday_2.11:1.1.0
- Construct a Spark DataFrame using Workday data - The user has to provide a WWS (Workday Web Service) request and a list of XPaths to read data from Workday. The XPaths are evaluated against the WWS response, and the dataframe is constructed from the results.
username: Workday Web Service username.
password: Workday Web Service password.
wwsEndpoint: Workday Web Service endpoint.
request: Workday Web Service request. This will be used to execute the required Web Service. A sample request is present over here.
objectTagPath: XPath of the response element which should be considered as the object element.
detailsTagPath: XPath of the detail element in the object. This will be used to get the object detail element.
xpathMap: Location of a CSV file which should contain fieldName, fieldType and its XPath. A sample file is present over here.
namespacePrefixMap: Location of a CSV file which should contain each prefix and its corresponding namespace. A sample file is present over here.
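To make the roles of objectTagPath, xpathMap and namespacePrefixMap more concrete, here is a minimal sketch of namespace-aware XPath evaluation using only the standard JDK XML APIs. It is not the connector's implementation. The sample response, the field names (invoiceNumber, total), the element names Invoice_Number and Total, and the mapping of the wd prefix to urn:com.workday/bsvc are all assumptions made for illustration.

```scala
import java.io.StringReader
import javax.xml.XMLConstants
import javax.xml.namespace.NamespaceContext
import javax.xml.parsers.DocumentBuilderFactory
import javax.xml.xpath.{XPathConstants, XPathFactory}
import org.w3c.dom.NodeList
import org.xml.sax.InputSource

object XPathSketch {
  // Assumed prefix-to-namespace pairs, the kind of data namespacePrefixMap describes
  val prefixes: Map[String, String] = Map("wd" -> "urn:com.workday/bsvc")

  // Hypothetical fieldName-to-XPath pairs, the kind of data xpathMap describes.
  // Each path is relative to the object element.
  val fieldXPaths: Seq[(String, String)] = Seq(
    "invoiceNumber" -> "wd:Customer_Invoice_Data/wd:Invoice_Number",
    "total"         -> "wd:Customer_Invoice_Data/wd:Total"
  )

  // Made-up response used only to exercise the sketch
  val sampleResponse: String =
    """<wd:Response xmlns:wd="urn:com.workday/bsvc">
      |  <wd:Customer_Invoice>
      |    <wd:Customer_Invoice_Data>
      |      <wd:Invoice_Number>INV-1</wd:Invoice_Number>
      |      <wd:Total>10.50</wd:Total>
      |    </wd:Customer_Invoice_Data>
      |  </wd:Customer_Invoice>
      |  <wd:Customer_Invoice>
      |    <wd:Customer_Invoice_Data>
      |      <wd:Invoice_Number>INV-2</wd:Invoice_Number>
      |      <wd:Total>20.00</wd:Total>
      |    </wd:Customer_Invoice_Data>
      |  </wd:Customer_Invoice>
      |</wd:Response>""".stripMargin

  // Resolves the prefixes used in XPath expressions to namespace URIs
  private val nsContext = new NamespaceContext {
    override def getNamespaceURI(prefix: String): String =
      prefixes.getOrElse(prefix, XMLConstants.NULL_NS_URI)
    override def getPrefix(uri: String): String =
      prefixes.collectFirst { case (p, u) if u == uri => p }.orNull
    override def getPrefixes(uri: String): java.util.Iterator[String] = {
      val matches = new java.util.ArrayList[String]()
      prefixes.foreach { case (p, u) => if (u == uri) matches.add(p) }
      matches.iterator()
    }
  }

  // Selects every object element, then evaluates each field XPath against it
  def extractRows(xml: String, objectTagPath: String): Seq[Map[String, String]] = {
    val factory = DocumentBuilderFactory.newInstance()
    factory.setNamespaceAware(true) // required for prefixed XPaths to match
    val doc = factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)))
    val xpath = XPathFactory.newInstance().newXPath()
    xpath.setNamespaceContext(nsContext)
    val objects = xpath.evaluate(objectTagPath, doc, XPathConstants.NODESET).asInstanceOf[NodeList]
    (0 until objects.getLength).map { i =>
      val node = objects.item(i)
      fieldXPaths.map { case (name, path) => name -> xpath.evaluate(path, node) }.toMap
    }
  }
}
```

Calling `XPathSketch.extractRows(XPathSketch.sampleResponse, "//wd:Customer_Invoice")` yields one map per Customer_Invoice element, which is roughly the row shape a dataframe would be built from.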
// Construct Dataframe using WWS
// Request to be executed against WWS
// Here Get_Customer_Invoices operation from Revenue_Management Service is used
// https://community.workday.com/custom/developer/API/Revenue_Management/v27.0/Get_Customer_Invoices.html
val request = "<bsvc:Get_Customer_Invoices_Request xmlns:bsvc=\"urn:com.workday/bsvc\"><bsvc:Response_Filter><bsvc:As_Of_Effective_Date>2016-09-09</bsvc:As_Of_Effective_Date><bsvc:As_Of_Entry_DateTime>2016-09-09</bsvc:As_Of_Entry_DateTime><bsvc:Page>1</bsvc:Page><bsvc:Count>100</bsvc:Count></bsvc:Response_Filter><bsvc:Response_Group><bsvc:Include_Reference>1</bsvc:Include_Reference><bsvc:Include_Customer_Invoice_Data>1</bsvc:Include_Customer_Invoice_Data></bsvc:Response_Group></bsvc:Get_Customer_Invoices_Request>"
// Below constructs dataframe by executing wws
// Customer_Invoice is the object element and hence //wd:Customer_Invoice in objectTagPath
// Customer_Invoice_Line_Replacement_Data is the detail element and
// hence /wd:Customer_Invoice/wd:Customer_Invoice_Data/wd:Customer_Invoice_Line_Replacement_Data in detailsTagPath
// detailsTagPath is relative to objectTagPath
val df = spark.read.
format("com.springml.spark.workday").
option("username", "wws_username").
option("password", "wws_password").
option("wwsEndpoint", "wws_endpoint").
option("request", request).
option("objectTagPath", "//wd:Customer_Invoice").
option("detailsTagPath", "/wd:Customer_Invoice/wd:Customer_Invoice_Data/wd:Customer_Invoice_Line_Replacement_Data").
option("xpathMap","/home/xpath.csv").
option("namespacePrefixMap","/home/namespaces.csv").
load()
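The escaped one-line request string above is hard to read and to change. As a sketch, assuming the service tolerates whitespace between elements, the same request can be built with a triple-quoted interpolated string. The helper name buildRequest and its parameters are hypothetical and not part of the connector. The element names are taken from the request shown above.

```scala
object RequestSketch {
  // Hypothetical helper: builds the Get_Customer_Invoices request for one page.
  // asOf is used for both As_Of_Effective_Date and As_Of_Entry_DateTime,
  // mirroring the example request.
  def buildRequest(asOf: String, page: Int, count: Int): String =
    s"""<bsvc:Get_Customer_Invoices_Request xmlns:bsvc="urn:com.workday/bsvc">
       |  <bsvc:Response_Filter>
       |    <bsvc:As_Of_Effective_Date>$asOf</bsvc:As_Of_Effective_Date>
       |    <bsvc:As_Of_Entry_DateTime>$asOf</bsvc:As_Of_Entry_DateTime>
       |    <bsvc:Page>$page</bsvc:Page>
       |    <bsvc:Count>$count</bsvc:Count>
       |  </bsvc:Response_Filter>
       |  <bsvc:Response_Group>
       |    <bsvc:Include_Reference>1</bsvc:Include_Reference>
       |    <bsvc:Include_Customer_Invoice_Data>1</bsvc:Include_Customer_Invoice_Data>
       |  </bsvc:Response_Group>
       |</bsvc:Get_Customer_Invoices_Request>""".stripMargin
}
```

The result of `RequestSketch.buildRequest("2016-09-09", 1, 100)` can be passed to the request option in place of the hand-escaped string.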
# Request to be executed against WWS
# Here Get_Customer_Invoices operation from Revenue_Management Service is used
# https://community.workday.com/custom/developer/API/Revenue_Management/v27.0/Get_Customer_Invoices.html
ws_request <- "<bsvc:Get_Customer_Invoices_Request xmlns:bsvc=\"urn:com.workday/bsvc\"><bsvc:Response_Filter><bsvc:As_Of_Effective_Date>2016-09-09</bsvc:As_Of_Effective_Date><bsvc:As_Of_Entry_DateTime>2016-09-09</bsvc:As_Of_Entry_DateTime><bsvc:Page>1</bsvc:Page><bsvc:Count>100</bsvc:Count></bsvc:Response_Filter><bsvc:Response_Group><bsvc:Include_Reference>1</bsvc:Include_Reference><bsvc:Include_Customer_Invoice_Data>1</bsvc:Include_Customer_Invoice_Data></bsvc:Response_Group></bsvc:Get_Customer_Invoices_Request>"
# Below constructs dataframe by executing wws
# Customer_Invoice is the object element and hence //wd:Customer_Invoice in objectTagPath
# Customer_Invoice_Line_Replacement_Data is the detail element and
# hence /wd:Customer_Invoice/wd:Customer_Invoice_Data/wd:Customer_Invoice_Line_Replacement_Data in detailsTagPath
# detailsTagPath is relative to objectTagPath
df <- read.df(source="com.springml.spark.workday",
username="wws_username",
password="wws_password",
wwsEndpoint="wws_endpoint",
request=ws_request,
objectTagPath="//wd:Customer_Invoice",
detailsTagPath="/wd:Customer_Invoice/wd:Customer_Invoice_Data/wd:Customer_Invoice_Line_Replacement_Data",
xpathMap="/home/xpath.csv",
namespacePrefixMap="/home/namespaces.csv")
This library is built with SBT, which is automatically downloaded by the included shell script. To build a JAR file, simply run sbt/sbt package from the project root.