Have you also had this thought? Then you are at the right place. Read along for insights into the different options for building interactivity with Spark.
Apache Spark™ is a fast and easy-to-use big-data processing engine. It emerged as a strong contender to replace the MapReduce computation engine on top of HDFS, and largely succeeded in doing so. But the problem we are going to look into here is: how do we enable interactive applications against Apache Spark?
There are two widely adopted approaches to communicate with Spark, and each comes with its own limitations when it comes to flexible interaction:
Please refer to the Appendix section below for some more options that Spark provides to submit spark-jobs programmatically.
Following are some use-cases where the two approaches mentioned above fall short in providing the interaction a user might want:
1. Have a trained model loaded in the SparkSession and quickly predict for a user-given query.
2. Monitor, live, the data crunching that Spark Streaming is handling.
3. Access your big data cached in the Spark cluster from the outside world.
4. Spawn your spark-job interactively from a web application.
5. Spark-as-a-Service via REST
There are many ways one might think of interacting with Spark while solving the above use-cases. I have implemented the following four solutions to address them and shared the corresponding GitHub repository links, where you can find further details about each. Hopefully this provides some insight into how to go about building an interactive Spark application that caters to your needs:
I’ll list two other ways that Spark provides to launch Spark applications programmatically:
SparkLauncher is an option provided by Spark to launch spark-jobs programmatically, as shown below. It is available in the spark-launcher artifact:
import org.apache.spark.launcher.SparkAppHandle;
import org.apache.spark.launcher.SparkLauncher;

// Launch the job asynchronously; the uppercase names below are
// placeholders to be filled in with your own values.
SparkAppHandle handle = new SparkLauncher()
    .setSparkHome(SPARK_HOME)
    .setJavaHome(JAVA_HOME)
    .setAppResource(pathToJARFile)
    .setMainClass(MainClassFromJarWithJob)
    .setMaster("MasterAddress")
    .startApplication();
// or, to launch as a plain child process and block until it exits:
// .launch().waitFor()
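Since startApplication() returns immediately, the returned SparkAppHandle is the hook for tracking the job. Below is a minimal sketch using the listener callbacks that SparkAppHandle exposes (the println reporting is just illustrative):

handle.addListener(new SparkAppHandle.Listener() {
  @Override
  public void stateChanged(SparkAppHandle h) {
    // Fires on every state transition: CONNECTED, SUBMITTED, RUNNING, FINISHED, FAILED, ...
    System.out.println("Job state: " + h.getState());
  }

  @Override
  public void infoChanged(SparkAppHandle h) {
    // Fires when application info changes, e.g. once the application id is assigned
    System.out.println("App id: " + h.getAppId());
  }
});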
Drawback:
This is another alternative provided by Apache Spark, similar to SparkLauncher, to submit spark-jobs in a RESTful way, as shown below:
curl -X POST http://spark-cluster-ip:6066/v1/submissions/create \
  --header "Content-Type:application/json;charset=UTF-8" \
  --data '{
    "action" : "CreateSubmissionRequest",
    "appResource" : "file:/myfilepath/spark-job-1.0.jar",
    "clientSparkVersion" : "1.5.0",
    "mainClass" : "com.mycompany.MyJob",
    ...
  }'
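The create call responds with a submissionId for the driver, which the same REST interface lets you poll or kill. A quick sketch (the driver id below is a made-up placeholder; use the one returned by your create call):

# Check the status of a submitted driver
curl http://spark-cluster-ip:6066/v1/submissions/status/driver-20170101000000-0000

# Kill a running driver
curl -X POST http://spark-cluster-ip:6066/v1/submissions/kill/driver-20170101000000-0000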
This is a good reference to learn about the Spark REST API in detail.
Drawbacks: