How to tune an Apache Spark job for optimization? How to perform joins efficiently? How to tune an AWS EMR cluster for optimization? How to tune S3 for optimization? How to tune YARN for optimization? How to tune HDFS for optimization? How to fix Apache Spark job errors? How to fix AWS EMR cluster errors? How to fix S3 errors? How to fix YARN errors? And how to fix HDFS errors?

We answer all of those questions in this super long technical blog. 😅 Since this is an optimisation guide, I will assume you are already familiar with the basics.

Blog repo link for requesting…

Deployment Architecture:

A Docker container can be deployed on:

1. A container orchestration tool such as Kubernetes, Docker Swarm, or OpenShift, for production use

2. The Docker daemon, for development use

Container Isolation Architecture

This is a step-by-step tutorial for installing Maven dependencies in Zeppelin.

  1. Get the dependency declaration from the Maven Repository website
  2. Create the Zeppelin dependency declaration in the following format: <groupId>:<artifactId>:<version>, for example ml.dmlc:xgboost4j-spark:0.90
  3. Restart the Zeppelin Spark interpreter and run the command
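
Step 2 can be done by hand, but a small helper makes the conversion less error-prone. The sketch below is only an illustration: the function name maven_to_coordinate is hypothetical and not part of Zeppelin or Maven, and it assumes the declaration copied from the Maven Repository website is a plain <dependency> XML snippet without namespaces.

```python
import xml.etree.ElementTree as ET


def maven_to_coordinate(dependency_xml: str) -> str:
    """Turn a Maven <dependency> snippet into groupId:artifactId:version.

    Hypothetical helper for illustration; assumes a namespace-free snippet.
    """
    root = ET.fromstring(dependency_xml.strip())
    parts = []
    for tag in ("groupId", "artifactId", "version"):
        node = root.find(tag)
        # Fail loudly if any of the three required fields is absent or empty.
        if node is None or not (node.text or "").strip():
            raise ValueError(f"missing <{tag}> in dependency declaration")
        parts.append(node.text.strip())
    return ":".join(parts)


# Declaration as it might be copied from the Maven Repository website.
snippet = """
<dependency>
    <groupId>ml.dmlc</groupId>
    <artifactId>xgboost4j-spark</artifactId>
    <version>0.90</version>
</dependency>
"""

print(maven_to_coordinate(snippet))  # ml.dmlc:xgboost4j-spark:0.90
```

The resulting string is what goes into the Zeppelin dependency declaration; optional fields such as <scope> are deliberately ignored in this sketch.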





Devendra Parhate

Functional programmer who likes to process state with reactive distributed systems
