Apache Spark Interview Questions
- Spark Architecture
- Client mode vs Cluster mode
- RDD vs DataFrame vs Dataset
- SparkContext vs SparkSession
- map() vs flatMap()
- reduce() vs reduceByKey()
- Performance Techniques
- Repartition vs Coalesce
- Order By vs Sort By
- Persist vs Cache
- Skewness and Salting
- Map-side Join
- Spark configuration for joining two large tables
- map() vs mapPartitions()
- Broadcast and Accumulator variables
- What is lineage and DAG?
- Relation between driver, executor, memory, cores, partitions, stage, job, task using example
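
As a starting point for the map() vs flatMap() and reduce() vs reduceByKey() questions above, here is a minimal sketch of the semantics in plain Python. It does not use the Spark API: the lists stand in for RDDs, and `reduce_by_key` is a hypothetical helper written only for illustration. In Spark itself these operations run distributed across partitions, and reduceByKey() is a lazy transformation while reduce() is an action.

```python
from functools import reduce

lines = ["a b", "c"]

# map(): exactly one output element per input element (here, one list per line).
mapped = [line.split(" ") for line in lines]

# flatMap(): each input may yield zero or more outputs, flattened into one collection.
flat_mapped = [word for line in lines for word in line.split(" ")]

# reduce(): folds the whole collection into a single value.
total = reduce(lambda x, y: x + y, [1, 2, 3, 4])


def reduce_by_key(pairs, func):
    """Hypothetical stand-in for reduceByKey(): fold values separately per key."""
    merged = {}
    for key, value in pairs:
        merged[key] = func(merged[key], value) if key in merged else value
    return merged


# Word count: pair each word with 1, then sum the 1s for each word.
counts = reduce_by_key([(w, 1) for w in ["a", "b", "a"]], lambda x, y: x + y)
```

The key contrast is in the result shape: map() keeps one nested list per input line, flatMap() yields a flat list of words, reduce() returns one value, and reduceByKey() returns one value per key.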