Are you ready to take your data engineering skills to the next level? In this class, you’ll learn to use Spark to batch process data, build data pipelines, and process data in near real time.
| | | |
|---|---|---|
| Module 1 | Transformations/Actions, Pair RDDs, ReduceByKey, GroupByKey, Joins, Partitions | Narrow and Wide Transformations and Stages, Caching and Persistence, Checkpointing |
| Module 2 | DataFrames, Data Formats: JSON, CSV, Avro, Parquet, Compression | Caching, Select and Filter, User Defined Functions, AWS and S3 |
| Module 3 | Micro-Batches and DStreams, Transformations and Output Operations, Windowing Operations | State DStream, Checkpointing and Fault Tolerance, Deployment and Monitoring |
| Module 4 | Map-Side Joins, Closures, Broadcast Variables, Accumulators | Optimizing Joins, Data Skew, Partitioning, Coalescing, Metrics Using Application UI |