[[File: output spark.png | 800px]]
===Check cluster status===
Spark provides a simple dashboard (the Spark History Server) for checking the status of the cluster. Open <your_cluster_master_DNS>:18080 in a browser to view it.
[[File: Dashboard spark.png | 800px]]
Click an application ID to see more details, such as the job descriptions.
[[File: Spark jobs.png | 800px]]
You can also view the stage descriptions for the same application.
[[File: Spark stages.png | 800px]]
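The same application, job, and stage information can also be read programmatically, because the server on port 18080 exposes Spark's monitoring REST API under /api/v1. The following is a minimal sketch rather than a definitive client. It assumes the history server is reachable from your machine (for example through an SSH tunnel or an open security group), and the names MASTER_DNS, api_url, fetch_json, and summarize_jobs are illustrative, not part of Spark. If the application ran in YARN cluster mode, the job and stage paths may also need an attempt ID after the application ID.

```python
import json
from urllib.request import urlopen

# Hypothetical placeholder: replace with your cluster's master public DNS.
MASTER_DNS = "your_cluster_master_DNS"
HISTORY_PORT = 18080  # port used for the dashboard in this guide


def api_url(master_dns, *parts, port=HISTORY_PORT):
    """Build a URL under the history server's /api/v1 REST root."""
    base = f"http://{master_dns}:{port}/api/v1"
    return "/".join([base, *parts])


def fetch_json(url, timeout=10):
    """Send a GET request and decode the JSON response body."""
    with urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def summarize_jobs(jobs):
    """Count jobs per status in a /jobs response (a list of dicts)."""
    counts = {}
    for job in jobs:
        status = job.get("status", "UNKNOWN")
        counts[status] = counts.get(status, 0) + 1
    return counts


def main():
    """Print a one-line summary for every application the server knows."""
    apps = fetch_json(api_url(MASTER_DNS, "applications"))
    for app in apps:
        app_id = app["id"]
        jobs = fetch_json(api_url(MASTER_DNS, "applications", app_id, "jobs"))
        stages = fetch_json(api_url(MASTER_DNS, "applications", app_id, "stages"))
        print(app_id, app.get("name"), summarize_jobs(jobs), len(stages), "stages")
```

Call main() after setting MASTER_DNS to print each application ID and name, the number of jobs per status, and the number of stages.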
===Conclusion===
With Amazon EMR, you can set up a cluster to process and analyze data with big data frameworks in just a few minutes. Spark can be installed on an Amazon EMR cluster alongside other Hadoop applications, and it can use the EMR File System (EMRFS) to access data in Amazon S3 directly.
==References==
https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-spark.html
https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-gs.html
https://www.databricks.com/glossary/what-is-rdd
https://www.oreilly.com/library/view/apache-spark-2x/9781787126497/d0ae45f4-e8a1-4ea7-8036-606b7e27ddfd.xhtml
https://data-flair.training/blogs/spark-rdd-operations-transformations-actions/