Changes

GPU621/Apache Spark Fall 2022

1,170 bytes added, 15:20, 5 December 2022

→ Deploy Apache Spark Application On AWS

[[File: output spark.png | 800px]]

===Check cluster status===

Spark provides a simple web dashboard, the Spark History Server, for checking the status of the applications that ran on the cluster. To open it, visit <your_cluster_master_DNS>:18080 in a browser.

[[File: Dashboard spark.png | 800px]]
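The History Server also exposes the same information as JSON through Spark's monitoring REST API, under /api/v1. The following is a minimal Python sketch. It assumes port 18080 on the master node can be reached from the machine running the script, for example through an SSH tunnel. The helper names api_url, fetch_json and summarize_applications are hypothetical and are not part of Spark or AWS.

```python
import json
from urllib.request import urlopen

# Default port of the Spark History Server, the dashboard described above.
HISTORY_SERVER_PORT = 18080


def api_url(master_dns: str, path: str = "applications") -> str:
    """Build a URL for Spark's monitoring REST API (served under /api/v1)."""
    return f"http://{master_dns}:{HISTORY_SERVER_PORT}/api/v1/{path}"


def fetch_json(url: str, timeout: float = 10.0):
    """Send a GET request and decode the JSON response body."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)


def summarize_applications(apps: list) -> list:
    """Reduce an /applications response to (id, name, completed) tuples.

    An application counts as completed only when it has at least one
    attempt and every attempt is marked "completed" (none still running).
    """
    rows = []
    for app in apps:
        attempts = app.get("attempts", [])
        completed = bool(attempts) and all(a.get("completed", False) for a in attempts)
        rows.append((app["id"], app.get("name", ""), completed))
    return rows
```

With the tunnel in place, summarize_applications(fetch_json(api_url("<your_cluster_master_DNS>"))) should return one tuple per application listed on the dashboard.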

Click an application ID to see more details, such as the job descriptions.

[[File: Spark jobs.png | 800px]]
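The job list for an application is also available from the History Server's REST API at /api/v1/applications/[app-id]/jobs. Below is a minimal Python sketch that assumes the server is reachable on port 18080. The function names are hypothetical helpers. The Spark monitoring documentation notes that for applications that ran in YARN cluster mode, [app-id] in this path is actually [base-app-id]/[attempt-id].

```python
import json
from collections import Counter
from urllib.request import urlopen


def jobs_url(master_dns: str, app_id: str, port: int = 18080) -> str:
    """URL of the REST endpoint listing all jobs of one application."""
    return f"http://{master_dns}:{port}/api/v1/applications/{app_id}/jobs"


def fetch_jobs(master_dns: str, app_id: str, timeout: float = 10.0) -> list:
    """Download the job list for app_id as decoded JSON."""
    with urlopen(jobs_url(master_dns, app_id), timeout=timeout) as response:
        return json.load(response)


def describe_jobs(jobs: list) -> list:
    """Return (jobId, name, status) for every job, ordered by jobId."""
    return sorted((job["jobId"], job.get("name", ""), job["status"]) for job in jobs)


def job_status_counts(jobs: list) -> dict:
    """Count jobs per status, e.g. SUCCEEDED, RUNNING or FAILED."""
    return dict(Counter(job["status"] for job in jobs))
```

Counting statuses gives a quick check for failed jobs without opening each job page in the dashboard.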

You can also view the stage descriptions.

[[File: Spark stages.png | 800px]]
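Stage details can likewise be read from /api/v1/applications/[app-id]/stages. The following sketch also assumes port 18080 and uses hypothetical helper names. It uses the stageId, attemptId, status, numTasks and numCompleteTasks fields of each stage record to compute task progress per stage attempt and to list failed stages.

```python
import json
from urllib.request import urlopen


def stages_url(master_dns: str, app_id: str, port: int = 18080) -> str:
    """URL of the REST endpoint listing all stages of one application."""
    return f"http://{master_dns}:{port}/api/v1/applications/{app_id}/stages"


def fetch_stages(master_dns: str, app_id: str, timeout: float = 10.0) -> list:
    """Download the stage list for app_id as decoded JSON."""
    with urlopen(stages_url(master_dns, app_id), timeout=timeout) as response:
        return json.load(response)


def stage_progress(stages: list) -> dict:
    """Map (stageId, attemptId) to the fraction of tasks that completed."""
    progress = {}
    for stage in stages:
        total = stage.get("numTasks", 0)
        done = stage.get("numCompleteTasks", 0)
        # Guard against stages that report zero tasks.
        progress[(stage["stageId"], stage["attemptId"])] = done / total if total else 0.0
    return progress


def failed_stages(stages: list) -> list:
    """IDs of stage attempts whose status is FAILED, in response order."""
    return [stage["stageId"] for stage in stages if stage["status"] == "FAILED"]
```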

===Conclusion===

With Amazon EMR, you can set up a cluster to process and analyze data with big data frameworks in just a few minutes. You can install Spark on an Amazon EMR cluster along with other Hadoop applications, and Spark can use the EMR File System (EMRFS) to access data in Amazon S3 directly.

==References==

https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-spark.html

https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-gs.html

https://www.databricks.com/glossary/what-is-rdd#:~:text=RDD%20was%20the%20primary%20user,that%20offers%20transformations%20and%20actions.

https://www.oreilly.com/library/view/apache-spark-2x/9781787126497/d0ae45f4-e8a1-4ea7-8036-606b7e27ddfd.xhtml

https://data-flair.training/blogs/spark-rdd-operations-transformations-actions/

RobinYu

CDOT Wiki β