CDOT Wiki β

GPU621/Apache Spark Fall 2022

126 bytes added, 22:07, 3 December 2022
===Create an EMR cluster===
Click the Create Cluster button.
-IMAGE-[[File: cluster name.png | 600px]]
Enter a cluster name and choose a release version. Here I will choose emr-5.11.1 as the release version. Many applications are available; we will choose Spark, since it is our main topic.
-IMAGE-[[File: ec2.png | 600px]]
Next, we need to choose an instance type. The cluster runs on multiple EC2 instances, and different EC2 instance types have different features and different prices; please refer to the EC2 instance type table to check the prices. Here I will choose the c4.large type, as it is the least expensive one offered. For the number of instances, I will choose 3, that is, one master node and two core nodes.
Click the Create Cluster button and wait for the cluster to be set up.
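The same choices can also be written down as a request for the EMR API instead of being clicked through in the console. The following is only a rough sketch using boto3, not the method used in this walkthrough: the cluster name <code>spark-demo</code>, the region <code>us-east-1</code>, and the default role names <code>EMR_DefaultRole</code> and <code>EMR_EC2_DefaultRole</code> are assumptions and may differ in your account.

```python
def build_cluster_request(name, release_label="emr-5.11.1",
                          instance_type="c4.large", instance_count=3):
    """Build the keyword arguments for the EMR run_job_flow call.

    Mirrors the console choices: Spark as the application, one instance
    type for every node, and 3 instances (1 master + 2 core).
    """
    return {
        "Name": name,
        "ReleaseLabel": release_label,
        "Applications": [{"Name": "Spark"}],
        "Instances": {
            "MasterInstanceType": instance_type,
            "SlaveInstanceType": instance_type,
            "InstanceCount": instance_count,
            # Keep the cluster running after creation so we can log in.
            "KeepJobFlowAliveWhenNoSteps": True,
        },
        # Assumption: the default EMR roles already exist in the account.
        "JobFlowRole": "EMR_EC2_DefaultRole",
        "ServiceRole": "EMR_DefaultRole",
    }


def create_cluster(request, region="us-east-1"):
    """Send the request to EMR and return the new cluster (job flow) ID.

    Needs boto3 and valid AWS credentials; the region is an assumption.
    """
    import boto3  # imported here so the builder above works without it

    emr = boto3.client("emr", region_name=region)
    response = emr.run_job_flow(**request)
    return response["JobFlowId"]


if __name__ == "__main__":
    # "spark-demo" is a hypothetical cluster name.
    request = build_cluster_request("spark-demo")
    print(request)
    # Uncomment to actually create the cluster (this incurs AWS charges):
    # print(create_cluster(request))
```

Keeping the request in a plain dictionary makes it easy to review the release, instance type, and instance count before anything is launched.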
-IMAGE-[[File: cluster info.png | 600px]]
You will see a page like this. Next, we need to edit the security group for the master node, which acts like a firewall, and add an inbound rule.
-IMAGE-[[File: inbound rule.png | 600px]]
We need to open port 22 (SSH) and port 18080 (the Spark History Server web UI) to your IP address so that you can reach the master EC2 instance.
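For reference, the two inbound rules can be described in code as well. This is a minimal sketch with boto3, assuming you already know the master security group ID and your own public IP address; the values <code>sg-0123456789abcdef0</code> and <code>203.0.113.5</code> below are placeholders, and the region is an assumption.

```python
def build_ingress_rules(my_ip, ports=(22, 18080)):
    """Return one TCP inbound rule per port, limited to a single IP.

    The /32 suffix restricts each rule to exactly one IPv4 address.
    """
    cidr = f"{my_ip}/32"
    return [
        {
            "IpProtocol": "tcp",
            "FromPort": port,
            "ToPort": port,
            "IpRanges": [{"CidrIp": cidr}],
        }
        for port in ports
    ]


def open_ports(group_id, my_ip, region="us-east-1"):
    """Add the inbound rules to the given security group.

    Needs boto3 and valid AWS credentials; the region is an assumption.
    """
    import boto3  # imported here so the rule builder works without it

    ec2 = boto3.client("ec2", region_name=region)
    ec2.authorize_security_group_ingress(
        GroupId=group_id,
        IpPermissions=build_ingress_rules(my_ip),
    )


if __name__ == "__main__":
    # Placeholder IP from the documentation range; use your own address.
    for rule in build_ingress_rules("203.0.113.5"):
        print(rule)
    # Uncomment with your real security group ID to apply the rules:
    # open_ports("sg-0123456789abcdef0", "203.0.113.5")
```

Restricting the rules to a single address, rather than opening the ports to everyone, matches the intent of allowing access only from your IP.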
You should see a welcome page like this:
-IMAGE-[[File: welcome page.png | 600px]]
===Create an S3 bucket===