== Running the Jobs in Dataproc ==
Now that we have our project code, input files, and Dataproc cluster set up, we can proceed to run the Hadoop MapReduce and Spark wordcount jobs.
'''Run the Hadoop MapReduce Job'''
# Go to '''Menu -> Big Data -> Dataproc -> Jobs'''
# Select 'SUBMIT JOB' and name your job ID
# Select your cluster
# Specify Hadoop as Job Type
# Specify the JAR file which contains the Hadoop MapReduce wordcount algorithm
#* gs://<myBucketName>/hadoop-mapreduce-examples.jar
# Input 3 arguments to the MapReduce algorithm
#* wordcount gs://<myBucketName>/inputFolder gs://<myBucketName>/output
'''note: Running the job will create the output folder. However, for subsequent jobs be sure to delete the output folder first (see the cleanup command below), else Hadoop or Spark will not run. This limitation exists to prevent existing output from being overwritten.'''
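Between runs, the output folder can be removed with the gsutil tool. A minimal sketch, assuming the same <myBucketName> placeholder and the output path used above:
<pre>
# Recursively delete the output folder so the next Hadoop/Spark job can run
gsutil rm -r gs://<myBucketName>/output
</pre>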
[[File:Dataproc-hadoop.jpg]]
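As an alternative to the console UI steps above, the same job can be submitted from the command line with the gcloud CLI. A minimal sketch, where <myClusterName> and <myRegion> are hypothetical placeholders for your own cluster name and region, and <myBucketName> is the bucket from the steps above:
<pre>
# Submit the Hadoop MapReduce wordcount job to the Dataproc cluster;
# everything after "--" is passed as arguments to the JAR
gcloud dataproc jobs submit hadoop \
    --cluster=<myClusterName> \
    --region=<myRegion> \
    --jar=gs://<myBucketName>/hadoop-mapreduce-examples.jar \
    -- wordcount gs://<myBucketName>/inputFolder gs://<myBucketName>/output
</pre>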
=== Results ===