== Running the Jobs in Dataproc ==
Now that we have our project code, input files, and Dataproc cluster set up, we can proceed to run the Hadoop MapReduce and Spark wordcount jobs.

=== Run the Hadoop MapReduce Job ===
# Go to '''Menu -> Big Data -> Dataproc -> Jobs'''
# Select 'SUBMIT JOB' and name your job ID
# Select your cluster
# Specify Hadoop as Job Type
# Specify the JAR which contains the Hadoop MapReduce algorithm
#* gs://<myBucketName>/hadoop-mapreduce-examples.jar
# Input 3 arguments to the MapReduce algorithm
#* wordcount gs://<myBucketName>/inputFolder gs://<myBucketName>/output
'''note: Running the job will create the output folder. However, for subsequent jobs, be sure to delete the output folder first, else Hadoop or Spark will not run. This safeguard exists to prevent existing output from being overwritten.''' A command-line equivalent is sketched after the screenshot below.
[[File:Dataproc-hadoop.jpg]]
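For reference, the same job can be submitted from the command line. This is a minimal sketch using the standard gcloud and gsutil tools; <myClusterName> and <myRegion> are placeholders (not from the steps above) that you would replace with your own cluster name and region:

<pre>
# Submit the same wordcount job from the command line (assumes the
# Google Cloud SDK is installed and authenticated; <myClusterName>
# and <myRegion> are placeholders for your own values)
gcloud dataproc jobs submit hadoop \
    --cluster=<myClusterName> \
    --region=<myRegion> \
    --jar=gs://<myBucketName>/hadoop-mapreduce-examples.jar \
    -- wordcount gs://<myBucketName>/inputFolder gs://<myBucketName>/output

# Before re-running the job, remove the previous output folder,
# since Hadoop will not overwrite existing output
gsutil -m rm -r gs://<myBucketName>/output
</pre>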
=== Results ===
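Once the job finishes, the word counts are written to the output folder as one or more part files. As a quick sanity check, assuming the bucket and output path used above, you can inspect them with gsutil:

<pre>
# List the generated output files (typically part-r-00000, part-r-00001, ...)
gsutil ls gs://<myBucketName>/output

# Print the word counts; each line is a word followed by its count
gsutil cat gs://<myBucketName>/output/part-*
</pre>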