== Running the Jobs in Dataproc ==
Now that we have our project code, input files, and Dataproc cluster set up, we can proceed to run the Hadoop MapReduce and Spark wordcount jobs.

=== Run the Hadoop MapReduce Job ===
# Go to '''Menu -> Big Data -> Dataproc -> Jobs'''
# Select 'SUBMIT JOB' and name your job ID
# Select your cluster
# Specify Hadoop as Job Type
# Specify the JAR which contains the Hadoop MapReduce algorithm
#* gs://<myBucketName>/hadoop-mapreduce-examples.jar
# Input 3 arguments to the MapReduce algorithm
#* wordcount gs://<myBucketName>/inputFolder gs://<myBucketName>/output
'''note: Running the job will create the output folder. However, for subsequent jobs, be sure to delete the output folder first, else Hadoop or Spark will not run. This safeguard exists to prevent existing output from being overwritten.''' A command-line equivalent is sketched after the screenshot below.
[[File:Dataproc-hadoop.jpg]]
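For reference, the same job can be submitted from the command line. This is a minimal sketch using the standard gcloud and gsutil tools; <myClusterName> and <myRegion> are placeholders (not from the steps above) that you would replace with your own cluster name and region:

<pre>
# Submit the same wordcount job from the command line (assumes the
# Google Cloud SDK is installed and authenticated; <myClusterName>
# and <myRegion> are placeholders for your own values)
gcloud dataproc jobs submit hadoop \
    --cluster=<myClusterName> \
    --region=<myRegion> \
    --jar=gs://<myBucketName>/hadoop-mapreduce-examples.jar \
    -- wordcount gs://<myBucketName>/inputFolder gs://<myBucketName>/output

# Before re-running the job, remove the previous output folder,
# since Hadoop will not overwrite existing output
gsutil -m rm -r gs://<myBucketName>/output
</pre>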
=== Results ===
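Once the job finishes, the word counts are written to the output folder as one or more part files. As a quick sanity check, assuming the bucket and output path used above, you can inspect them with gsutil:

<pre>
# List the generated output files (typically part-r-00000, part-r-00001, ...)
gsutil ls gs://<myBucketName>/output

# Print the word counts; each line is a word followed by its count
gsutil cat gs://<myBucketName>/output/part-*
</pre>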