=== Running the Jobs in Dataproc ===
'''Note: Running the job will create the output folder. However, for subsequent jobs, be sure to delete the output folder first; otherwise Hadoop or Spark will not run. This limitation exists to prevent existing output from being overwritten.'''
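If you prefer the command line, the output folder can be removed with <code>gsutil</code>. This is a minimal sketch; the bucket and folder names are placeholders for your own:

<pre>
# Remove the output folder (and its contents) so the next job can run.
# Replace the bucket and path with your own.
gsutil rm -r gs://my-bucket/wordcount-output
</pre>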
[[File:Dataproc-hadoop.jpg]]
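For reference, the same kind of job can also be submitted from the <code>gcloud</code> CLI instead of the console. A sketch, assuming a cluster named <code>my-cluster</code> in <code>us-central1</code> and the example WordCount jar that ships on Dataproc cluster images:

<pre>
# Submit the built-in WordCount example as a Hadoop job.
# Cluster name, region, and bucket paths are placeholders.
gcloud dataproc jobs submit hadoop \
    --cluster=my-cluster \
    --region=us-central1 \
    --jar=file:///usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar \
    -- wordcount gs://my-bucket/input gs://my-bucket/wordcount-output
</pre>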
'''Retrieve the Results'''
You can observe the progress of the map and reduce tasks in the '''Job output''' console.
When the jobs have completed and all the input files have been processed, Hadoop provides '''counters''': statistics on the executed job. You can also navigate back to the '''Jobs''' tab to see the Elapsed Time of the job. Some counters of note:
# '''Number of splits''': splits from all input data. A split is the amount of data processed by one map task (the Hadoop default block size is 128 MB).
# '''Launched map tasks''': the total number of map tasks. Note that it matches the number of splits.
# '''Launched reduce tasks''': the total number of reduce tasks.
# '''GS: Number of bytes read''': the total number of bytes read from Google Cloud Storage by both map and reduce tasks.
# '''GS: Number of bytes written''': the total number of bytes written to Google Cloud Storage by both map and reduce tasks.
# '''Map input records''': the number of records (words) processed by all map tasks.
# '''Reduce output records''': the number of records (words) output by all reduce tasks.
# '''CPU time spent (ms)''': the total CPU processing time used by all tasks.
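These counters also appear in the job's driver output, which can be fetched from the command line as well. A sketch, assuming the job ID shown in the '''Jobs''' tab; the region and job ID below are placeholders:

<pre>
# List recent jobs to find the job ID.
gcloud dataproc jobs list --region=us-central1

# Stream the job's driver output, which includes the Hadoop counters.
gcloud dataproc jobs wait 1a2b3c4d --region=us-central1
</pre>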
=== Results ===