GPU621/Apache Spark

3 bytes removed, 12:51, 30 November 2020
Analysis: Spark vs Hadoop
# To open the Browser: Menu -> Storage -> Browser
# Drag and drop the word-count.py script below into the browser, or use 'UPLOAD FILES' to upload it.
<Code>
#!/usr/bin/env python
import sys

import pyspark

# Expect exactly two arguments: the input URI and the output URI.
if len(sys.argv) != 3:
    raise Exception("Exactly 2 arguments are required: <inputUri> <outputUri>")

inputUri = sys.argv[1]
outputUri = sys.argv[2]

sc = pyspark.SparkContext()
# Read the input as an RDD of lines, then split each line on whitespace.
lines = sc.textFile(inputUri)
words = lines.flatMap(lambda line: line.split())
# Pair each word with 1, then sum the counts per word.
wordCounts = words.map(lambda word: (word, 1)).reduceByKey(lambda count1, count2: count1 + count2)
wordCounts.saveAsTextFile(outputUri)
</Code>
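The flatMap, map, and reduceByKey steps in word-count.py can be traced locally without a cluster. The following is a minimal sketch in plain Python (not PySpark); the function name <code>word_count_local</code> and the sample lines are hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from itertools import chain


def word_count_local(lines):
    """Mimic the Spark job's three steps on an in-memory list of lines."""
    # flatMap: split every line on whitespace and flatten into one word stream
    words = chain.from_iterable(line.split() for line in lines)
    # map: pair each word with an initial count of 1
    pairs = ((word, 1) for word in words)
    # reduceByKey: sum the counts of pairs that share the same word
    counts = defaultdict(int)
    for word, count in pairs:
        counts[word] += count
    return dict(counts)


if __name__ == "__main__":
    # Hypothetical sample input, standing in for the lines of the input file
    sample = ["to be or not to be", "that is the question"]
    for word, count in sorted(word_count_local(sample).items()):
        print(word, count)
```

Unlike the Spark job, this sketch runs in a single process and keeps every count in memory, so it is only suitable for checking the logic on small inputs.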
=== Results ===