76
edits
Changes
m
→Analysis: Spark vs Hadoop
=== Performance ===
== Analysis: Spark vs Hadoop Wordcount Performance ==
=== Methodology ===
Hadoop and Spark clusters can be deployed in cloud environments such as the Google Cloud Platform or Amazon EMR.
# Drag and drop the below word-count.py into the browser, or use 'UPLOAD FILES' to upload.
# word-count.py
#!/usr/bin/env python
import pyspark
import sys
if len(sys.argv) != 3:
raise Exception("Exactly 2 arguments are required: <inputUri> <outputUri>")
inputUri=sys.argv[1]
outputUri=sys.argv[2]
sc = pyspark.SparkContext()
lines = sc.textFile(sys.argv[1])