GPU621/DPS921 Cloud 4U
Cloud 4U
Group Members
- Hualiang Xu, Everything.
Progress
What is Google App Engine
Google App Engine (GAE) is a platform for building scalable web applications and mobile backends. It has an efficient Map Reduce Library that processes large data sets in parallel using the map and reduce pattern. This project uses the library to process a series of search histories and produce a recommendation list for the user. The library is available for Java and Python; I will use Java in this project.
Map and Reduce Library on GAE
In the Map and Reduce Library, user code is required only for the map and reduce functions. Between the two, the platform shuffles the map output, grouping and ordering it by key so that the reduce stage can process all values for a key together. This saves a lot of development work. Another big advantage compared with implementing map and reduce in OpenMP is that the programmer does not have to manage synchronization or resource allocation.
To improve performance further, the shuffle stage can be rewritten. The original source code is here.
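The division of work between user code and platform can be illustrated in plain Java. This is a minimal sketch that does not use the GAE library: the class name `MapReduceSketch`, the word-count task, and the sample queries are all assumptions made for illustration. Only `map` and `reduce` correspond to the code a user would write; `shuffle` and `run` stand in for what the platform does.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Hypothetical stand-alone illustration; not the GAE Map Reduce Library API.
final class MapReduceSketch {

    // User code: turn one search query into (word, 1) pairs.
    public static List<Map.Entry<String, Integer>> map(String query) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String word : query.toLowerCase().split("\\s+")) {
            if (!word.isEmpty()) {
                out.add(new SimpleEntry<>(word, 1));
            }
        }
        return out;
    }

    // Platform's job: group all values by key, with keys in sorted order.
    public static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        return grouped;
    }

    // User code: combine all values that share one key.
    public static int reduce(String key, List<Integer> values) {
        int sum = 0;
        for (int value : values) {
            sum += value;
        }
        return sum;
    }

    // Platform's job: run the map calls in parallel, shuffle, then reduce per key.
    public static Map<String, Integer> run(List<String> queries) {
        List<Map.Entry<String, Integer>> pairs = queries.parallelStream()
                .flatMap(query -> map(query).stream())
                .collect(Collectors.toList());
        Map<String, Integer> result = new TreeMap<>();
        shuffle(pairs).forEach((key, values) -> result.put(key, reduce(key, values)));
        return result;
    }

    public static void main(String[] args) {
        List<String> queries = Arrays.asList(
                "cheap flights toronto", "toronto weather", "cheap hotels");
        // prints {cheap=2, flights=1, hotels=1, toronto=2, weather=1}
        System.out.println(run(queries));
    }
}
```

Neither `map` nor `reduce` contains any locking or thread management: each call only sees its own input, which is why the parallel scheduling can be left to the platform.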
Start a new project
This link is a complete setup guide by Google.
Basic GAE project with Map and Reduce Library.
This project needs one more step: importing the Map and Reduce Library classes in the Java files.
Map: com.google.appengine.tools.mapreduce.Mapper;
Reduce: com.google.appengine.tools.mapreduce.Reducer; com.google.appengine.tools.mapreduce.ReducerInput;
Program Profile
There is nothing about parallelism here. This is an interface that connects the different parts of the program and joins them together. It has three sections: the map pattern, the reduce pattern and the file port (where the data comes from). The platform library is pre-loaded in every instance, so there is no need to call it explicitly. A default section that calls the user interface is included in every project.
Map and Reduce Function
Map function:
In the map stage, the program takes several users' search histories as input. It picks a word at random to compare against; whenever a word appears more than once across all the search histories, it is emitted (recorded).
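One possible reading of this map step can be sketched in plain Java. The sketch is not the project's code and does not use the GAE `Mapper` class: the class name `RepeatedWordMapper`, the whitespace tokenization, and returning the emitted words as a list are assumptions. The random choice of a comparison word is not modelled; the sketch simply emits every word the moment it is seen for the second time.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the project's map function.
final class RepeatedWordMapper {

    // Takes a batch of search queries and returns the words that occur
    // more than once in the batch, each emitted exactly once.
    public static List<String> map(List<String> searchHistory) {
        Map<String, Integer> seen = new HashMap<>();
        List<String> emitted = new ArrayList<>();
        for (String query : searchHistory) {
            for (String word : query.toLowerCase().split("\\s+")) {
                if (word.isEmpty()) {
                    continue;
                }
                int count = seen.merge(word, 1, Integer::sum);
                if (count == 2) {
                    // Second occurrence: record the word once.
                    emitted.add(word);
                }
            }
        }
        return emitted;
    }
}
```

For example, the batch `"cheap flights"`, `"cheap hotels"`, `"flights to toronto"` yields `cheap` and `flights`, in that order.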
Reduce function:
In the reduce stage, sharded_num determines how many groups of results are left. If it is not defined, it is 0 by default. At the end you have a group of results that the users searched for in common.
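How a shard count can control the number of result groups is sketched below in plain Java. This is an illustration under assumptions, not the library's behaviour: the class `ShardedReducer`, the parameter `shardCount` (standing in for sharded_num), and the hash-based assignment of keys to shards are all hypothetical. The meaning of the default value 0 is not modelled; the sketch requires at least one shard.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for the project's reduce stage.
final class ShardedReducer {

    // Reduce one key: add up the counts collected for a word.
    public static int reduce(String word, List<Integer> counts) {
        int sum = 0;
        for (int count : counts) {
            sum += count;
        }
        return sum;
    }

    // Reduce every key and place each result in one of shardCount groups.
    public static List<Map<String, Integer>> reduceIntoShards(
            Map<String, List<Integer>> grouped, int shardCount) {
        if (shardCount < 1) {
            // Choice made for this sketch only; the default of 0 is not modelled.
            throw new IllegalArgumentException("shardCount must be at least 1");
        }
        List<Map<String, Integer>> shards = new ArrayList<>();
        for (int i = 0; i < shardCount; i++) {
            shards.add(new TreeMap<>());
        }
        for (Map.Entry<String, List<Integer>> entry : grouped.entrySet()) {
            // floorMod keeps the index non-negative even for negative hash codes.
            int shard = Math.floorMod(entry.getKey().hashCode(), shardCount);
            shards.get(shard).put(entry.getKey(), reduce(entry.getKey(), entry.getValue()));
        }
        return shards;
    }
}
```

With `shardCount` set to 1, every reduced word ends up in a single group, which matches the idea of one final list of common search terms.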
Result
This is the result of our project. If you want to see the data, you can find it here.
Conclusion
Google App Engine is a lightweight but very powerful development platform that provides many features. One of the features I used in this project is the Map Reduce Library. It is easy to use and does not require much coding. It processes data in parallel without the programmer having to handle synchronization.