GPU621/DPS921 Could 4U

From CDOT Wiki
Revision as of 14:08, 8 December 2016 by Hxu86 (talk | contribs)

Cloud 4U

Group Members

  1. Hualiang Xu, Everything.

Progress

What is Google App Engine

Google App Engine (GAE) is a platform for building scalable web applications and mobile backends. It has an efficient Map Reduce Library that processes large data sets in parallel using the map and reduce pattern. This project uses that library to process a series of search histories and produce a recommendation list for the user. GAE supports Java and Python; I use Java in this project.

Map and Reduce Library on GAE

With the Map and Reduce Library, user code is only required for the map and reduce functions. The platform shuffles and reorders the intermediate data so that the reduce step that follows runs fast, which also saves a lot of development work. Another big advantage compared with writing Map Reduce on OpenMP is that the programmer does not have to handle synchronization or resource allocation.
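To make the division of work concrete, here is a minimal plain-Java sketch of the map, shuffle and reduce flow. It does not use the App Engine library: the class name MapReduceSketch and its methods are hypothetical, and the shuffle step is written out by hand only to show what the platform would otherwise do for you.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Illustration only: a sequential stand-in for the map -> shuffle -> reduce flow.
// All names are hypothetical; the real library runs these steps across shards.
class MapReduceSketch {

    // Map step (user code): turn one search query into (word, 1) pairs.
    static List<Map.Entry<String, Integer>> map(String query) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String word : query.toLowerCase().split("\\s+")) {
            if (!word.isEmpty()) {
                out.add(new SimpleEntry<>(word, 1));
            }
        }
        return out;
    }

    // Shuffle step (done by the platform): group all values by key, sorted by key.
    static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        return grouped;
    }

    // Reduce step (user code): combine all values that share one key.
    static int reduce(List<Integer> values) {
        int sum = 0;
        for (int value : values) {
            sum += value;
        }
        return sum;
    }

    // Wires the three steps together for a list of queries.
    static Map<String, Integer> run(List<String> queries) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String query : queries) {
            pairs.addAll(map(query));
        }
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> group : shuffle(pairs).entrySet()) {
            result.put(group.getKey(), reduce(group.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        // prints {cheap=2, flights=1, hotels=1}
        System.out.println(run(Arrays.asList("cheap flights", "Cheap hotels")));
    }
}
```

Only map and reduce correspond to code the programmer would write; the grouping in shuffle and the wiring in run stand in for work the platform takes over.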

To speed things up further, the shuffle stage can be rewritten. The original source code is here.

MPL.png

Start a new project

This link is a complete setup guide by Google.

A basic GAE project with the Map and Reduce Library:

Strc.png

One more step in this project is to import the Map and Reduce Library in the Java files.

Map: com.google.appengine.tools.mapreduce.Mapper;

Reduce: com.google.appengine.tools.mapreduce.Reducer; com.google.appengine.tools.mapreduce.ReducerInput;

Program Profile

Nothing here involves parallelism. The profile is an interface that connects the different parts of the program and joins them together. It has three sections: the mapping pattern, the reduce pattern and the file port (where the data comes from). The platform library is pre-loaded in every instance, so there is no need to call it explicitly. A default user-interface calling section is included in every project.

Profile.png

Map and Reduce Function

Map function:

In the mapping process, the program takes several users' search histories as input. It picks a word at random to compare against, and whenever a word appears more than once across all the search histories, it is emitted (recorded).

Map.png
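The emit rule described above can be sketched in plain Java as follows. This is a simplified illustration, not the code in the screenshot: it counts every word instead of picking one at random, it does not use the App Engine Mapper class, and the names RepeatedWordSketch and emitRepeated are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustration only: emit every word that occurs more than once
// across a set of search histories. Names are hypothetical.
class RepeatedWordSketch {

    // Returns the repeated words in the order they were first seen.
    static List<String> emitRepeated(List<String> histories) {
        // LinkedHashMap keeps first-seen order so the output is deterministic.
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String history : histories) {
            for (String word : history.toLowerCase().split("\\s+")) {
                if (!word.isEmpty()) {
                    counts.merge(word, 1, Integer::sum);
                }
            }
        }
        List<String> emitted = new ArrayList<>();
        for (Map.Entry<String, Integer> entry : counts.entrySet()) {
            if (entry.getValue() > 1) {
                emitted.add(entry.getKey()); // "emit" = record the word
            }
        }
        return emitted;
    }

    public static void main(String[] args) {
        // prints [java, mapreduce]
        System.out.println(emitRepeated(Arrays.asList(
                "java tutorial", "java mapreduce", "mapreduce example")));
    }
}
```

In the real library the emitted words would be passed on to the shuffle stage rather than collected in a list.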

Reduce function:

Reduce.png

In the reduce process, sharded_num determines how many groups of results are left. If it is not defined, it defaults to 0. At the end you have a group of results that users searched for in common.

Result

Result.png

Conclusion

Google App Engine is a lightweight but very powerful development platform that provides a lot of features. One of the features I used in this project is the Map Reduce Library. It is very easy to use and does not require much coding. It runs the code in parallel without the programmer having to worry about synchronization.