GPU621/DPS921 Could 4U

From CDOT Wiki

Latest revision as of 13:19, 8 December 2016

Cloud 4U

Group Members

  1. Hualiang Xu, Everything.

Progress

Nov. 15th - 22nd: Research.
Nov. 23rd - 24th: Prepare the presentation slides (PPT).
Dec. 5th: Present.
Dec. 8th: Upload work to the wiki.

What is Google App Engine

Google App Engine is a platform for building scalable web applications and mobile backends. It has an efficient Map Reduce Library that processes large data sets in parallel using the map and reduce pattern. This project uses the library to process a series of search histories and produce a recommendation list for the user. App Engine supports Java and Python; this project uses Java.

Map and Reduce Library on GAE

With the Map Reduce Library, user code is required only for the map and reduce functions. The platform shuffles and reorders the intermediate data so that the reduce stage that follows runs quickly, which also saves a lot of development work. Another big advantage over implementing map and reduce with OpenMP is that the programmer does not have to handle synchronization or resource allocation.

For a further speed-up, the shuffle stage can be rewritten. The original source code is CollisionFindingServlet.java in the appengine-mapreduce repository: https://github.com/GoogleCloudPlatform/appengine-mapreduce/blob/master/java/example/src/com/google/appengine/demos/mapreduce/randomcollisions/CollisionFindingServlet.java
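
The division of work described above can be sketched in plain Java. This is a minimal single-process illustration and not the App Engine library: MiniMapReduce and its run method are hypothetical names, the stages run sequentially rather than in parallel, and the shuffle is assumed to be nothing more than a sorted in-memory grouping by key.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.BiConsumer;
import java.util.function.BiFunction;

/** Hypothetical single-process stand-in for the map, shuffle and reduce stages. */
final class MiniMapReduce {
    private MiniMapReduce() {}

    /**
     * Runs mapFn on every input, groups the emitted values by key (the shuffle),
     * then runs reduceFn once per key. Only mapFn and reduceFn are user code.
     */
    static <I, K extends Comparable<K>, V, R> Map<K, R> run(
            List<I> inputs,
            BiConsumer<I, BiConsumer<K, V>> mapFn,
            BiFunction<K, List<V>, R> reduceFn) {
        // Shuffle: a TreeMap keeps the keys sorted and collects all values per key.
        Map<K, List<V>> grouped = new TreeMap<>();
        for (I input : inputs) {
            mapFn.accept(input, (key, value) ->
                    grouped.computeIfAbsent(key, k -> new ArrayList<>()).add(value));
        }
        // Reduce: one call per key, in sorted key order.
        Map<K, R> results = new TreeMap<>();
        for (Map.Entry<K, List<V>> entry : grouped.entrySet()) {
            results.put(entry.getKey(), reduceFn.apply(entry.getKey(), entry.getValue()));
        }
        return results;
    }
}
```

For example, calling MiniMapReduce.<String, String, Integer, Integer>run with the inputs "red shoes" and "red hat", a map function that emits (word, 1) for each word, and a reduce function that returns the size of the value list gives {hat=1, red=2, shoes=1}.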

MPL.png

Start a new project

Google provides a complete setup guide at https://cloud.google.com/appengine/docs/python/tools/using-local-server.

Basic GAE project with the Map Reduce Library:

Strc.png

This project needs one more step: importing the Map Reduce Library classes in the Java files.

Map: com.google.appengine.tools.mapreduce.Mapper;

Reduce: com.google.appengine.tools.mapreduce.Reducer; com.google.appengine.tools.mapreduce.ReducerInput;

Program Profile

Nothing here involves parallelism. This is an interface that connects the different parts of the program and joins them together. It has three sections: the map pattern, the reduce pattern and the file port (where the data comes from). The platform library is pre-loaded in every instance and does not need to be called explicitly. The default section that invokes the user interface is included in every project.

Profile.png

Map and Reduce Function

Map function:

In the map stage, the program takes a set of user search histories as input. It picks a word at random to compare against, and whenever a word appears more than once across all the search histories, it is emitted (recorded).
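
As a rough illustration of a map step over search queries, the sketch below uses the common word-count style rather than the random comparison described above: every word is emitted with a count of 1 and repeats are left for the reduce stage to detect. It is an assumption about one workable design, not the project's actual mapper, and SearchWordMapper and Pair are hypothetical names that do not come from the App Engine library.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Hypothetical map step: one search query in, one (word, 1) pair out per word. */
final class SearchWordMapper {
    private SearchWordMapper() {}

    /** A minimal immutable key/value pair, standing in for an emitted record. */
    static final class Pair {
        final String word;
        final int count;

        Pair(String word, int count) {
            this.word = word;
            this.count = count;
        }

        @Override
        public String toString() {
            return word + "=" + count;
        }
    }

    /** Splits a query into lower-case words and emits each one with a count of 1. */
    static List<Pair> map(String query) {
        List<Pair> emitted = new ArrayList<>();
        // Lower-case, then split on any run of characters that are not letters or digits.
        for (String word : query.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!word.isEmpty()) {
                emitted.add(new Pair(word, 1));
            }
        }
        return emitted;
    }
}
```

Normalizing case and punctuation in the map step means that "Red" and "red" end up under the same key after the shuffle.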

Map.png

Reduce function:

Reduce.png

In the reduce stage, sharded_num determines how many groups of results are left. If it is not defined, it defaults to 0. The final output is a group of terms that users searched for in common.
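
The idea of keeping only repeated words and splitting the output into a fixed number of groups can be sketched as follows. This is a simplified assumption about how such a reducer might look, not the project's code: CommonWordReducer, isCommon, shardFor and the shardCount parameter are hypothetical names, the hash-based shard choice is one common technique rather than the library's documented behaviour, and the sketch requires an explicit positive shard count instead of modelling a default.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical reduce step: keep repeated words and spread them over result shards. */
final class CommonWordReducer {
    private CommonWordReducer() {}

    /** A word is common if its emitted counts add up to more than one. */
    static boolean isCommon(List<Integer> counts) {
        int total = 0;
        for (int c : counts) {
            total += c;
        }
        return total > 1;
    }

    /** Picks an output shard for a key; shardCount must be at least 1. */
    static int shardFor(String word, int shardCount) {
        if (shardCount < 1) {
            throw new IllegalArgumentException("shardCount must be >= 1");
        }
        // floorMod keeps the result non-negative even for negative hash codes.
        return Math.floorMod(word.hashCode(), shardCount);
    }

    /** Groups the common words into shardCount result lists, visiting keys in sorted order. */
    static List<List<String>> reduce(Map<String, List<Integer>> grouped, int shardCount) {
        if (shardCount < 1) {
            throw new IllegalArgumentException("shardCount must be >= 1");
        }
        List<List<String>> shards = new ArrayList<>();
        for (int i = 0; i < shardCount; i++) {
            shards.add(new ArrayList<>());
        }
        Map<String, List<Integer>> sorted = new TreeMap<>(grouped);
        for (Map.Entry<String, List<Integer>> entry : sorted.entrySet()) {
            if (isCommon(entry.getValue())) {
                shards.get(shardFor(entry.getKey(), shardCount)).add(entry.getKey());
            }
        }
        return shards;
    }
}
```

With a single shard, all common words land in one result list; a larger shard count spreads them over several lists according to each word's hash.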

Result

This is the result of the project. The input data is available in File:SearhHis.zip.

Result.png

Conclusion

Google App Engine is a lightweight but very powerful development platform that provides many features. One of the features used in this project is the Map Reduce Library. It is easy to use and requires little code, and it processes data in parallel without the programmer having to worry about synchronization.