Winter 2015 SPO600 Project

From CDOT Wiki

Latest revision as of 08:44, 9 March 2015

Overview

The Winter 2015 SPO600 Project consists of:

  • Phase 1: Identifying a Possible Place for Optimization in the LAMP Stack
  • Phase 2: Optimizing
  • Phase 3: Committing the Changes Upstream

Phase 1: Identifying a Possible Optimization in the LAMP Stack

The LAMP stack consists of:

  • the Linux operating system
  • Apache httpd web server (or a similar web server such as Nginx)
  • MySQL or MariaDB (or a similar database such as PostgreSQL, or a NoSQL database such as MongoDB)
  • PHP, Perl, or Python (or a similar server-side web scripting language)

In this first phase, you will find a possible place for optimization and performance improvement in the LAMP stack on Aarch64 systems. This improvement must not cause incorrect behaviour on any platform, and must not reduce performance on other platforms (particularly x86_64). Ideally, it should improve performance on multiple (or all) platforms. Note that the improvement may take the form of reduced execution time, reduced memory consumption, both, or better utilization of some other resource.

Additional opportunities
These two pieces of software are not part of the LAMP stack but may also be undertaken as optimization projects: Createrepo and RPM.

To find possible places for improvement, you may want to use one or more of these techniques:

  • Look for platform-specific optimization code where no equivalent has been provided for Aarch64 (also known as ARM64). For example, a software package may include assembler code or runtime CPU feature detection for one or more architectures, but fall back to generic C routines on all other platforms, including Aarch64.
  • Use profiling to find functions that take more time than expected, or which take much longer on Aarch64 than on x86_64.
  • Examine the build instructions to find code that is being compiled with very generic build options, where specifying a better set of optimization options could improve performance.
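The first technique is easier to apply if you know what the pattern tends to look like. The following is a hypothetical C illustration. It has a hand-optimized path guarded by x86_64 macros and a generic C fallback that every other architecture, including Aarch64, silently uses. The function `count_byte` is invented for this example. In a real package, you might find this pattern by searching the source for macros such as `__x86_64__`, `__SSE2__` or `__i386__` that have no matching `__aarch64__` branch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__x86_64__) && defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Portable fallback: one byte per iteration. On a package structured
 * like this, Aarch64 builds always end up here. */
static size_t count_byte_generic(const uint8_t *buf, size_t len, uint8_t c)
{
    size_t total = 0;
    for (size_t i = 0; i < len; i++)
        total += (buf[i] == c);
    return total;
}

#if defined(__x86_64__) && defined(__SSE2__)
/* x86_64-only path: compares 16 bytes per iteration using SSE2. */
static size_t count_byte_sse2(const uint8_t *buf, size_t len, uint8_t c)
{
    const __m128i needle = _mm_set1_epi8((char)c);
    size_t total = 0, i = 0;
    for (; i + 16 <= len; i += 16) {
        __m128i v = _mm_loadu_si128((const __m128i *)(buf + i));
        /* Matching bytes become 0xFF; movemask packs their top bits
         * into a 16-bit mask, and popcount counts the set bits. */
        int mask = _mm_movemask_epi8(_mm_cmpeq_epi8(v, needle));
        total += (size_t)__builtin_popcount((unsigned)mask);
    }
    /* The remaining (len % 16) bytes go through the generic loop. */
    return total + count_byte_generic(buf + i, len - i, c);
}
#endif

/* Counts how many bytes of buf[0..len) equal c. */
size_t count_byte(const uint8_t *buf, size_t len, uint8_t c)
{
    assert(buf != NULL);
#if defined(__x86_64__) && defined(__SSE2__)
    return count_byte_sse2(buf, len, c);
#else
    /* No architecture-specific branch exists for Aarch64 (or anything else). */
    return count_byte_generic(buf, len, c);
#endif
}
```

When you find code like this, profiling the same workload on both architectures shows whether the missing Aarch64 path actually matters.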

You can perform optimization in one of three ways:

  • Replace an algorithm with a more efficient algorithm.
  • Add additional Aarch64-specific optimizations, where other architectures already have architecture-specific performance code in place.
  • Change the build instructions (e.g., compiler options in the Makefile) to build the software with better optimization.
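The second approach can be sketched in C as follows. The sketch assumes a hypothetical `count_byte` function that previously had only a generic C fallback on Aarch64. An Advanced SIMD (NEON) path is added behind `__aarch64__` and `__ARM_NEON`, while other platforms keep their existing code. Any existing x86_64-specific path would be left untouched and is omitted here. The sketch uses only ACLE intrinsics from `<arm_neon.h>`. Whether it actually beats the compiler's auto-vectorized fallback is something you would need to measure, not assume.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__aarch64__) && defined(__ARM_NEON)
#include <arm_neon.h>
#define HAVE_NEON_COUNT 1
#endif

/* Portable fallback, kept unchanged for every other architecture. */
static size_t count_byte_generic(const uint8_t *buf, size_t len, uint8_t c)
{
    size_t total = 0;
    for (size_t i = 0; i < len; i++)
        total += (buf[i] == c);
    return total;
}

#ifdef HAVE_NEON_COUNT
/* New Aarch64 path: compares 16 bytes per iteration with Advanced SIMD. */
static size_t count_byte_neon(const uint8_t *buf, size_t len, uint8_t c)
{
    const uint8x16_t needle = vdupq_n_u8(c);
    size_t total = 0, i = 0;
    for (; i + 16 <= len; i += 16) {
        /* Lanes that match become 0xFF, all others 0x00. */
        uint8x16_t eq = vceqq_u8(vld1q_u8(buf + i), needle);
        /* Shift each lane down to 0 or 1, then add across the 16 lanes;
         * the sum is at most 16, so the uint8_t result cannot overflow. */
        total += vaddvq_u8(vshrq_n_u8(eq, 7));
    }
    /* The remaining (len % 16) bytes go through the generic loop. */
    return total + count_byte_generic(buf + i, len - i, c);
}
#endif

/* Counts how many bytes of buf[0..len) equal c. */
size_t count_byte(const uint8_t *buf, size_t len, uint8_t c)
{
    assert(buf != NULL);
#ifdef HAVE_NEON_COUNT
    return count_byte_neon(buf, len, c);
#else
    return count_byte_generic(buf, len, c);
#endif
}
```

Keeping the generic version means the change cannot alter behaviour on other platforms. The generic version also handles the final partial block on Aarch64.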

Deliverables for Phase 1

Recorded in one or more blog posts:

  • A clearly identified portion of the LAMP stack where you will be performing your work. For example, you might state that you are going to improve the functions in the file "lib_compress_rle.c" in a certain package, or that you're going to determine the optimal build flags for a package.
  • The reason(s) that you have selected this part of the stack to work on.
  • Baseline benchmarks (and/or profiling), including the methodology used to obtain those baseline results, with sufficient detail that someone else could reproduce your results. This baseline data will be used to gauge the effectiveness of your optimizations later.
  • A clear plan for how you will proceed with the project.
  • The details of the upstream community to which you will be contributing your changes. Include the details of the patch revision process used by that community.
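Baseline numbers are easier to reproduce when the measurement method is written down as code. The sketch below is one way to do that, assuming a POSIX system that provides `clock_gettime(CLOCK_MONOTONIC)`. It makes one untimed warm-up call and then reports the median of several timed runs, which is less sensitive to outliers than the mean. The names `bench_median_ns` and `median_u64` are hypothetical. The function passed in would wrap whatever portion of the stack you selected.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <time.h>

/* Monotonic time in nanoseconds (not affected by wall-clock changes). */
static uint64_t now_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000u + (uint64_t)ts.tv_nsec;
}

static int cmp_u64(const void *a, const void *b)
{
    uint64_t x = *(const uint64_t *)a, y = *(const uint64_t *)b;
    return (x > y) - (x < y);
}

/* Median of n samples (n > 0); sorts the array in place. */
uint64_t median_u64(uint64_t *samples, size_t n)
{
    assert(n > 0);
    qsort(samples, n, sizeof samples[0], cmp_u64);
    if (n % 2)
        return samples[n / 2];
    return (samples[n / 2 - 1] + samples[n / 2]) / 2;
}

/* Calls fn(arg) once untimed (warm-up), then `runs` more times,
 * and returns the median duration in nanoseconds. */
uint64_t bench_median_ns(void (*fn)(void *), void *arg, size_t runs)
{
    uint64_t samples[1024];
    assert(runs > 0 && runs <= 1024);
    fn(arg); /* warm caches, page in code and data */
    for (size_t i = 0; i < runs; i++) {
        uint64_t t0 = now_ns();
        fn(arg);
        samples[i] = now_ns() - t0;
    }
    return median_u64(samples, runs);
}
```

Record the hardware, compiler version, build flags and run count alongside the results. Reuse the same harness unchanged for the Phase 2 comparison so the before and after numbers are directly comparable.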

Phase 2: Optimizing

Optimize the portion of the LAMP stack identified in Phase 1.

  • Prove that your changes improve performance on Aarch64 in a significant and useful way.
  • Quantify the improvement. Consider the improvement in terms of the total impact on the LAMP stack (e.g., you might triple the speed of a function, but if that function is only responsible for 0.0001% of the time it takes the LAMP stack to respond to a user's request, the overall impact is negligible).
  • Prove that the changes you have made do not cause regressions in the software (e.g., slower performance, more memory usage, incorrect operation, crashes, and so forth). Use the test suite provided by upstream, if there is one.
  • Submit your change to the upstream community. Ensure that your patches are formatted correctly, properly documented, and submitted in accordance with the community's requirements and procedures.
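The parenthetical example in the second point is an application of Amdahl's law. Let p be the fraction of total request time spent in the optimized code, and let s be the speedup of that code on its own. The overall speedup is given by the first line below. The second line uses the figures from the text (p = 0.0001% = 10^-6, s = 3). The third uses p = 0.2, a value chosen purely for illustration.

```latex
S_{\text{overall}} = \frac{1}{(1 - p) + \dfrac{p}{s}}

p = 10^{-6},\; s = 3:\quad
S_{\text{overall}} = \frac{1}{1 - \tfrac{2}{3}\times 10^{-6}} \approx 1.00000067

p = 0.2,\; s = 3:\quad
S_{\text{overall}} = \frac{1}{0.8 + 0.0\overline{6}} \approx 1.154
```

In the first case, tripling the function's speed makes requests less than 0.0001% faster overall. In the second case, the same local speedup yields roughly a 15% overall gain.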

Deliverables for Phase 2

Recorded in one or more blog posts:

  • The details of the change(s) you are making to the software. Describe the changes and the reason for them, and link to an appropriate form of the changes (e.g., a git repository).
  • Your test results quantifying the improvement. Be methodical and present the information in an appropriate and convincing way.
  • Regression test results.
  • A link showing that you have submitted the changes upstream -- e.g., a bugzilla issue, mailing list archive entry, or GitHub pull request.
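Upstream's test suite remains the main regression check. A differential test, which compares the optimized code against the original implementation on many inputs, can make the regression results more convincing. The sketch below is hypothetical throughout. `count_byte_ref` stands in for the original code, and `count_byte_unrolled` (a simple 4-way unrolled loop) stands in for your change. Inputs come from a fixed-seed linear congruential generator, so any failure can be reproduced exactly.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Reference implementation: simple and obviously correct. */
static size_t count_byte_ref(const uint8_t *buf, size_t len, uint8_t c)
{
    size_t total = 0;
    for (size_t i = 0; i < len; i++)
        total += (buf[i] == c);
    return total;
}

/* Candidate "optimized" version: 4-way unrolled with independent
 * accumulators, standing in for whatever change is under test. */
static size_t count_byte_unrolled(const uint8_t *buf, size_t len, uint8_t c)
{
    size_t a = 0, b = 0, d = 0, e = 0, i = 0;
    for (; i + 4 <= len; i += 4) {
        a += (buf[i] == c);
        b += (buf[i + 1] == c);
        d += (buf[i + 2] == c);
        e += (buf[i + 3] == c);
    }
    for (; i < len; i++) /* leftover 0-3 bytes */
        a += (buf[i] == c);
    return a + b + d + e;
}

/* Compares both versions for every length 0..max_len and every byte
 * value. Returns the number of mismatches (0 = no regression found). */
size_t differential_test(size_t max_len, uint32_t seed)
{
    uint8_t buf[512];
    size_t mismatches = 0;
    uint32_t state = seed;
    assert(max_len <= sizeof buf);
    for (size_t i = 0; i < max_len; i++) {
        state = state * 1664525u + 1013904223u; /* LCG step */
        /* Small 16-symbol alphabet so that matches are frequent. */
        buf[i] = (uint8_t)((state >> 24) & 0x0Fu);
    }
    for (size_t len = 0; len <= max_len; len++)
        for (unsigned c = 0; c < 256; c++)
            if (count_byte_ref(buf, len, (uint8_t)c) !=
                count_byte_unrolled(buf, len, (uint8_t)c))
                mismatches++;
    return mismatches;
}
```

Run such a test on both Aarch64 and x86_64, since the change must not introduce a regression on any platform.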

Phase 3: Committing the Changes Upstream

Ensure that the changes you have made are accepted upstream. Respond appropriately to any requests from the upstream community and maintainers for additional information, clarification, documentation, and code or formatting changes.

Many communities may respond slowly to patches, especially from new contributors. Be patient, but do not let your changes linger; contact the upstream maintainers, develop a good working relationship, and shepherd your patches through the review process.

Deliverables for Phase 3

Recorded in one or more blog posts:

  • The details of the interactions with the upstream community/maintainers. It would be a good idea to blog separately about each contact.
  • A link showing that the changes were accepted.

FAQ

  • Q: What if I can't improve performance?
    • A: You may have selected your area of focus poorly, or be using the wrong approach to optimization. Try alternate approaches.
  • Q: What if, after trying alternate approaches to optimization, I still can't improve performance?
    • A: You can complete the project by proving that the code and build instructions, as provided by upstream, cannot be further optimized. Note that in most cases this is much harder to do than to optimize some portion of the stack.
  • Q: What if upstream is slow to respond to my patches?
    • A: Contact the upstream community/maintainers to request that they review and/or commit your changes. Provide everything possible to make it easy for them to accept your changes, including regression tests and benchmarks.
  • Q: What are the due dates for this project?