Difference between revisions of "Winter 2014 SPO600 Weekly Schedule"
Chris Tyler (talk | contribs) |
(35 intermediate revisions by the same user not shown) | |||
Line 1: | Line 1: | ||
[[Category:Winter 2014 SPO600]] | [[Category:Winter 2014 SPO600]] | ||
− | {{Admon/important|It's Alive!|This weekly schedule will be updated as the course proceeds - dates and content are subject to change. The cells in | + | {{Admon/obsolete|the [[Current SPO600 Weekly Schedule]]}} |
− | {|cellspacing="0" width="100%" cellpadding="5" border="1" | + | |
+ | <!-- {{Admon/important|It's Alive!|This [[SPO600]] weekly schedule will be updated as the course proceeds - dates and content are subject to change. The cells in the summary table will be linked to relevant resources and labs as the course progresses.}} --> | ||
+ | |||
+ | == Summary Table == | ||
+ | |||
+ | This is a summary/index table. Please follow the links in each cell for additional detail -- especially for the ''Deliverables'' column. | ||
+ | |||
+ | {|cellspacing="0" width="100%" cellpadding="5" border="1" style="background: #e0e0ff" | ||
|- | |- | ||
− | !Week!!Week of...!!Tuesday - Class!! | + | !Week!!Week of...!!Tuesday - Class!!Friday - ALC/Lab!!Deliverables |
|- | |- | ||
− | |1||Jan 6||[[#Tuesday (Jan 7)|Introduction to Software Porting, Portability, Benchmarking, and Optimization]]||[[#Friday (Jan 10)|How is Code Accepted? - Analyze code submissions in two separate open source projects]]||[[#Week 1 Deliverables|Blog a commentary on code reviews in two communities]] | + | |1||Jan 6||[[#Tuesday (Jan 7)|Introduction to Software Porting, Portability, Benchmarking, and Optimization]]||[[#Friday (Jan 10)|How is Code Accepted? - Analyze code submissions in two separate open source projects]]||[[#Week 1 Deliverables|Blog a commentary on code reviews in two communities]] (Lab 1) |
|- | |- | ||
− | |2||Jan 13|| | + | |2||Jan 13||[[#Tuesday (Jan 14)|Computer Architecture Overview]]||[[#Friday (Jan 17)|Hello World - Compile a basic C program and analyze the resultant binary]]||[[#Week 2 Deliverables|Set up a Fedora system and the ARMv8 Foundation Model / Blog on binary analysis (Lab 2)]] |
|- | |- | ||
− | |3||Jan 20|| | + | |3||Jan 20||[[#Tuesday (Jan 21)|Introduction to Assembly Language]]||[[#Friday (Jan 24)|x86_64 and Aarch64 Assembly Language]]||[[#Week 3 Deliverables|Blog about writing in assembly language (Lab 3)]] |
|- | |- | ||
− | |4||Jan 27|| | + | |4||Jan 27||[[#Tuesday (Jan 28)|Lab 3 results, inline assembler, and compiler optimizations]]||[[#Friday (Jan 31)|Analyzing a codebase for assembler and non-portable code]]||[[#Week 4 Deliverables|Blog post about codebase analysis]] |
|- | |- | ||
− | |5||Feb 3|| | + | |5||Feb 3||[[#Tuesday (Feb 4)|Memory Barriers and Atomics]]||[[#Friday (Feb 7)|Potential Project Analysis]]||[[#Week 5 Deliverables|Blog about your selected projects]] |
|- | |- | ||
− | |6||Feb 10|| | + | |6||Feb 10||[[#Tuesday (Feb 11)|Architecture-specific Code for Performance]]||Group hack session - Porting||[[#Week 6 Deliverables|Identify the assembler in your projects and contact your upstream communities.]] |
|- | |- | ||
|7||Feb 17||Portability - Removing platform-specific code||Group hack session - Portability||Remove platform-specific code from your projects | |7||Feb 17||Portability - Removing platform-specific code||Group hack session - Portability||Remove platform-specific code from your projects | ||
− | |- | + | |-style="background: #f0f0ff" |
− | |Study Week||Feb 24|| | + | |Study Week||Feb 24||colspan="3" align="center"|Study Week |
|- | |- | ||
|8||Mar 3||Project Work||Project Work||Get code into review | |8||Mar 3||Project Work||Project Work||Get code into review | ||
|- | |- | ||
− | |9||Mar 10|| | + | |9||Mar 10||[[#Tuesday (March 11)|Status Update]]||[[#Friday (March 14)|Foundation Models]]||[[#Week 9 Deliverables|Install and Test With Foundation Model]] |
|- | |- | ||
− | |10||Mar 17|| | + | |10||Mar 17||[[#Tuesday (March 18)|Profiling ]]||Baseline Profiling||[[#Week 10 Deliverables|Post baseline stats for your software]] |
|- | |- | ||
|11||Mar 24||Optimizing Code||Group hack - Profiling and optimizing||Code review update | |11||Mar 24||Optimizing Code||Group hack - Profiling and optimizing||Code review update | ||
|- | |- | ||
− | |12||Mar 31|| | + | |12||Mar 31||Using compiler optimizations||Project Work||Code review update |
+ | |- | ||
+ | |13||Apr 7||Final Presentations||(No class - Exams start)||Code accepted upstream | ||
+ | |-style="background: #f0f0ff" | ||
+ | |Exam Week||Apr 14||colspan="3" align="center"|Exam Week - No exam in this course! | ||
+ | |} | ||
+ | |||
+ | == Evaluation == | ||
+ | {|cellspacing="0" width="100%" cellpadding="5" border="1" style="background: #e0ffe0" | ||
+ | !Category!!Percentage!!Evaluation Dates | ||
+ | |- | ||
+ | |Communication||align="right"|20%||Jan 31, Feb 28, March 31, April 13 | ||
+ | |- | ||
+ | |Quizzes||align="right"|10%||May be held during any class. A minimum of 5 one-page quizzes will be given. Lowest 3 scores will not be counted. | ||
|- | |- | ||
− | | | + | |Labs||align="right"|10%||See deliverables column above. |
|- | |- | ||
− | | | + | |Project work||align="right"|60%||Feb 28, March 31, April 13 |
|} | |} | ||
− | |||
− | |||
== Week 1 == | == Week 1 == | ||
Line 74: | Line 92: | ||
* [[SPO600 Code Review Lab]] | * [[SPO600 Code Review Lab]] | ||
+ | * Start thinking about how you want to set up your [[SPO600 Software]] | ||
=== Week 1 Deliverables === | === Week 1 Deliverables === | ||
Line 80: | Line 99: | ||
# Blog your conclusion to the [[SPO600 Code Review Lab]]. | # Blog your conclusion to the [[SPO600 Code Review Lab]]. | ||
# Add yourself to the [[Winter 2014 SPO600 Participants]] page (leave the projects columns blank). | # Add yourself to the [[Winter 2014 SPO600 Participants]] page (leave the projects columns blank). | ||
+ | # Sign and return the [[Open Source Professional Option Student Agreement]]. | ||
+ | |||
+ | == Week 2 == | ||
+ | |||
+ | === Tuesday (Jan 14) === | ||
+ | * [[Computer Architecture]] (see also the [[:Category:Computer Architecture|Computer Architecture Category]]) | ||
+ | |||
+ | === Friday (Jan 17) === | ||
+ | * [[SPO600 Compiled C Lab]] | ||
+ | |||
+ | === Week 2 Deliverables === | ||
+ | * Blog your conclusion to the [[SPO600 Compiled C Lab]] | ||
+ | * [[SPO600 Host Setup|Set up a Fedora 20 system]] | ||
+ | |||
+ | == Week 3 == | ||
+ | |||
+ | === Tuesday (Jan 21) === | ||
+ | * [[Assembler Basics]] | ||
+ | |||
+ | === Friday (Jan 24) === | ||
+ | * Background information: [[SPO600 aarch64 QEMU on Ireland]] | ||
+ | * [[SPO600 Assembler Lab]] | ||
+ | |||
+ | === Week 3 Deliverables === | ||
+ | * Blog your conclusion to the [[SPO600 Assembler Lab]] | ||
+ | |||
+ | == Week 4 == | ||
+ | |||
+ | === Tuesday (Jan 28) === | ||
+ | * [[SPO600 Assembler Lab|Assembler Lab]] review | ||
+ | * [[Inline Assembly Language]] | ||
+ | * [[Compiler Optimizations]] | ||
+ | |||
+ | === Friday (Jan 31) === | ||
+ | |||
+ | * [[Codebase Analysis Lab]] | ||
+ | |||
+ | === Week 4 Deliverables === | ||
+ | |||
+ | * '''Reminder:''' Week 1-3 blog posts are due for marking on Friday, January 31. | ||
+ | * Blog about the [[Codebase Analysis Lab]] | ||
+ | |||
+ | == Week 5 == | ||
+ | |||
+ | === Tuesday (Feb 4) === | ||
+ | |||
+ | Platform-specific code is often used for '''Memory Barriers''' and '''Atomic Operations'''. | ||
+ | |||
+ | ==== Memory Barriers ==== | ||
+ | '''Memory Barriers''' ensure that memory accesses are sequenced so that multiple threads, processes, cores, or IO devices see a predictable view of memory. | ||
+ | * Leif Lindholm provides an excellent explanation of memory barriers. | ||
+ | ** Blog series - I recommend this series, especially the introduction, as a very clear explanation of memory barrier issues. | ||
+ | *** Part 1 - [http://community.arm.com/groups/processors/blog/2011/03/22/memory-access-ordering--an-introduction Memory Access Ordering - An Introduction] | ||
+ | *** Part 2 - [http://community.arm.com/groups/processors/blog/2011/04/11/memory-access-ordering-part-2--barriers-and-the-linux-kernel Memory Access Ordering Part 2 - Barriers and the Linux Kernel] | ||
+ | *** Part 3 - [http://community.arm.com/groups/processors/blog/2011/10/19/memory-access-ordering-part-3--memory-access-ordering-in-the-arm-architecture Memory Access Ordering Part 3 - Memory Access Ordering in the ARM Architecture] | ||
+ | ** Presentation at Embedded Linux Conference 2010 (Note: Acquire/Release in C++11 and ARMv8 aarch64 appeared after this presentation): | ||
+ | *** [http://elinux.org/images/f/fa/Software_implications_memory_systems.pdf Slides] | ||
+ | *** [http://free-electrons.com/pub/video/2010/elce/elce2010-lindholm-memory-450p.webm Video] | ||
+ | * [http://www.rdrop.com/users/paulmck/scalability/paper/whymb.2010.07.23a.pdf Memory Barriers - A Hardware View for Software Hackers] - This is a highly-rated paper that explains memory barrier issues - as the title suggests, it is designed to describe the hardware origin of the problem to software developers. Despite the fact that it is an introduction to the topic, it is still very technical. | ||
+ | * [http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.faqs/ka14041.html ARM Technical Support Knowledge Article - In what situations might I need to insert memory barrier instructions?] - Note that there are some additional mechanisms present in ARMv8 aarch64, including Acquire/Release. | ||
+ | * [https://www.kernel.org/doc/Documentation/memory-barriers.txt Kernel Documentation on Memory Barriers] - discusses the memory barrier issue generally, and the solutions used within the Linux kernel. This is part of the kernel documentation. | ||
+ | * Acquire-Release mechanisms | ||
+ | ** [http://blogs.msdn.com/b/oldnewthing/archive/2008/10/03/8969397.aspx MSDN Blog Post] with a very clear explanation of Acquire-Release. | ||
+ | ** [http://preshing.com/20130922/acquire-and-release-fences/ Preshing on Programming post] with a good explanation. | ||
+ | ** [http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.genc010197a/index.html ARMv8 Instruction Set Architecture Manual] (ARM InfoCentre registration required) - See the section on Acquire/Release and Load/Store. | ||
+ | |||
+ | ==== Atomics ==== | ||
+ | '''Atomics''' are operations which must be completed in a single step (or appear to be completed in a single step) without potential interruption. | ||
+ | * Wikipedia has a good basic overview of the need for atomicity in the article on [http://en.wikipedia.org/wiki/Linearizability Linearizability]. | ||
+ | * GCC provides intrinsics (built-in functions) for atomic operations, as documented in the GCC manual: | ||
+ | ** [http://gcc.gnu.org/onlinedocs/gcc-4.8.2/gcc/_005f_005fsync-Builtins.html#_005f_005fsync-Builtins Legacy __sync Built-in Functions for Atomic Memory Access] | ||
+ | ** [http://gcc.gnu.org/onlinedocs/gcc-4.8.2/gcc/_005f_005fatomic-Builtins.html#_005f_005fatomic-Builtins Built-in functions for memory model aware atomic operations] | ||
+ | * The Fedora project has some guidelines/recommendations for the use of these GCC builtins: | ||
+ | ** http://fedoraproject.org/wiki/Architectures/ARM/GCCBuiltInAtomicOperations | ||
+ | |||
+ | === Friday (Feb 7) === | ||
+ | |||
+ | ==== Hack Session: Potential Project Analysis ==== | ||
+ | |||
+ | Select a project from the [[Winter 2014 SPO600 Software List]] and perform these steps: | ||
+ | # Edit that page to put your name in the "Claimed by" column. | ||
+ | # Investigate the package to determine: | ||
+ | #* If the current version has been built for ARM (e.g., exists in the Fedora aarch64 port - fastest way to test is to use 'yum' inside the arm64 emulation environment on Ireland) | ||
+ | #* What the platform-specific code in the software does | ||
+ | #* Whether portable work-arounds exist | ||
+ | #* The need for an aarch64 port or for platform-specific code elimination | ||
+ | #* Opportunities for optimization | ||
+ | #* The amount of work involved in porting and optimizing, and your skills for performing that work | ||
+ | # Based on the result of your investigation, decide on your interest in the project. | ||
+ | #* If you wish to choose this project for yourself, place it on your row in the [[Winter 2014 SPO600 Participants|Participants]] page. | ||
+ | #* If you do not wish to choose this project, remove your name from the "Claimed by" column in the [[Winter 2014 SPO600 Software List|Software List]] page. | ||
+ | # Repeat until you have two packages. | ||
+ | |||
+ | {{Admon/note|Overload|It is strongly recommended that you choose two projects with a total scope sum of 0-1. If you wish to try a higher or lower sum, or more or fewer than two projects, please talk to your professor.}} | ||
+ | |||
+ | {{Admon/tip|RPM Packages|For software that is present in the rpmfusion repositories but not in Fedora, you can use <code>yumdownloader --source ''packagename''</code> to grab the source RPM and then examine it using the RPM tools. See [[RPM Packaging Process]] for information.}} | ||
+ | |||
+ | === Week 5 Deliverables === | ||
+ | |||
+ | * Blog about your two selected projects, including your detailed initial analysis of them. | ||
+ | ** You may want to break this into a couple of posts - e.g., post about your first package while you're working on your second. | ||
+ | ** Feel free to also blog about why you did '''not''' choose particular packages, too. | ||
+ | |||
+ | == Week 6 == | ||
+ | |||
+ | === Tuesday (Feb 11) === | ||
+ | |||
+ | * Architecture-specific code for Performance | ||
+ | ** Sometimes assembler is used in a C/C++ program for performance. However, modern versions of C/C++ (such as C++11) and recent compilers provide portable ways of accessing high-performance processor capabilities, such as Single Instruction/Multiple Data (SIMD) instructions (known by marketing names such as SSE, Neon, MMX, 3DNow, or AltiVec on various processors). | ||
+ | ** Linaro engineer Matthew Gretton-Dann gave a good presentation on [http://www.linaro.org/linaro-blog/2013/09/20/introduction-to-porting-and-optimising-code/ Porting and Optimizing Code] for aarch64. The vectorization portion, beginning at 28:10, provides a good introduction to SIMD and autovectorization using GCC on aarch64 (Note that the earlier portion of the presentation includes good information about Atomics). | ||
+ | *** [http://www.youtube.com/watch?v=epzYErIIx0Y YouTube Video] direct link | ||
+ | *** [http://www.linaro.org/assets/common/campus-party-presentation-Sept_2013.pdf Slides] direct link | ||
+ | ** Note that in the presentation above, Matthew takes the code beyond portability without straying into assembler (e.g., using compiler-specific, architecture-specific intrinsics). It is possible to achieve almost all of the performance gains without becoming arch-specific, and most of those can be attained without becoming compiler-specific as well. | ||
+ | * For full details on the SIMD instructions in aarch64, refer to the [http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.genc010197a/index.html ARMv8 Instruction Set Overview], particularly section 5.7. | ||
+ | |||
+ | === Week 6 Deliverables === | ||
+ | * Complete your analysis of your two selected software projects (if you haven't already) - see [[#Week 5|Week 5]]. Blog in detail about your findings. | ||
+ | * Identify the upstream communities that develop and maintain the software you have selected to work on. Figure out how they are structured, how they communicate, how code is maintained, and how patches are accepted. Introduce yourself to each of the two communities (one for each of the two software projects you have selected). Blog about your findings. | ||
+ | |||
+ | == Week 7 == | ||
+ | * Project Work | ||
+ | |||
+ | == Week 8 == | ||
+ | * Project Work ([[User:Chris Tyler|Chris Tyler]] is at [http://www.linaro.org/connect-lca14 Linaro Connect] this week). | ||
+ | * Aim at getting your code changes upstream to your communities | ||
+ | |||
+ | == Week 9 == | ||
+ | === Tuesday (March 11) === | ||
+ | * Status updates | ||
+ | * Update from Linaro Connect | ||
+ | * Discussion of useful tools | ||
+ | ** screen | ||
+ | ** time | ||
+ | |||
+ | === Friday (March 14) === | ||
+ | * Comparison of Emulation | ||
+ | ** QEMU | ||
+ | ** Fast Model and Foundation Model | ||
+ | * Install and configure the Foundation Model | ||
+ | ** [[:fedora:Architectures/ARM/AArch64/QuickStart|Fedora AArch64 Quick Start]] | ||
+ | ** [http://www.linaro.org/engineering/engineering-projects/armv8 Linaro Foundation Model Instructions] | ||
+ | * Baseline Benchmarking | ||
+ | |||
+ | ==== Resources ==== | ||
+ | * Foundation Model | ||
+ | ** [http://www.arm.com/products/tools/models/fast-models/ ARM Fast Models] - Note that "fast" here refers to the modelling approach, not execution speed! | ||
+ | * Benchmarking | ||
+ | ** [http://www.tokutek.com/wp-content/uploads/2013/05/20130424-percona-live-benchmarking.pdf Benchmarking Talk by Tim Callaghan] | ||
+ | |||
+ | === Week 9 Deliverables === | ||
+ | * Set up the Foundation Model | ||
+ | * Upstream your proposed code changes | ||
+ | * Blog about your work | ||
+ | |||
+ | == Week 10 == | ||
+ | |||
+ | === Tuesday (March 18) === | ||
+ | * Profiling with <code>gprof</code> | ||
+ | ** Build with profiling enabled (<code>-pg</code>) | ||
+ | ** Run the profile-enabled executable | ||
+ | ** Analyze the data in the <code>gmon.out</code> file | ||
+ | *** <code>gprof ''nameOfBinary''</code> # Displays text profile including call graph | ||
+ | *** <code>gprof ''nameOfBinary'' | gprof2dot | dot | display -</code> # Displays visualization of call graph | ||
+ | |||
+ | Resources | ||
+ | * [https://sourceware.org/binutils/docs-2.16/gprof/ GProf Manual] | ||
+ | * [http://www.thegeekstuff.com/2012/08/gprof-tutorial/ Profiling with GProf] | ||
+ | |||
+ | === Friday (March 21) === | ||
+ | * Gather baseline statistics for your software | ||
+ | |||
+ | === Week 10 Deliverables === | ||
+ | * Blog your baseline benchmark results |
Latest revision as of 00:00, 5 September 2014
Summary Table
This is a summary/index table. Please follow the links in each cell for additional detail -- especially for the Deliverables column.
Evaluation
Category | Percentage | Evaluation Dates |
---|---|---|
Communication | 20% | Jan 31, Feb 28, March 31, April 13 |
Quizzes | 10% | May be held during any class. A minimum of 5 one-page quizzes will be given. Lowest 3 scores will not be counted. |
Labs | 10% | See deliverables column above. |
Project work | 60% | Feb 28, March 31, April 13 |
Week 1
Tuesday (Jan 7)
- Introduction to the Problem
- Most software is written in a high-level language which can be compiled into machine code for a specific architecture. However, there is a lot of existing code that contains some architecture-specific code fragments written in Assembly Language.
- Reasons for writing code in Assembly Language include:
- Performance
- Atomic Operations
- Direct access to hardware features, e.g., CPUID registers
- Most of the historical reasons for including assembler are no longer valid. Modern compilers can out-perform most hand-optimized assembly code, atomic operations can be handled by libraries or compiler intrinsics, and most hardware access should be performed through the operating system or appropriate libraries.
- A new architecture has appeared: Aarch64, which is part of ARMv8. This is the first new computer architecture to appear in several years.
- There are over 1400 software packages/modules present in GNU/Linux systems which contain architecture-specific assembly language code. Most of these packages cannot be built on Aarch64 systems without modification.
- In this course, you will:
- Select two software packages from a list compiled by Steve McIntyre of Linaro. Each of the packages on this list contains assembly language code which is platform-specific.
- Prepare a fix/patch for the software so that it will run on 64-bit ARM systems (aarch64). This may be done at either of two levels:
- Port - Add additional assembly language code for aarch64 (basic solution).
- Make Portable - Remove architecture-specific code, replacing it with compiler intrinsics or high-level code so that the software will successfully build on multiple platforms.
- Benchmark - Prove that your changes do not cause a performance regression on existing platforms, and that (ideally) they improve performance.
- Upstream your Code - Submit your code to the upstream (originating) software project so that it can be incorporated into future versions of the software. This will involve going through a code review to ensure that your code is compatible with and acceptable to the upstream community.
- Optional: You can participate in the Linaro Code Porting/Optimization contest. For details, see the YouTube video of Jon "maddog" Hall and Steve McIntyre at Linaro Connect USA 2013.
- Course details:
- Course resources are linked from the CDOT wiki, starting at http://zenit.senecac.on.ca/wiki/index.php/SPO600 (Quick find: This page will usually be Google's top result for a search on "SPO600").
- Coursework is submitted by blogging.
- Quizzes will be short (1 page) and will be held without announcement at any time. Your lowest three quiz scores will not be counted, so do not worry if you miss one or two.
- Course marks:
- 60% - Project Deliverables
- 20% - Communication (Blog and Wiki writing)
- 20% - Labs and Quizzes
- Friday classes will be held in an "Active Learning Classroom". You are encouraged to bring your own laptop to these classes.
- For more course information, refer to the SPO600 Weekly Schedule (this page), the Course Outline, and SPO600 Course Policies.
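The "Make Portable" option described above can be illustrated with a 32-bit byte swap. This is only a minimal sketch, assuming GCC or Clang (for `__builtin_bswap32`); the function names are hypothetical and are not taken from any package on the course list.

```c
#include <assert.h>
#include <stdint.h>

/* Architecture-specific version: it only exists on x86, which is the kind of
 * code that prevents a package from building on aarch64. */
#if defined(__x86_64__) || defined(__i386__)
uint32_t swap32_x86_asm(uint32_t x)
{
    __asm__("bswap %0" : "+r"(x));
    return x;
}
#endif

/* Portable across architectures, but compiler-specific (GCC/Clang builtin). */
uint32_t swap32_intrinsic(uint32_t x)
{
    return __builtin_bswap32(x);
}

/* Portable across architectures and compilers: plain C shifts and masks. */
uint32_t swap32_plain(uint32_t x)
{
    return (x >> 24) | ((x >> 8) & 0x0000ff00u) |
           ((x << 8) & 0x00ff0000u) | (x << 24);
}

/* Sanity check that the two portable versions agree. */
void swap32_selfcheck(void)
{
    assert(swap32_intrinsic(0x11223344u) == swap32_plain(0x11223344u));
}
```

Replacing the first function with either of the other two removes the platform dependency. Whether performance is preserved is something to confirm by benchmarking rather than assume.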
Friday (Jan 10)
- SPO600 Code Review Lab
- Start thinking about how you want to set up your SPO600 Software
Week 1 Deliverables
- Set up a blog and add it to Planet CDOT.
- Blog your conclusion to the SPO600 Code Review Lab.
- Add yourself to the Winter 2014 SPO600 Participants page (leave the projects columns blank).
- Sign and return the Open Source Professional Option Student Agreement.
Week 2
Tuesday (Jan 14)
- Computer Architecture (see also the Computer Architecture Category)
Friday (Jan 17)
- SPO600 Compiled C Lab
Week 2 Deliverables
- Blog your conclusion to the SPO600 Compiled C Lab
- Set up a Fedora 20 system
Week 3
Tuesday (Jan 21)
- Assembler Basics
Friday (Jan 24)
- Background information: SPO600 aarch64 QEMU on Ireland
- SPO600 Assembler Lab
Week 3 Deliverables
- Blog your conclusion to the SPO600 Assembler Lab
Week 4
Tuesday (Jan 28)
- Assembler Lab review
- Inline Assembly Language
- Compiler Optimizations
Friday (Jan 31)
- Codebase Analysis Lab
Week 4 Deliverables
- Reminder: Week 1-3 blog posts are due for marking on Friday, January 31.
- Blog about the Codebase Analysis Lab
Week 5
Tuesday (Feb 4)
Platform-specific code is often used for Memory Barriers and Atomic Operations.
Memory Barriers
Memory Barriers ensure that memory accesses are sequenced so that multiple threads, processes, cores, or IO devices see a predictable view of memory.
- Leif Lindholm provides an excellent explanation of memory barriers.
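- Blog series - I recommend this series, especially the introduction, as a very clear explanation of memory barrier issues.
- Part 1 - Memory Access Ordering - An Introduction
- Part 2 - Memory Access Ordering Part 2 - Barriers and the Linux Kernel
- Part 3 - Memory Access Ordering Part 3 - Memory Access Ordering in the ARM Architecture
- Presentation at Embedded Linux Conference 2010 (Note: Acquire/Release in C++11 and ARMv8 aarch64 appeared after this presentation):
- Slides
- Video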
- Memory Barriers - A Hardware View for Software Hackers - This is a highly-rated paper that explains memory barrier issues - as the title suggests, it is designed to describe the hardware origin of the problem to software developers. Despite the fact that it is an introduction to the topic, it is still very technical.
- ARM Technical Support Knowledge Article - In what situations might I need to insert memory barrier instructions? - Note that there are some additional mechanisms present in ARMv8 aarch64, including Acquire/Release.
- Kernel Documentation on Memory Barriers - discusses the memory barrier issue generally, and the solutions used within the Linux kernel. This is part of the kernel documentation.
- Acquire-Release mechanisms
- MSDN Blog Post with a very clear explanation of Acquire-Release.
- Preshing on Programming post with a good explanation.
- ARMv8 Instruction Set Architecture Manual (ARM InfoCentre registration required) - See the section on Acquire/Release and Load/Store.
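As a rough illustration of the Acquire/Release idea covered in the resources above, here is a minimal sketch using C11 `<stdatomic.h>` (an assumption: a C11-capable compiler). The names `payload`, `ready`, `publish`, and `try_consume` are hypothetical. In real use, `publish` and `try_consume` would run on different threads.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

static int payload;        /* ordinary, non-atomic data */
static atomic_bool ready;  /* publication flag; zero-initialized to false */

/* Producer: write the data, then set the flag with release ordering, so the
 * payload write cannot become visible after the flag store. */
void publish(int value)
{
    payload = value;
    atomic_store_explicit(&ready, true, memory_order_release);
}

/* Consumer: read the flag with acquire ordering; if it is set, the payload
 * written before the matching release store is guaranteed to be visible. */
bool try_consume(int *out)
{
    assert(out != NULL);
    if (!atomic_load_explicit(&ready, memory_order_acquire))
        return false;
    *out = payload;
    return true;
}
```

The compiler chooses the barrier or load/store instructions appropriate to the target, so the same source builds on x86_64 and aarch64 without inline assembler.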
Atomics
Atomics are operations which must be completed in a single step (or appear to be completed in a single step) without potential interruption.
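- Wikipedia has a good basic overview of the need for atomicity in the article on Linearizability.
- GCC provides intrinsics (built-in functions) for atomic operations, as documented in the GCC manual:
- Legacy __sync Built-in Functions for Atomic Memory Access
- Built-in functions for memory model aware atomic operations
- The Fedora project has some guidelines/recommendations for the use of these GCC builtins:
- http://fedoraproject.org/wiki/Architectures/ARM/GCCBuiltInAtomicOperations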
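The memory-model-aware GCC builtins mentioned above can be sketched as follows. This is an illustrative example only, assuming GCC 4.7 or later (or Clang); the counter and the try-lock, with the names `hits`, `lock_word`, `count_hit`, `spin_try_lock`, and `spin_unlock`, are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

static long hits;      /* shared counter */
static int lock_word;  /* 0 = unlocked, 1 = locked */

/* Atomic read-modify-write: add 1 and return the new value. */
long count_hit(void)
{
    return __atomic_add_fetch(&hits, 1, __ATOMIC_SEQ_CST);
}

/* Try to take the lock with a single compare-and-exchange.
 * Returns true only if lock_word was 0 and is now 1. */
bool spin_try_lock(void)
{
    int expected = 0;
    return __atomic_compare_exchange_n(&lock_word, &expected, 1,
                                       false, /* strong variant */
                                       __ATOMIC_ACQUIRE, __ATOMIC_RELAXED);
}

/* Release the lock; the release ordering publishes writes made while it
 * was held. */
void spin_unlock(void)
{
    assert(__atomic_load_n(&lock_word, __ATOMIC_RELAXED) == 1);
    __atomic_store_n(&lock_word, 0, __ATOMIC_RELEASE);
}
```

Code like this can stand in for hand-written atomic assembler sequences, since the compiler emits the appropriate instructions for each target.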
Friday (Feb 7)
Hack Session: Potential Project Analysis
Select a project from the Winter 2014 SPO600 Software List and perform these steps:
- Edit that page to put your name in the "Claimed by" column.
- Investigate the package to determine:
- If the current version has been built for ARM (e.g., exists in the Fedora aarch64 port - fastest way to test is to use 'yum' inside the arm64 emulation environment on Ireland)
- What the platform-specific code in the software does
- Whether portable work-arounds exist
- The need for an aarch64 port or for platform-specific code elimination
- Opportunities for optimization
- The amount of work involved in porting and optimizing, and your skills for performing that work
- Based on the result of your investigation, decide on your interest in the project.
- If you wish to choose this project for yourself, place it on your row in the Participants page.
- If you do not wish to choose this project, remove your name from the "Claimed by" column in the Software List page.
- Repeat until you have two packages.
Week 5 Deliverables
- Blog about your two selected projects, including your detailed initial analysis of them.
- You may want to break this into a couple of posts - e.g., post about your first package while you're working on your second.
- Feel free to also blog about why you did not choose particular packages, too.
Week 6
Tuesday (Feb 11)
- Architecture-specific code for Performance
- Sometimes assembler is used in a C/C++ program for performance. However, modern versions of C/C++ (such as C++11) and recent compilers provide portable ways of accessing high-performance processor capabilities, such as Single Instruction/Multiple Data (SIMD) instructions (known by marketing names such as SSE, Neon, MMX, 3DNow, or AltiVec on various processors).
- Linaro engineer Matthew Gretton-Dann gave a good presentation on Porting and Optimizing Code for aarch64. The vectorization portion, beginning at 28:10, provides a good introduction to SIMD and autovectorization using GCC on aarch64 (Note that the earlier portion of the presentation includes good information about Atomics).
- YouTube Video direct link
- Slides direct link
- Note that in the presentation above, Matthew takes the code beyond portability without straying into assembler (e.g., using compiler-specific, architecture-specific intrinsics). It is possible to achieve almost all of the performance gains without becoming arch-specific, and most of those can be attained without becoming compiler-specific as well.
- For full details on the SIMD instructions in aarch64, refer to the ARMv8 Instruction Set Overview, particularly section 5.7.
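As a small illustration of the portable approach described above, the loop below is written so that a compiler's autovectorizer has a good chance of turning it into SIMD instructions. This is a sketch, not code from the presentation: the function name `add_arrays` is hypothetical, and it assumes a C99 compiler (for `restrict`) with vectorization enabled, for example GCC at `-O3`.

```c
#include <assert.h>
#include <stddef.h>

/* Element-wise addition. The restrict qualifiers promise the compiler that
 * the three arrays do not overlap, which removes a common obstacle to
 * vectorization. No architecture-specific code is involved. */
void add_arrays(float *restrict dst, const float *restrict a,
                const float *restrict b, size_t n)
{
    assert(n == 0 || (dst != NULL && a != NULL && b != NULL));
    for (size_t i = 0; i < n; i++)
        dst[i] = a[i] + b[i];
}
```

Whether the loop is actually vectorized depends on the compiler version, flags, and target, so it is worth inspecting the generated assembly (for example with `objdump -d`) rather than assuming.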
Week 6 Deliverables
- Complete your analysis of your two selected software projects (if you haven't already) - see Week 5. Blog in detail about your findings.
- Identify the upstream communities that develop and maintain the software you have selected to work on. Figure out how they are structured, how they communicate, how code is maintained, and how patches are accepted. Introduce yourself to each of the two communities (one for each of the two software projects you have selected). Blog about your findings.
Week 7
- Project Work
Week 8
- Project Work (Chris Tyler is at Linaro Connect this week).
- Aim at getting your code changes upstream to your communities
Week 9
Tuesday (March 11)
- Status updates
- Update from Linaro Connect
- Discussion of useful tools
- screen
- time
Friday (March 14)
- Comparison of Emulation
- QEMU
- Fast Model and Foundation Model
- Install and configure the Foundation Model
- Fedora AArch64 Quick Start
- Linaro Foundation Model Instructions
- Baseline Benchmarking
Resources
- Foundation Model
- ARM Fast Models - Note that "fast" here refers to the modelling approach, not execution speed!
- Benchmarking
- Benchmarking Talk by Tim Callaghan
Week 9 Deliverables
- Set up the Foundation Model
- Upstream your proposed code changes
- Blog about your work
Week 10
Tuesday (March 18)
- Profiling with gprof
- Build with profiling enabled (-pg)
- Run the profile-enabled executable
- Analyze the data in the gmon.out file
- gprof nameOfBinary # Displays text profile including call graph
- gprof nameOfBinary | gprof2dot | dot | display - # Displays visualization of call graph
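To try the gprof steps above, it helps to have a workload with one obviously expensive function. The following is a hypothetical sketch (the function names are illustrative, not from any course package) that computes the sum of divisors in a slow way and a faster way.

```c
#include <assert.h>

/* Slow: tries every candidate divisor from 1 to n. */
unsigned long sum_of_divisors_slow(unsigned long n)
{
    unsigned long sum = 0;
    for (unsigned long d = 1; d <= n; d++)
        if (n % d == 0)
            sum += d;
    return sum;
}

/* Faster: only tries divisors up to sqrt(n) and adds each one's partner. */
unsigned long sum_of_divisors_fast(unsigned long n)
{
    unsigned long sum = 0;
    for (unsigned long d = 1; d * d <= n; d++) {
        if (n % d == 0) {
            sum += d;
            if (d != n / d)
                sum += n / d;
        }
    }
    return sum;
}

/* Workload: runs both versions for 1..limit and checks that they agree. */
unsigned long run_workload(unsigned long limit)
{
    unsigned long total = 0;
    for (unsigned long n = 1; n <= limit; n++) {
        unsigned long slow = sum_of_divisors_slow(n);
        assert(slow == sum_of_divisors_fast(n));
        total += slow;
    }
    return total;
}
```

Assuming a small `main()` that calls `run_workload()` with a reasonably large limit, building with `-pg`, running the executable, and then running gprof on it should show most of the time attributed to `sum_of_divisors_slow`.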
Resources
- GProf Manual
- Profiling with GProf
Friday (March 21)
- Gather baseline statistics for your software
Week 10 Deliverables
- Blog your baseline benchmark results