1,885
edits
Changes
no edit summary
[[Category:Fall 2019 Winter 2020 SPO600]]
This is the schedule and main index page for the [[SPO600]] ''Software Portability and Optimization'' course for Winter 2020.
<!-- {{Admon/important|It's Alive!|This [[SPO600]] weekly schedule will be updated as the course proceeds - dates and content are subject to change. The cells in the summary table will be linked to relevant resources and labs as the course progresses.}}-->
<!-- {{Admon/important|Content being Updated|This page is in the process of being updated from a previous semester's content. It is not yet updated for the current semester. Do not rely on the accuracy of this information until this warning is removed.}} -->
== Schedule Summary Table ==
|1||Jan 06||[[#Week 1 - Class I|Introduction to the Course / Introduction to the Problem / How is code accepted into an open source project? (Homework: Lab 1)]]||[[#Week 1 - Class II|Computer architecture basics / Binary Representation of Data / Introduction to Assembly Language]]||[[#Week 1 Deliverables|Set up for the course]]
|-
|2||Jan 13||[[#Week 2 - Class I|6502 Assembly Basics Lab (Lab 2)]]||[[#Week 2 - Class II|Math, Assembly language conventions , and Examples / Project Selection]]||[[#Week 2 Deliverables|Lab 1 and 2]]
|-
|3||Jan 20||[[#Week 3 - Class I|6502 Strings and Memory Math Lab (Lab 3)]]||[[#Week 3 - Class II|Computer math / Building CodeAddressing Modes]]||[[#Week 3 Deliverables|Lab 3, Selected Project]]
|-
|4||Jan 27||[[#Week 4 - Class I|6502 Math Continue with Lab (Lab 4)3]]||[[#Week 4 - Class II|System routines / Sysadmin for DevsBuilding Code]]||[[#Week 4 Deliverables|Lab 4, Project Build3]]
|-
|5||Feb 03||[[#Week 5 - Class I|6502 Application String Lab (Lab 54)]]||[[#Week 5 - Class II|Introduction to x86_64 and AArch64 architectures]]||[[#Week 5 Deliverables|Lab 54]]
|-
|6||Feb 10||[[#Week 6 - Class I|64-bit Assembly Language 6502 String Lab (Lab 64)Continued]]||[[#Week 6 - Class II|Profilingx86_64 and AArch64 Assembly]]||[[#Week 6 Deliverables|Lab 64]]
|-
|7||Feb 17||[[style="background:#Week 7 - Class If0f0ff"|Profiling Lab (Lab 7)]]Family Day Holiday||[[#Week 7 - Class II|Project Stage 164-bit Assembly Language Lab (Lab 5)]]||[[#Week 7 Deliverables|Lab 7, Project Profiling5]]
|-
|Reading||Feb 24||style="background: #f0f0ff" colspan="5" align="center"|Reading Week
|-
|8||Mar 02||[[#Week 8 - Class I|Algorithm Selection Lab (Lab 8)5 Continued]]||[[#Week 8 - Class II|SIMD and VectorizationProjects / Changing an Algorithm]]||[[#Week 8 Deliverables|Lab 5, Project Stage 1 dueBlogs]]
|-
|9||Mar 09||[[#Week 9 - Class I|SIMD Algorithm Selection Lab (Lab 96)]]||[[#Week 9 - Class II|Identifying Optimization OpportunitiesCompiler Optimizations / SIMD and Vectorization]]||[[#Week 9 Deliverables|Lab 96]]
|-
|10Switchover||Mar 16||[[style="background: #Week 10 - Class I|Atomics]]||[[#Week 10 - Class IIf0f0ff" colspan="5" align="center"|Project Discussion]]||[[#Online Switchover Week 10 Deliverables|Project Work]]
|-
|1110||Mar 23||[[#Week 11 10 - Class I|Online Startup / Project HackingStage 1]]||[[#Week 11 10 - Class II|Memory Ordering, Synchronization, and BarriersReview for Stage 1]]||[[#Week 11 10 Deliverables|Project Stage 2 dueBlogging]]
|-
|1211||Mar 30||[[#Week 12 11 - Class I|ifunc<span style="background: #ffff00;">Quiz</span> / Profiling]]||[[#Week 12 11 - Class II|Emerging directions in system architectureSIMD Part 1 - Autovectorization]]||[[#Week 12 11 Deliverables|Project WorkStage 1 due April 1, 11:59 pm / Blog about your project as you start Stage 2]]
|-
|12||Apr 06||[[#Week 12 - Class I|SIMD Part 2 - Intrinsics and Inline Assembler]]||style="background:#f0f0ff"|Good Friday Holiday||[[#Week 12 Deliverables|Project Stage 2 due]]|-|13||Apr 0613||[[#Week 13 - Class I|Demos<span style="background: #ffff00;">Quiz</span> / Project Discussion]]||[[#Week 13 - Class II|Course wrapWrap-up discussionDiscussion]]||[[#Week 13 Deliverables|Project Stage 3 dueMonday, April 20, 11:59 pm (Firm!)]]
|-
|}
* Fixed-point
** A fixed-point value is encoded the same as an integer, except that some of the bits are fractional -- they're considered to be to the right of the "binary point" (binary version of "decimal point" - or more generically, the ''radix point''). For example, binary 000001.00 is decimal 1.0, and 000001.11 is decimal 1.75.
** An alternative to fixed-point values is integer values in a smaller unit of measurement. For example, some accounting software may use integer values representing cents. For input and display purposes, dollar-and-cent values are converted to/from cent values.
* Floating-point
** Floating point numbers have three parts: a ''sign bit'' (0 for positive, 1 for negative), a ''mantissa'' or ''significand'', and an ''exponent''. The value is interpreted as <code>''sign'' mantissa * 2<sup>exponent</sup></code>.
=== Week 2 - Class I ===
* [[6502 Assembly Language Lab]](Lab 2)
=== Week 2 - Class II ===
* 6502 Assembly Language Continued** [[6502 Math]]** Assembly conventions and examples*** Directives**** define**** DCB === Week 2 Deliverables ===* Blog your results to [[SPO600 Code Review Lab|Lab 1]] and [[6502 Assembly Language Lab|Lab 2]]. == Week 3 == === Week 3 - Class I ===* Finish [[6502 Assembly Language Math Lab|Lab 2]]* [[6502 Assembly Language Math Lab]] (Lab 3) === Week 3 - Class II ===* [[6502 Addressing Modes]]* Project selectionSelection - what to look for === Week 3 Deliverables ===* Blog your Lab 3 results (or interim results). == Week 4 == === Week 4 - Class I ===* Complete [[6502 Assembly Language Math Lab|Lab 3]] === Week 4 - Class II ======= Strings and System Routines ====* The [[6502 Emulator|6502 emulator]] has a 80x25 character display mapped starting at location '''$f000'''. Writing to a byte to screen memory will cause that character to be displayed at the corresponding location on the screen, if the character is printable. If the high bit is set, the character will be displayed in <span style="background:black;color:white;"> reverse video </span>. For example, storing the ASCII code for "A" (which is 65 or $41) into memory location $f000 will display the letter "A" as the first character on the screen; ORing the value with 128 ($80) yields a value of 193 or $d1, and storing that value into $f000 will display <span style="background:black;color:white;">A</span> as the first character on the screen.* A "ROM chip" with screen control routines is mapped into the emulator at the end of the memory space (at the time of writing, the current version of the ROM exists in pages $fe and $ff). Details of the available ROM routines can be viewed using the "Notes" button in the emulator or on the [[6502_Emulator#ROM_Routines|emulator page]] on this wiki.* Strings in assembler are stored as sequences of bytes. As is usually the case in assembler, memory management is left to the programmer. You can terminate strings with null bytes (C-style), which are easy to detect one some CPUs (e.g., <code>lda</code> followed by <code>bne / beq</code> on a 6502), or you can use character counts to track string lengths. ==== Building Code ====* C code is built with the C compiler, typically called <code>cc</code> (which is usually an alias for a specific C compiler, such as <code>gcc</code>, <code>clang</code>, or <code>bcc</code>).* The C compiler runs through five steps, often by calling separate executables:*# Preprocessing - performed by the C Preprocessor (<code>cpp</code>), this step handles directives such as <code>#include</code>, <code>#define</code>, and <code>#ifdef</code> to build produce a single source code text file, with cross-references to the original input files so that error messages can be displayed correctly (e.g., an error in an included file can be correctly reported by filename and line number).*# Compilation - the C source code is converted to assembler, going through one or more intermedie representations (IR) such as [https://gcc.gnu.org/onlinedocs/gccint/GENERIC.html GENERIC] or [https://gcc.gnu.org/onlinedocs/gccint/GIMPLE.html GIMPLE], or [https://llvm.org/docs/LangRef.html LLVM IR]. The program used for this step is often called <code>cc1</code>.*# Optimization - various optimization passes are performed at different stages of processing through multiple passes, but centered on IR at the compilation step. Sometimes, the work of a previous pass is undone by a later pass: for example, a complex loop may be converted into a series of simpler loops by an early pass, in the hope that optimizations can be applied to one or more of the simpler loops; the loops may later be recombined to single loop if no optimizations are found that are applicable to the simplified loops.*# Assembly - converts the assembly language code emitted by the compilation stage into binary object code.*# Linking - connects code to functions (aka methods or procedures) which were compiled in other ''compilation units'' (they may be pre-compiled libraries available on the system, or they may be other pieces of the same code base which are compiled in separate steps). Linking may be static, where libraries are imported into the binary executable file of the output program, or linking may be dynamic, where additional information is added to the binary executable file so that a run-time linker can load and connect libraries at runtime.* Other languages which are compiled to binary form, such as C++, Ocaml, Haskell, Fortran, and COBOL go through similar processing. Languages which do not compile to binary form are either compiled to a ''bytecode'' format (binary code that doesn't correspond to actual hardware), or left in original source format, and an interpreter reads and executes the bytecode or source code at runtime. Java and Python use bytecode; Bash and JavaScript interpret source code. Some interpreters build and cache blocks of machine code on-the-fly; this is called Just-in-Time (JIT) compilation. * Compiler feature flags control the operation of the compiler on the source code, including optimiation passes. When using gcc, these "feature flags" take the form <code>-f[no-]'''featureName'''</code> -- for example:** <code>-fbuiltin</code> -- enables the "builtin" feature** <code>-fno-builtin</code> -- disables the "builtin" feature* Feature flags can be selected in groups using the optimization (<code>-O</code>) level:** <code>-O0</code> -- disables most (but not all) optimizations** <code>-O1</code> -- enables basic optimizations that can be performed quickly** <code>-O2</code> -- enables almost all safe operatimizations** <code>-O3</code> -- enables agressive optimization, including optimizations which may not always be safe for all code (e.g., assuming +0 == -0)** <code>-Os</code> -- optimzies for small binary and runtime size, possibly at the expense of speed** <code>-Ofast</code> -- optimizes for fast execution, possibly at the expense of size** <code>-Og</code> -- optimizes for debugging: applies optimizations that can be performed quickly, and avoids optimizations which convolute the code* To see the optimizations which are applied at each level in gcc, use: <code>gcc -Q --help=optimizers -O'''level'''</code> -- it's interesting to compare the output of different levels, such as <code>-O0</code> and <code>-O3</code> * Different CPUs in one family can have different capabilities and performance characteristics. The compiler options <code>-march</code> sets the overall architecture family and CPU features to be targetted, and the <code>-mtune</code> option sets the specific target for tuning. Thus, you can produce an executable that will work across a range of CPUs, but is specifically tuned to perform best on a certain model. For example, <code>-march=ivybridge -mtune=knl</code> will cause the processor to use features which are present on all Intel Ivy Bridge (and later) processors, but tuned for optimal performance on Knight's Landing processors. Similarly, <code>-march=armv8-a -mtune=cortex-a72</code> will cause the compiler to emit code which will safely run on any ARMv8-a processor, but be tuned specifically for the Cortex-A72 core. * When building code on different platforms, there are a lot of variables which may need to be fed into the preprocessor, compiler, and linker. These can be manually specified, or they can be automatically determined by a tool such as GNU Autotools (typically visible as the <code>configure</code> script in the source code archive).* The source code for large projects is divided into many source files for manageability. The dependencies between these files can become complex. When developing or debugging the software, it is often necessary to make changes in one or a small number of files, and it may not be necessary to rebuild the entire project from scratch. The [[Make_and_Makefiles|<code>make</code>]] utility is used to script a build and to enable rapid partial rebuilds after a change to source code files (see [[Make and Makefiles]]). * Many open source projects distribute code as a source archive ("tarball") which usually decompresses to a subdirectory '''packageName-version''' (e.g. foolib-1.5.29). This will typically contain a script which configures the Makefile (<code>configure</code> if using GNU Autotools). After running this script, a Makefile will be available, which can be used to build the software. However, some projects use an alternative configuration tool instead of GNU Autotools, and some may use an alternate build system instead of make.* To eliminate this variation, most Linux distributions use a '''package''' system, which standardizes the build process and which produces installable package files which can be used to reliably install software into standard locations with automatic dependency resolution, package tracking via a database, and simple updating capability. For example, Fedora, Red Hat Enterprise Linux, CentOS, SuSE, and OpenSuSE all use the RPM package system, in which source code is bundled with a build recipe in a "Source RPM" (SRPM), which can be built with a single command into a binary package (RPM). The RPMs can then be downloaded, have dependencies and conflicts resolved, and installed with a single command such as <code>dnf</code>. The fact that the SRPM can be built into an installable RPM through an automated process enables and simplifies automated build systems, mass rebuilds, and architecture-specific builds. === Week 4 Deliverables ===* Blog your [[6502 Assembly Language Math Lab|Lab 3]] results.* Blogs are due at the end of the month (Feb 2 - 11:59 pm), so proofread your posts, ensure that you have at least 1-2 per week, and make sure the link from the [[Winter 2020 SPO600 Participants|Participant's Page]] is accurate. Feel free to write multiple posts about one topic or lab, if appropriate. == Week 5 == === Week 5 - Class I ===* [[6502 Assembly Language String Lab]] (Lab 4) === Week 5 - Class II ===* Introduction to x86_64 and ARMv8a/AArch64 [[Computer Architecture|Architectures]]** [[ARMv8]] Architecture*** [[AArch64 Register and Instruction Quick Start]]** x86_64 Architecture*** [[x86_64 Register and Instruction Quick Start]]* Working with [[ELF|ELF Files]]* Compiler Options: a [[SPO600 Compiled C Lab|demo]]** -static** -g** -fno-builtin** -O0 vs -O3* Building an Open Source Project's Code** [[Make and Makefiles]] revisited** A brief introduction to GNU Autotools === Week 5 Deliverables ===* [[6502 Assembly Language String Lab|Lab 4 Results]] == Week 6 == === Week 6 - Class I ===* [[6502 Assembly Language String Lab]] (Lab 4) - Continued === Week 6 - Class II ===* x86_64 and AArch64 Assembler** See [[Assembler Basics]] === Week 6 Deliverables ===* Blog your [[6502 Assembly Language String Lab|Lab 4]] results == Week 7 == === Week 7 - Class I ===* No class - Family Day Holiday === Week 7 - Class II ===* [[SPO600 64-bit Assembler Lab]] (Lab 5) === Week 7 Deliverables ===* Blog your [[SPO600 64-bit Assembler Lab|Lab 5]] results == Week 8 == === Week 8 - Class I ===* [[SPO600 64-bit Assembler Lab|Lab 5]] Continued === Week 8 - Class II ===* [[Winter 2020 SPO600 Project|Course Projects]]** Building software** Benchmarking * Changing an Algorithm to Improve Performance** Audio volume scaling problem*** PCM Audio is represented as 16-bit signed integer samples*** To reduce the volume of the audio, it can be scaled by a factor from 0.000 (silence) to 1.000 (original volume).*** This is a common operation on mobile and multimedia devices.*** What is the best way to do this?** Approach 1: Naive Implementation - Multiply each sample by the scaling factor (this involves multiplying each integer sample by a floating-point number, then converting the result back to an integer)** Approach 2: Lookup Table - Pre-calculate all possible values multiplied by the scaling factor, then look up the new value for each original sample value** Approach 3: Fixed-point math - Use fixed-point math rather than floating-point math** Approach 4: Vector fixed-point math - Use SIMD instructions to do multiple fixed-point operations in parallel === Week 8 Deliverables ===* Blog your [[SPO600 64-bit Assembler Lab|Lab 5]] results.* '''Reminder:''' Blogs are due for February this Sunday (March 8, 11:59 pm). == Week 9 == === Week 9 - Class I ===* [[SPO600 Algorithm Selection Lab]] (Lab 6) === Week 9 - Class II ===[[Winter 2020 SPO600 Project|project]]* [[Compiler Optimizations]]* SIMD and Vectorization** [[SPO600 Vectorization Lab|Optional vectorization lab]] === Week 9 - Deliverables ===* Blog about [[SPO600 Algorithm Selection Lab|Lab 6]] and your Project == Week 10 == === Week 10 - Class I ===* [https://youtu.be/DCp8oghdTfU Video - March 23]* Focus this week: Complete Stage 1 of your [[Winter 2020 SPO600 Project|Course Projects]] === Drop-in Online Discussion Sessions ===* Tuesday to Friday (March 24-27) from 9-10 AM* Online at https://whereby.com/ctyler** There is a maximum of 12 people in the room at a time. I recommend dropping by one or twice a week with your questions.** If 9-10 am cannot work for you, email me to discuss this. === Week 10 - Class II ===* [https://youtu.be/M-5IizEwkfY Video - March 27: Review of material for Stage 1]* Stage 1 due date '''extended''' to Wednesday, April 1, 11:59 pm === Week 10 - Deliverables ===* Blog about your [[Winter 2020 SPO600 Project|project]]. Project Stage 1 is due next Wednesday. == Week 11 == === Week 11 - Class I ===* Quiz #4 - Online in Blackboard* '''Optional video:''' [https://youtu.be/CdyERanIxmI Building Software] - This video provides a review of building an open-source software package from either a source archive (zip or tarball) or from a code repository (such as a <code>git</code> repository).* [https://youtu.be/Hip1KtYZKE0 Video - March 30: Profiling Software]** Profiling with <code>gprof</code> and <code>perf</code> === Week 11 - Class II ===* [https://youtu.be/EIPbufXhiQs Video - April 3: SIMD and Auto-vectorization]* SIMD-Autovectorization Resources** [https://gcc.gnu.org/projects/tree-ssa/vectorization.html Auto-Vectorization in GCC] - Main project page for the GCC auto-vectorizer.** [http://locklessinc.com/articles/vectorize/ Auto-vectorization with gcc 4.7] - An excellent discussion of the capabilities and limitations of the GCC auto-vectorizer, intrinsics for providing hints to GCC, and other code pattern changes that can improve results. Note that there has been some improvement in the auto-vectorizer since this article was written. '''This article is strongly recommended.'''** [https://software.intel.com/sites/default/files/8c/a9/CompilerAutovectorizationGuide.pdf Intel (Auto)Vectorization Tutorial] - this deals with the Intel compiler (ICC) but the general technical discussion is valid for other compilers such as gcc and llvm === Week 11 Deliverables ===* [[Winter 2020 SPO600 Project|Project Stage 1] due Wednesday, April 1 (yes, really) at 11:59 pm* Blog about your project as you continue into Stage 2** March posts are due on Monday, April 6 at 11:59 pm. == Week 12 == === Week 12 - Class I ===* [https://youtu.be/76rtxPozPJI Video - April 6: SIMD, Inline Assembler, and Compiler Intrinsics]** [[Inline Assembly Language]]** [[Compiler Intrinsics]] * Retired SPO600 Labs - These labs are not being used this semester but may be useful for reference. The software in these labs was used in the video for this week.** [[SPO600 SIMD Lab]]** [[SPO600 Inline Assembler Lab]] === Week 12 - Class II ===* No class - Good Friday === Resources ======= Auto-vectorization ====* [https://gcc.gnu.org/projects/tree-ssa/vectorization.html Auto-Vectorization in GCC] - Main project page for the GCC auto-vectorizer.* [http://locklessinc.com/articles/vectorize/ Auto-vectorization with gcc 4.7] - An excellent discussion of the capabilities and limitations of the GCC auto-vectorizer, intrinsics for providing hints to GCC, and other code pattern changes that can improve results. Note that there has been some improvement in the auto-vectorizer since this article was written. '''This article is strongly recommended.'''* [https://software.intel.com/sites/default/files/8c/a9/CompilerAutovectorizationGuide.pdf Intel (Auto)Vectorization Tutorial] - this deals with the Intel compiler (ICC) but the general technical discussion is valid for other compilers such as gcc and llvm==== Inline Assembly Language ====* [[Inline Assembly Language]]* [http://developer.arm.com ARM Developer Information Centre]** [https://developer.arm.com/products/architecture/a-profile/docs/den0024/a ARM Cortex-A Series Programmer’s Guide for ARMv8-A]* The ''short'' guide to the ARMv8 instruction set: [https://www.element14.com/community/servlet/JiveServlet/previewBody/41836-102-1-229511/ARM.Reference_Manual.pdf ARMv8 Instruction Set Overview] ("ARM ISA Overview")* The ''long'' guide to the ARMv8 instruction set: [https://developer.arm.com/docs/ddi0487/latest/arm-architecture-reference-manual-armv8-for-armv8-a-architecture-profile ARM Architecture Reference Manual ARMv8, for ARMv8-A architecture profile] ("ARM ARM")==== C Intrinsics - AArch64 SIMD ====* [https://developer.arm.com/architectures/instruction-sets/simd-isas/neon/intrinsics ARM NEON Intrinsics Reference]* [https://gcc.gnu.org/onlinedocs/gcc/ARM-C-Language-Extensions-_0028ACLE_0029.html GCC ARM C Language Extensions] == Week 13 == === Week 13 - Class I ===* [https://youtu.be/GLzAVWW8dEo Video - April 16: Project Stage 3] === Week 13 - Class II ===* Wrap-up Session
<BR/><BR/><BR/><BR/><BR/><BR/><BR/><BR/><BR/><BR/><BR/><BR/><BR/><BR/><BR/><BR/><BR/><BR/><BR/><BR/><BR/><BR/><BR/><BR/><BR/><BR/><BR/><BR/><BR/><BR/><BR/><BR/>