
Fall 2021 SPO600 Weekly Schedule

14,757 bytes added, 10:29, 29 January 2022


{{Admon/important|It's Alive!|This [[SPO600]] weekly schedule will be updated as the course proceeds - dates and content are subject to change. The cells in the summary table will be linked to relevant resources and labs as the course progresses.}}


== Schedule Summary Table ==

|2||Sep 13||[[#Week 2 - Class I|Binary Representation of Data]]||[[#Week 2 - Class II|Computer architecture basics / Introduction to Assembly Language / 6502 Assembly Basics Lab (Lab 2)]]||[[#Week 2 Deliverables|Lab 2]]

|-

|3||Sep 20||[[#Week 3 - Class I|Math, Assembly language conventions, and Examples]]||[[#Week 3 - Class II|Jumps, Branches, and Subroutines]]||[[#Week 3 Deliverables|Lab 3]]

|-

|4||Sep 27||[[#Week 4 - Class I|Characters, Strings, and Services]]||[[#Week 4 - Class II|6502 Math Lab (Lab 3)]]||[[#Week 4 Deliverables|Labs]]

|-

|5||Oct 4||[[#Week 5 - Class I|Building Code]]||[[#Week 5 - Class II|6502 String Lab (Lab 4)]]||[[#Week 5 Deliverables|Lab 4]]

|-

|6||Oct 12||style="background:#f0f0ff"|Thanksgiving Holiday||[[#Week 6 - Class II|6502 Labs, Continued]]||[[#Week 6 Deliverables|Blog Posts]]

|-

|7||Oct 18||[[#Week 7 - Class I|x86_64 and AArch64]]||[[#Week 7 - Class II|Compiler Optimizations]]||[[#Week 7 Deliverables|Blog Posts]]

|-

|Reading||Oct 25||style="background: #f0f0ff" colspan="5" align="center"|Reading Week

|-

|8||Nov 1||[[#Week 8 - Class I|x86_64 and AArch64]]||[[#Week 8 - Class II|x86_64 and AArch64]]||[[#Week 8 Deliverables|October Blog Posts / Project Stage 1]]

|-

|9||Nov 8||[[#Week 9 - Class I|Benchmarking and Profiling]]||[[#Week 9 - Class II|64-Bit Assembler Lab (Lab 5)]]||[[#Week 9 Deliverables|Lab 5]]

|-

|10||Nov 15||[[#Week 10 - Class I|Continue work on Lab 5]]||[[#Week 10 - Class II|Discussion]]||[[#Week 10 Deliverables|Lab 5]]

|-

|11||Nov 22||[[#Week 11 - Class I|SIMD, Intrinsics and inline Assembler]]||[[#Week 11 - Class II|Discussion]]||[[#Week 11 Deliverables|Project Stage 1]]

|-

|12||Nov 29||[[#Week 12 - Class I|Memory]]||[[#Week 12 - Class II|Project Discussion]]||[[#Week 12 Deliverables|Project work]]

|-

|13||Dec 6||[[#Week 13 - Class I|Project Discussion]]||[[#Week 13 - Class II|Lab 10]]||[[#Week 13 Deliverables|Project Work]]

|-

|14||Dec 13||[[#Week 13 - Class I|Future Directions in Architecture]]||[[#Week 13 - Class II|Wrap-up Discussion]]||[[#Week 13 Deliverables|Final blog posts and project work]]

|-

|}

===== Benchmarking and Profiling =====

Benchmarking involves testing software performance under controlled conditions so that the performance can be compared to other software, the same software operating on other types of computers, or so that the impact of a change to the software can be gauged.

===== Build Process =====

Building software is a complex task that many developers gloss over. The simple act of compiling a program invokes a process with five or more stages, including preprocessing, compiling, optimizing, assembling, and linking. However, a complex software system will have hundreds or even thousands of source files, as well as dozens or hundreds of build configuration options, auto configuration scripts (cmake, autotools), build scripts (such as Makefiles) to coordinate the process, test suites, and more.

==== Introduction to Assembly Language on the 6502 Processor ====

https://web.microsoftstream.com/video/6a645edd-3537-4910-843c-6d32f6678e79

To understand basic assembly/machine language concepts, we're going to start with a very simple processor: the [[6502]].

* Complete and blog about [[6502 Assembly Language Lab|Lab 2]]

== Week 4 ==

=== Week 4 - Class I ===

==== Videos ====

* [https://web.microsoftstream.com/video/66e3eb13-4f7c-462f-8fab-5debc365bbb2 Week 4 Announcements]

* [https://web.microsoftstream.com/video/ac307c3e-10ea-4046-b339-675ac4e6d6c7 6502 Characters, Strings, and Services]

* [https://web.microsoftstream.com/video/6a645edd-3537-4910-843c-6d32f6678e79 A 6502 Assembly Language Hack]

==== Characters, Strings, and System Routines ====

* Characters are encoded in binary by numeric reference to a character in a character set. For example, characters in ASCII are encoded as 7-bit values in the range 0-127, which refer to characters in the [https://www.man7.org/linux/man-pages/man7/ascii.7.html American Standard Code for Information Interchange] character set: code 32 represents a space (" "), 48 represents a zero ("0"), code 49 represents a one ("1"), and code 65 represents a capital letter A.

* Strings in assembler are stored as sequences of bytes. As is usually the case in assembler, memory management is left to the programmer. You can terminate strings with null bytes (C-style), which are easy to detect on some CPUs (e.g., <code>lda</code> followed by <code>bne / beq</code> on a 6502), or you can use character counts to track string lengths.

* The [[6502 Emulator|6502 emulator]] has an 80x25 character display mapped starting at location '''$f000'''. Writing a byte to screen memory will cause that character to be displayed at the corresponding location on the screen, if the character is printable. If the high bit is set, the character will be displayed in reverse video. For example, storing the ASCII code for "A" (which is 65 or $41) into memory location $f000 will display the letter "A" as the first character on the screen; ORing the value with 128 ($80) yields a value of 193 or $c1, and storing that value into $f000 will display a reverse-video "A" as the first character on the screen.

* A "ROM chip" with screen control routines is mapped into the emulator at the end of the memory space (at the time of writing, the current version of the ROM exists in pages $fe and $ff). Details of the available ROM routines can be viewed using the "Notes" button in the emulator or on the [[6502_Emulator#ROM_Routines|emulator page]] on this wiki.

** To write to the screen, you can use the SCINIT (SCreen INITialize) routine followed by calls to the CHROUT (CHaRacter OUTput) routine. CHROUT handles things like cursor movement, newline ($0d) characters, and screen scrolling.

=== Week 4 - Class II ===

* [[6502 Assembly Language Math Lab]] (Lab 3)

=== Week 4 Deliverables ===

* Blog about your [[6502 Assembly Language Math Lab|Lab 3]]

== Week 5 ==

=== Week 5 - Class I ===

==== Videos ====

* [https://web.microsoftstream.com/video/d46f4457-70a1-4163-8465-3b5640495449 Compiling]

* [https://web.microsoftstream.com/video/5f3a96e8-5d01-412a-9922-3797ba824fe1 Building Software]

* [https://web.microsoftstream.com/video/466bc3a1-6729-434d-b51f-33d4fbb145c5 Make and Makefiles]

==== Building Code ====

* C code is built with the C compiler, typically called <code>cc</code> (which is usually an alias for a specific C compiler, such as <code>gcc</code>, <code>clang</code>, or <code>bcc</code>).

* The C compiler runs through five steps, often by calling separate executables:

*# Preprocessing - performed by the C Preprocessor (<code>cpp</code>), this step handles directives such as <code>#include</code>, <code>#define</code>, and <code>#ifdef</code> to produce a single source code text file, with cross-references to the original input files so that error messages can be displayed correctly (e.g., an error in an included file can be correctly reported by filename and line number).

*# Compilation - the C source code is converted to assembler, going through one or more intermediate representations (IR) such as [https://gcc.gnu.org/onlinedocs/gccint/GENERIC.html GENERIC], [https://gcc.gnu.org/onlinedocs/gccint/GIMPLE.html GIMPLE], or [https://llvm.org/docs/LangRef.html LLVM IR]. The program used for this step is often called <code>cc1</code>.

*# Optimization - optimization passes are performed at several stages of processing, but are centered on the IR at the compilation step. Sometimes, the work of a previous pass is undone by a later pass: for example, a complex loop may be converted into a series of simpler loops by an early pass, in the hope that optimizations can be applied to one or more of the simpler loops; the loops may later be recombined into a single loop if no optimizations are found that are applicable to the simplified loops.

*# Assembly - converts the assembly language code emitted by the compilation stage into binary object code.

*# Linking - connects code to functions (aka methods or procedures) which were compiled in other ''compilation units'' (they may be pre-compiled libraries available on the system, or they may be other pieces of the same code base which are compiled in separate steps). Linking may be static, where libraries are imported into the binary executable file of the output program, or linking may be dynamic, where additional information is added to the binary executable file so that a run-time linker can load and connect libraries at runtime.

* Other languages which are compiled to binary form, such as C++, Ocaml, Haskell, Fortran, and COBOL go through similar processing. Languages which do not compile to binary form are either compiled to a ''bytecode'' format (binary code that doesn't correspond to actual hardware), or left in original source format, and an interpreter reads and executes the bytecode or source code at runtime. Java and Python use bytecode; Bash and JavaScript interpret source code. Some interpreters build and cache blocks of machine code on-the-fly; this is called Just-in-Time (JIT) compilation.

* Compiler feature flags control the operation of the compiler on the source code, including optimization passes. When using gcc, these "feature flags" take the form <code>-f[no-]'''featureName'''</code> -- for example:

** <code>-fbuiltin</code> -- enables the "builtin" feature

** <code>-fno-builtin</code> -- disables the "builtin" feature

* Feature flags can be selected in groups using the optimization (<code>-O</code>) level:

** <code>-O0</code> -- disables most (but not all) optimizations

** <code>-O1</code> -- enables basic optimizations that can be performed quickly

** <code>-O2</code> -- enables almost all safe optimizations

** <code>-O3</code> -- enables aggressive optimization, which may increase compile time and code size

** <code>-Os</code> -- optimizes for small binary and runtime size, possibly at the expense of speed

** <code>-Ofast</code> -- enables all <code>-O3</code> optimizations plus optimizations which disregard strict standards compliance and may not be safe for all code (e.g., <code>-ffast-math</code>, which can ignore the distinction between +0 and -0)

** <code>-Og</code> -- optimizes for debugging: applies optimizations that can be performed quickly, and avoids optimizations which convolute the code

* To see the optimizations which are applied at each level in gcc, use: <code>gcc -Q --help=optimizers -O'''level'''</code> -- it's interesting to compare the output of different levels, such as <code>-O0</code> and <code>-O3</code>

* Different CPUs in one family can have different capabilities and performance characteristics. The compiler option <code>-march</code> sets the overall architecture family and CPU features to be targeted, and the <code>-mtune</code> option sets the specific target for tuning. Thus, you can produce an executable that will work across a range of CPUs, but is specifically tuned to perform best on a certain model. For example, <code>-march=ivybridge -mtune=knl</code> will cause the compiler to use features which are present on all Intel Ivy Bridge (and later) processors, but tuned for optimal performance on Knight's Landing processors. Similarly, <code>-march=armv8-a -mtune=cortex-a72</code> will cause the compiler to emit code which will safely run on any ARMv8-a processor, but be tuned specifically for the Cortex-A72 core.

* When building code on different platforms, there are a lot of variables which may need to be fed into the preprocessor, compiler, and linker. These can be manually specified, or they can be automatically determined by a tool such as GNU Autotools (typically visible as the <code>configure</code> script in the source code archive).

* The source code for large projects is divided into many source files for manageability. The dependencies between these files can become complex. When developing or debugging the software, it is often necessary to make changes in one or a small number of files, and it may not be necessary to rebuild the entire project from scratch. The [[Make_and_Makefiles|<code>make</code>]] utility is used to script a build and to enable rapid partial rebuilds after a change to source code files (see [[Make and Makefiles]]).

* Many open source projects distribute code as a source archive ("tarball") which usually decompresses to a subdirectory '''packageName-version''' (e.g. foolib-1.5.29). This will typically contain a script which configures the Makefile (<code>configure</code> if using GNU Autotools). After running this script, a Makefile will be available, which can be used to build the software. However, some projects use an alternative configuration tool instead of GNU Autotools, and some may use an alternate build system instead of make.

* To eliminate this variation, most Linux distributions use a '''package''' system, which standardizes the build process and which produces installable package files which can be used to reliably install software into standard locations with automatic dependency resolution, package tracking via a database, and simple updating capability. For example, Fedora, Red Hat Enterprise Linux, CentOS, SuSE, and OpenSuSE all use the RPM package system, in which source code is bundled with a build recipe in a "Source RPM" (SRPM), which can be built with a single command into a binary package (RPM). The RPMs can then be downloaded, have dependencies and conflicts resolved, and installed with a single command such as <code>dnf</code>. The fact that the SRPM can be built into an installable RPM through an automated process enables and simplifies automated build systems, mass rebuilds, and architecture-specific builds.

==== Executable Files ====

* Executable code is usually stored in a multi-segment file format containing code, data, linking information (for shared object files/dynamically linked libraries), information on how the segments should be placed in memory, and so forth.

* On a modern Linux system, this is usually the [[Executable and Linkable Format]] (ELF). Other executable formats include COFF (UEFI, Unix/Older Linux/Windows) and its derivative PE (Windows). See the [[Executable and Linkable Format]] page for a list of tools that can be used to analyze ELF files.

=== Week 5 - Class II ===

* Continuing on with [[6502 Assembly Language Math Lab|Lab 3]] and then start on the [[6502 Assembly Language String Lab]] (Lab 4)

=== Week 5 Deliverables ===

* Blog about [[6502 Assembly Language Math Lab|Lab 3]] and [[6502 Assembly Language String Lab|Lab 4]]

== Week 6 ==

Monday of Week 6 was the Thanksgiving Holiday (no Class I this week).

=== Week 6 - Class II ===

* Continue with [[6502 Assembly Language Math Lab|Lab 3]] and [[6502 Assembly Language String Lab|Lab 4]]

== Week 7 ==

=== Week 7 - Class I ===

==== Videos ====

* [https://web.microsoftstream.com/video/ba0bc202-6478-41b0-a792-c159a98b3e0a Week 7 Announcements]

Delayed due to technical issues:

* Introduction to x86_64 and AArch64

=== Week 7 - Class II ===

* [[Compiler Optimizations]]

=== Week 7 Deliverables ===

* Submit the [https://forms.office.com/r/3Q3fjWda0K SPO600 Blog and SSH Information Form] (see the [[SPO600 Communication Tools|instructions]])

* Prepare your Blog for marking (by midnight on October 31)

** There should be 1-2 posts per week (i.e., at least 7)

** Blog about your labs and the material in this course, including your reflections

** Follow the [[Blog Guidelines]]

== Week 8 ==

=== Week 8 - Class I ===

==== Video ====

* [https://web.microsoftstream.com/video/4fd6a3f4-53b9-470c-9b22-3d6905b60cb9 x86_64 and AArch64 Architectures]

==== Reminder ====

=== Week 8 - Class II ===

* Discussion and demo of x86_64 and AArch64 differences

=== Week 8 Deliverables ===

* Fill out the [https://forms.office.com/r/3Q3fjWda0K Blog and SSH Form] if you haven't already done so.

== Week 9 ==

=== Week 9 - Class I ===

==== Video ====

* [https://web.microsoftstream.com/video/8ca180ab-ad44-4905-8d39-d6c0cea87529 Benchmarking and Profiling]

=== Week 9 - Class II ===

* [[SPO600_64-bit_Assembler_Lab|64-Bit Assembler Lab (Lab 5)]]

== Week 10 ==

=== Week 10 - Class I ===

* No new videos - continue work on [[SPO600_64-bit_Assembler_Lab|Lab 5]]

=== Week 10 - Class II ===

* Quiz 2

* Discussion

== Week 11 ==

=== Week 11 - Class I ===

* Lab 6 / Project Stage 1

==== Videos ====

* [https://web.microsoftstream.com/video/2a82da88-bf5b-4112-953a-7408fbab30c1 Algorithm Selection & Profiling]

* [https://web.microsoftstream.com/video/f60b92c6-9db3-4f57-b0b9-7c35ea0c054f Single Instruction, Multiple Data (SIMD)]

* [https://web.microsoftstream.com/video/d208a737-7777-4b5a-b276-1b19dc78145c Inline Assembler]

* [https://web.microsoftstream.com/video/d56ec6ff-2c2c-40d6-8967-52d829e413cc Linux Tips for Benchmarking and Profiling] (older video)

=== Week 11 - Class II ===

* Quiz 3

* Discussion

* [[SPO600_Algorithm_Selection_Lab|Project Stage I (aka Lab 6)]]

=== Week 11 Deliverables ===

* [[SPO600_Algorithm_Selection_Lab|Blog about Project Stage 1]]

== Week 12 ==

=== Week 12 - Class I ===

==== Videos ====

* [https://web.microsoftstream.com/video/1bcab47b-514a-4f23-bdd4-f73662a0673f Paged Memory Systems] - Virtual memory, demand loading, shared pages, write protection, and challenges benchmarking memory performance (VSS and RSS)

* [https://web.microsoftstream.com/video/880fb0f8-1084-457a-92e0-80f04ad62463 Memory Alignment and Performance]

<!--

=== Week 3 - Class I ===

* Finish [[6502 Assembly Language Lab|Lab 2]]

* [[6502 Assembly Language Math Lab]] (Lab 3)

=== Week 4 - Class II ===


==== Building Code ====

==== Auto-vectorization ====

* [https://gcc.gnu.org/projects/tree-ssa/vectorization.html Auto-Vectorization in GCC] - Main project page for the GCC auto-vectorizer.

* [http://locklessinc.com/articles/vectorize/ Auto-vectorization with gcc 4.7] - An excellent discussion of the capabilities and limitations of the GCC auto-vectorizer, intrinsics for providing hints to GCC, and other code pattern changes that can improve results. Note that there has been some improvement in the auto-vectorizer since this article was written. '''This article is strongly recommended.'''

* [https://software.intel.com/sites/default/files/8c/a9/CompilerAutovectorizationGuide.pdf Intel (Auto)Vectorization Tutorial] - this deals with the Intel compiler (ICC) but the general technical discussion is valid for other compilers such as gcc and llvm

==== Inline Assembly Language ====

Chris Tyler

Bureaucrats, Administrators

1,885

edits
