Computer Architecture

1,413 bytes added, 14:54, 14 December 2013
There are many variables in CPU design, including:
* '''Register width''' - Registers vary in width, with 8, 16, 32, and 64 bit widths common (though other values are sometimes seen). Some CPUs have multiple registers of different sizes, or can access smaller subsets of larger registers (e.g., accessing the lowest 8 bits of a 64-bit register when needed), or can access a register either as two smaller registers or as one larger register (e.g., the 16-bit D register on the 6809E processor can also be accessed as the two 8-bit registers A and B).
* The '''number of registers''' varies from three or four to many dozens. Some processors are equipped with multiple sets of registers, and can rapidly switch between the register sets on demand or keep one set active per hardware thread (e.g., Intel "Hyper-Threading" technology), which simplifies and speeds up switching between threads of execution. Since registers are often significantly faster than RAM, a larger register set is generally considered better, except that it takes longer to save a larger register set when switching processes. The full set of registers available on a CPU is known as the ''register file''.
* The ''work'' of a CPU is performed by '''execution units''', which perform operations such as loading data from and storing data to memory (load/store unit), performing integer math (integer unit), and performing floating-point math (floating-point unit, or FPU). The length of time taken to perform an operation varies according to the sophistication of the execution unit and the complexity of the operation. For example, a multiplication can be performed in many different ways, ranging from repeated addition (very slow, but requiring very little hardware logic) to table lookup (very fast, but requiring a lot of silicon), with most implementations falling somewhere in the middle. A multiplication will almost always take longer to perform than an addition, and the time taken may vary according to the carry and overflow sub-operations required. The use of multiple units permits faster operations to be completed on some execution units while other (slower) operations are taking place on other execution units.
* As instructions are performed, special results are recorded as '''[[Flags]]''' within the CPU. For example, adding or multiplying two numbers will set a "Carry" flag when the result overflows the word width. Other flags may indicate zero or negative result values. These flags can then be used in later operations -- for example, a branch may be taken if the carry flag is set. The number of flags, their specific meanings, and the circumstances under which they are set (to binary "1") and cleared (to binary "0") vary from architecture to architecture.
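The register-width and flag behaviour described above can be sketched in a few lines of Python. This is only a toy model under stated assumptions: the function names (<code>split_16</code>, <code>join_8</code>, <code>add8</code>) and the choice of three flags are made up for illustration and do not correspond to any particular CPU.

```python
# Toy model of 8-bit registers and status flags (illustrative only).

def split_16(d):
    """View a 16-bit value as two 8-bit halves (high, low)."""
    return (d >> 8) & 0xFF, d & 0xFF

def join_8(high, low):
    """View two 8-bit values as one 16-bit value."""
    return ((high & 0xFF) << 8) | (low & 0xFF)

def add8(a, b):
    """Add two 8-bit values; return the 8-bit result and the flags."""
    total = (a & 0xFF) + (b & 0xFF)
    result = total & 0xFF
    flags = {
        "carry": total > 0xFF,            # result did not fit in 8 bits
        "zero": result == 0,              # result is exactly zero
        "negative": bool(result & 0x80),  # top bit set (two's complement sign)
    }
    return result, flags

high, low = split_16(0x12FF)
low, flags = add8(low, 0x01)      # 0xFF + 0x01 wraps around to 0x00
if flags["carry"]:                # the equivalent of "branch if carry set"
    high, _ = add8(high, 0x01)    # propagate the carry into the high byte
print(hex(join_8(high, low)), flags)
```

The example performs a 16-bit increment using only 8-bit additions, which is roughly how a later operation can make use of a flag left behind by an earlier one.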
The RISC vs. CISC debate was at its peak in the 1980s and early 1990s. Most modern CPU designs combine elements of both philosophies. For example, ARM processors, which have historically been considered RISC designs, include some multi-step instructions such as load/store multiple (traditionally a CISC trait), while x86 processors (traditionally a CISC design) now feature larger register sets, which were originally considered a RISC feature.
 
== Instruction Set Architecture ==
 
The [[Instruction Set Architecture]] (ISA) specifies the instructions a CPU can execute and how they are encoded. It is specific to a particular architecture family and therefore depends on some architectural features, such as the register set, but is independent of others, such as the cache type -- because the cache type affects performance but not which instructions the CPU can execute.
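As a rough illustration of why the encoding depends on the register set, the sketch below packs an instruction into a 16-bit word. The format, the opcode numbers, and the names <code>encode</code> and <code>decode</code> are all hypothetical and are not taken from any real ISA.

```python
# Hypothetical 16-bit instruction format (not a real ISA):
#   bits 15-12: opcode
#   bits 11-8:  destination register
#   bits 7-4:   first source register
#   bits 3-0:   second source register
OPCODES = {"add": 0x1, "sub": 0x2, "and": 0x3}  # made-up opcode numbers
NUM_REGISTERS = 16  # a 4-bit register field can name exactly 16 registers

def encode(op, rd, rs1, rs2):
    """Pack an operation and three register numbers into one 16-bit word."""
    for r in (rd, rs1, rs2):
        if not 0 <= r < NUM_REGISTERS:
            raise ValueError(f"register r{r} does not exist in this ISA")
    return (OPCODES[op] << 12) | (rd << 8) | (rs1 << 4) | rs2

def decode(word):
    """Unpack a 16-bit word into (operation, rd, rs1, rs2)."""
    names = {code: name for name, code in OPCODES.items()}
    return names[word >> 12], (word >> 8) & 0xF, (word >> 4) & 0xF, word & 0xF

word = encode("add", 3, 1, 2)   # add r3, r1, r2
print(f"{word:#06x}", decode(word))
```

In this made-up format, doubling the register count would require wider register fields and therefore a different encoding, whereas nothing in the encoding refers to the cache at all.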
 
== Processor-specific Optimizations ==
 
Code which is optimized for a particular architecture will take advantage of the features of that architecture, such as the full register file. Even so, performance may vary significantly between processor models within that architecture -- for example, a loop that is small enough to fit in the cache of one processor model may not fit inside the smaller cache of another model within the same architecture family. Likewise, a particular instruction sequence may be optimal for one processor model with a particular combination of execution units, but suboptimal for another model with a different set of units. In practice, the variation from model to model is usually modest.
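The effect of different execution-unit combinations can be sketched with a deliberately simplified model. Everything here is an assumption made for illustration: the latencies are invented, each kind of operation is given its own kind of unit, the operations are treated as independent of one another, and units are not pipelined. The function name <code>cycles</code> and the two "models" are hypothetical.

```python
import heapq

# Made-up latencies, in cycles, for each kind of operation.
LATENCY = {"int": 1, "mul": 3, "load": 2}

def cycles(ops, units):
    """Cycles needed to finish independent ops, given units per kind."""
    finish = 0
    for kind in set(ops):
        free_at = [0] * units[kind]   # cycle at which each unit is next free
        heapq.heapify(free_at)
        for _ in range(ops.count(kind)):
            start = heapq.heappop(free_at)          # earliest available unit
            heapq.heappush(free_at, start + LATENCY[kind])
        finish = max(finish, max(free_at))
    return finish

ops = ["mul", "mul", "int", "int", "int", "int", "load"]
model_a = {"int": 1, "mul": 1, "load": 1}   # hypothetical smaller model
model_b = {"int": 2, "mul": 2, "load": 1}   # hypothetical larger model
print(cycles(ops, model_a), cycles(ops, model_b))
```

The same instruction mix finishes sooner on the model with more units. Because the numbers are invented, the size of the gap says nothing about real processors; a real compiler would also reorder instructions to suit the units available on the target model.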
 
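Most modern compilers, such as gcc, enable you to set the overall target architecture, but also permit you to optimize performance for a specific CPU model within that target architecture. In gcc, the -march option selects the instruction set to generate code for, and the -mtune option tunes the generated code for a particular CPU model without changing which instructions may be used.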