1,885
edits
Changes
ARMv8
,→big.LITTLE
ARM architecture version 8 -- known as ARMv8 -- was introduced in ~2012 and is just starting to appear in the market as of 2013/2014.
ARMv8 has two execution states which support 3 three [[Instruction Set Architecture|Instruction Set Architectures]]:* aarch32 - A 32-bit execution state which supports these instruction sets:** A32 (often just called "ARM") - the traditional 32-bit instruction set used in ARMv7 (with minor differences).** T32 (Thumb) - a mixed 16- and 32-bit fixed-length instruction set for increased code density, previously referred to as Thumb2.* aarch64 - A 64-bit execution state which supports these instruction sets:** A64 - a 64-bit capable instruction set encoded into 32-bit fixed-length instructions.
=== AArch32 Execution State === AArch32 is a 32-bit execution state which supports these instruction sets:* A32 (often just called "ARM") - the traditional 32-bit instruction set used in ARMv7 (with minor differences).* T32 (Thumb) - a mixed 16- and 32-bit fixed-length instruction set for increased code density, previously referred to as Thumb2. === AArch64 Execution State ===AArch64 is a 64-bit execution state which supports these instruction sets:* A64 - a 64-bit capable instruction set encoded into 32-bit fixed-length instructions. == ARMv8 Profiles == There are different ''profiles'' for ARMv8 devices, including:* ARMv8A Armv8-A - ''Application'' - For user-level application processors, i.e., servers, smartphones, tablets. ARMv8-A devices support the AArch64 instruction state, and may optionally support the AArch32 instruction state.* ARMv8R Armv8-R - ''Real-time'' - For real-time applications, which require that hardware events (such as interrupts) receive a response within a (short) hard deadline, such as a fuel injection system. ARMv8-R devices support only the AArch32 execution state, and do not support the AArch64 execution state.* Armv8-M - '' Microcontroller'' - For small embedded / microcontroller applications.
== AArch32 and AArch64 Support on ARMv8 in Linux ==
Although most 32-bit development boards and general devices (such as the Beagle Bone, Wanda Board, Panda Board, CubieBoard, CubieTruck, Radaxa Rock, Utilite, TrimSlice, and so forth) use a version of the U-Boot bootloader, these are almost always customized and operate in a way that is unique to the device. For example, some U-Boot versions boot only from some combination of NAND/NOR SPI-connected flash memory, eMMC, SD card, or disk; some load the kernel using a configuration stored on the boot device, while others store the boot configuration in the device that holds the U-Boot bootloader (which may be different); some load the U-Boot software itself directly from a particular block offset or FAT slot number, while others load it by name, or load it from SPI-connected flash; and so forth.
Dennis Gilmore of Red Hat and some others have attempted to unify the U-Boot situation; however, this has been an uphill battle, as new 32-bit ARM devices have continued to flood onto the market.
In addition to the boot environment, the machine description (describing the devices which make up the system in addition to the CPU) was originally done using a "machine number" passed in from the boot environment. This led to the creation of incompatible patch sets for the kernel, such that the kernel could not be built so that it would work on a variety of devices - it had to be built for a specific machine.
The situation is different in the server space - companies want to be able to buy servers from any vendor and install a standard operating system. Jon Masters of Red Hat and others have led efforts to standardize the boot process and environment for ARMv8 servers, using UEFI for the boot process and ACPI for machine description. The move from Device Tree to ACPI has caused some grumbling from vendors, but it's a relatively straightforward evolutionary step, and much simpler than jumping from the machine number approach directly to ACPI.
This in turn has led to the development of the ARM ''[https://lwn.net/Articles/584123/ Server Base System Architecture]'' (SBSA) specification, which details the minimum hardware requirements for a standard ARMv8 server, and the ''[http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.den0044b/index.html Server Base Boot Requirements]'' (SBBR) specification, which details how the boot firmware should work. Any system following this specification these specifications should be able to boot a standard ARMv8 operating system from any vendor. Since this is a clean design in which we learned from previous industry mistakes, there is high hope that the boot situation on ARMv8 will be even better standardized than on x86_64.
Since EFI and ACPI were previously very x86-specific and tied to particular Windows releases, adopting these for ARM systems and non-Windows operating systems has led to changes in the management and governance of these standards.
* Development boards / hackable devices
Development boards will probably initially ship with u-Boot based systemswith or without UEFI, but it is hoped that SBSA will become simple and straightforward enough that it will eventually gain a footing in this area too/or EBBR lead to unification of these approaches. The [http://96boards.org 96boards] project is having a significant impact in bringing under-$100 ARMv8 Consumer Edition (mobile processor-based) and under-$500 Enterprise Edition (enterprise server processor-based) development boards to market; while the Consumer Edition (CE) boards ship with u-boot and/or uefi and/or the Android bootloader, the Enterprise Edition (EE) boards are expected to conform to SBSA.
== Implementations of ARMv8 ==
ARM licenses their technology at several different levels:
* An ''architectural'' licensee has the right to develop their own implementation of a particular ARM architecture. Apple (A7+ CPU) and Applied Micro (X-Gene) fall into this category. These chips execute standard ARMv8A software, but because the designs are prepared by the licensees, the performance profiles may be different from those of other manufacturers and those designed by ARM - for example, branch prediction and pipelining may be different, and some instructions will be slower while other instructions are faster than the corresponding ARM-designed devices. Therefore, optimizations may have different effects. To perform appropriate optimizations for a particular implementation, a compiler can use a "cost table" which contains information about the performance of specific instructions, enabling the compiler to pick the optimal combination of instructions for a particular operation.
* A ''design'' licensee has the right to produce devices using one or more of ARM's chip designs. This requires far less expertise on the part of the licensee, and allows what is basically a cut-and-paste of the standard ARM core(s) into the chip design that the licensee is working on. This enables the licensee to focus on the other IP (intellectual property) blocks on the chip, such as GPUs, memory controllers, radios (cellular, wifi, bluetooth, GPS, zigbee, and so forth), accelerators, and various peripherals. Most ARM licensees fall into this category. Current standard ARM chip designs are designated "Cortex" - the Cortex-A5, A7, A8, A9, A12, A15, and A17 are ARMv7-A 32-bit designs, and the Cortex-A35, A53, A57, A72 , A73, A75, and A72 A76 are ARMv8-A 32/64-bit designs.
=== System-on-a-Chip Implementations ===
Most SoCs offer more features than are used in any one system and more features than can be exposed on the pins which are physically present on the chip. A pin multiplexor system, or PinMux, is used to select which signals are currently exposed on the SoC's pins. For example, a given group of pins could be used for an SPI serial interface, or an I<sup>2</sup>C serial interface, or as general-purpose input/output (GPIO) connections, but are only connected to one of those functions at a time.
In addition, a number of SoCs use high-speed serial interfaces for multiple purposes -- a pool of 40 multi-gigabit-per-second serial interfaces can might be provided, for example, and it is up to the board designer to decide how many of those interfaces to use as the lanes of a PCIe bus, gigabit (or faster) ethernet ports, or as SATA ports.
The operating system kernel has the required mechanisms to set up the PinMux as needed for a given board, and to connect serial controllers to the appropriate drivers. In order to do this, it is critical that the kernel receive not only an accurate description of the SoC, but also an accurate description of how that SoC is wired up in the current system. This information is passed in via a Device Tree or an ACPI description (ACPI is mandated by the SBSA specification for servers).
== Confusing Numbering Schemes ==
The ARM space is littered with really awful very confusing (and conflicting ) numbering schemes.
* Early ARM chips had numbers that were different from the corresponding architecture levels. For example, the ARM11 processor is an ARMv6 chip, which is much lower-performing than other parts with lower numbers, including the ARMv7-level Cortex-A5, -A7, -A8, and -A9 devices.
* Cortex designations are not in order of release date, features, or power consumption, and are only loosely in order by performance. Cortex-A8 (single-core only) and Cortex-A9 (available in single- and multi-core) are some of the older designs in the series; Cortex-A15 chips add hardware virtualization support. The ARMv7 (32 bit) Cortex-A12, Cortex-A7, and Cortex-A5 designs followed, with varying power/performance profiles. Cortex-A35, -A53, -A55, -A57, -A72, -A73, -A75, and -A75 A76 chips are ARMv8ARMv8A.* Other companies have introduced chips with confusingly similar designations. The Apple A7 chip is /A8/A9/A10/... chips are not an ARM design designs and has have nothing to do with the similarly-named Cortex-A7 /-A9/... (or any other Cortex corecores); it the Apple A7, for example, is roughly in the same performance category as a dual-core Cortex-A53. Allwinner and AMD have also used chip designations starting with A (Allwinner A10, A20, and A80, and AMD Opteron A1100, for example); these are unrelated to the Apple A-series chips and to the Cortex A designations. Likewise, the Nvidia K1 is unrelated to the AMD K12. == big.LITTLE ==
The advantage to big.LITTLE /DynamIQ lies in the ability to turn off cores that are not needed. Thus, when a device such as a cellphone is performing background tasks (screen off), one little core may be used; when the device is performing basic tasks, a couple of little cores or one big core may be used; and when very demanding tasks are performed, several big cores (or all of the cores) may be turned on.
Balancing power vs. performance can be very difficult - for example, will it require less energy to keep a little core on constantly to perform a background task, or run a big core for a fraction of a second every few seconds and sleep all of the cores the rest of the time? Issues such as core affinity and cache coherency also play into balancing decisions.
Wikipedia has a [http://en.wikipedia.org/wiki/ARM_big.LITTLE page on big.LITTLE] that includes a list of known implementations.