CDOT Wiki β

ARMv8

Revision as of 19:53, 14 September 2015 by Chris Tyler (talk | contribs) (Single zImage)

There are a few different terms swirling around the 64-bit ARM space. This page distinguishes between some of these terms and concepts.

ARMv8

ARM architecture version 8 -- known as ARMv8 -- was introduced in ~2012 and is just starting to appear in the market as of 2013/2014.

ARMv8 has two execution states which support 3 Instruction Set Architectures:

  • AArch32 - A 32-bit execution state which supports these instruction sets:
    • A32 (often just called "ARM") - the traditional fixed-length 32-bit instruction set used in ARMv7 (with minor differences).
    • T32 (Thumb) - a variable-length instruction set mixing 16- and 32-bit encodings for increased code density, previously referred to as Thumb-2.
  • AArch64 - A 64-bit execution state which supports this instruction set:
    • A64 - a 64-bit-capable instruction set encoded in fixed-length 32-bit instructions.

There are different profiles for ARMv8 devices:

  • ARMv8-A - Application - For user-level application processors, e.g., servers, smartphones, and tablets. ARMv8-A devices support the AArch64 execution state, and may optionally support the AArch32 execution state.
  • ARMv8-R - Real-time - For real-time applications, which require that hardware events (such as interrupts) receive a response within a (short) hard deadline - a fuel injection system, for example. ARMv8-R devices support only the AArch32 execution state and do not support the AArch64 execution state.

AArch32 and AArch64 Support on ARMv8 in Linux

Linux systems may support the execution of AArch32 binaries on an AArch64 platform (multiarch support), or they may prohibit it and allow AArch32 binaries only in a virtual machine.

Debian/Ubuntu supports AArch32 binaries on ARMv8 via a multiarch mechanism similar to that used to support x86_32 binaries on x86_64.

Fedora/Red Hat intentionally does not support AArch32 binaries on ARMv8.

The value of supporting AArch32 binaries on ARMv8 is controversial. The argument for supporting them is maximum backward compatibility; the arguments against are that very few proprietary/closed-source 32-bit ARM binaries exist, that those binaries may require recompilation anyway (since AArch32 differs slightly from the ARMv7 instruction set), and that any open-source 32-bit ARM program can readily be rebuilt for AArch64.
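Whether a given binary is AArch32 or AArch64 can be read directly from its ELF header: the `e_machine` field is 40 (EM_ARM) for 32-bit ARM and 183 (EM_AARCH64) for 64-bit ARM, and the `EI_CLASS` byte distinguishes 32- and 64-bit ELF files. A minimal sketch in Python (the helper name and the synthetic headers are illustrative; real tools such as file and readelf do much more):

```python
import struct

# ELF e_machine values from the ELF specification
EM_ARM = 40       # 32-bit ARM (AArch32)
EM_AARCH64 = 183  # 64-bit ARM (AArch64)

def elf_arm_flavour(header: bytes) -> str:
    """Classify an ELF header as AArch32, AArch64, or other.

    `header` is the first 20+ bytes of an ELF file. Assumes a
    little-endian e_machine field, as on typical ARM Linux systems.
    """
    if header[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    ei_class = header[4]          # EI_CLASS: 1 = 32-bit, 2 = 64-bit
    # e_machine is a 16-bit field at byte offset 18
    (e_machine,) = struct.unpack_from("<H", header, 18)
    if e_machine == EM_ARM and ei_class == 1:
        return "AArch32"
    if e_machine == EM_AARCH64 and ei_class == 2:
        return "AArch64"
    return "other"

# Synthetic headers for demonstration (only the fields we read are filled in)
aarch64_hdr = b"\x7fELF\x02" + b"\x00" * 13 + struct.pack("<H", EM_AARCH64)
aarch32_hdr = b"\x7fELF\x01" + b"\x00" * 13 + struct.pack("<H", EM_ARM)
print(elf_arm_flavour(aarch64_hdr))  # AArch64
print(elf_arm_flavour(aarch32_hdr))  # AArch32
```

A multiarch-capable kernel uses this same distinction to route an AArch32 binary to the 32-bit execution state.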

Boot Systems for ARMv8 in Linux

Historical Situation

Very few general-purpose 32-bit ARM systems were ever produced - the billions of ARMv6 and ARMv7 devices that exist generally run a dedicated build of an operating system, even if that operating system is open-source. For example, an Android-based cellphone or tablet (which runs Linux) comes with software particularly customized for that device. There is little or no market for general-purpose operating systems that can be installed on a wide range of 32-bit ARM devices, and therefore there was almost no effort made to standardize the boot process.

Although most 32-bit development boards and general devices (such as the BeagleBone, Wandboard, PandaBoard, Cubieboard, Cubietruck, Radxa Rock, Utilite, Trim-Slice, and so forth) use a version of the U-Boot bootloader, these are almost always customized and operate in a way that is unique to the device. For example, some U-Boot versions boot only from some combination of NAND/NOR SPI-connected flash memory, eMMC, SD card, or disk; some load the kernel using a configuration stored on the boot device, while others store the boot configuration in the device that holds the U-Boot bootloader (which may be different); some load the U-Boot software itself directly from a particular block offset or FAT slot number, while others load it by name, or load it from SPI-connected flash; and so forth.

Dennis Gilmore of Red Hat and some others have attempted to unify the U-Boot situation; however, this has been an uphill battle, as new 32-bit ARM devices have continued to flood onto the market.

In addition to the boot environment, the machine description (describing the devices which make up the system in addition to the CPU) was originally done using a "machine number" passed in from the boot environment. This led to the creation of incompatible patch sets for the kernel, such that the kernel could not be built so that it would work on a variety of devices - it had to be built for a specific machine.

Single zImage

Arnd Bergmann (originally working at IBM, now with Linaro), one of the key ARM kernel maintainers, worked with others to move from machine numbers to using Device Tree to describe the attached hardware. This paved the way to move to a "Single zImage" - a kernel which could run on a variety of different devices by using data in a board-specific Device Tree Blob (DTB) to initialize the correct device drivers with the correct parameters for each device. This in turn has made it much easier for various distros (Fedora, Debian/Ubuntu, Mint, Gentoo, etc.) to support a range of devices.
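A Device Tree source (DTS) file describes the hardware declaratively; the kernel reads the compiled blob (DTB) at boot instead of relying on a hard-coded machine number. A simplified, hypothetical fragment (the node names, addresses, and board strings are illustrative, not taken from any real board):

```
/dts-v1/;

/ {
    model = "Example Development Board";   /* hypothetical board */
    compatible = "example,devboard";

    memory@80000000 {
        device_type = "memory";
        reg = <0x80000000 0x20000000>;     /* 512 MiB of RAM at 0x80000000 */
    };

    serial@101f0000 {
        compatible = "arm,pl011";          /* driver is matched via this string */
        reg = <0x101f0000 0x1000>;
        interrupts = <12>;
    };
};
```

The kernel matches each node's compatible string against its registered drivers, which is what lets one kernel image initialize the right drivers on any board that ships a suitable DTB.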

ARMv8 Server Standardization

The situation is different in the server space - companies want to be able to buy servers from any vendor and install a standard operating system. Jon Masters of Red Hat and others have led efforts to standardize the boot process and environment for ARMv8 servers, using UEFI for the boot process and ACPI for machine description. The move from Device Tree to ACPI has caused some grumbling from vendors, but it's a relatively straightforward evolutionary step, and much simpler than jumping from the machine number approach directly to ACPI.

This in turn has led to the development of the ARM Server Base System Architecture (SBSA) specification, which details the minimum requirements for a standard ARMv8 server. Any system following this specification should be able to boot a standard ARMv8 operating system from any vendor. Since this is a clean design informed by previous industry mistakes, there is high hope that the boot situation on ARMv8 will be even better standardized than on x86_64.

Since EFI and ACPI were previously very x86-specific and tied to particular Windows releases, adopting these for ARM systems and non-Windows operating systems has led to changes in the management and governance of these standards.

ARMv8 on Non-Server Devices

It remains to be seen what situation will develop on non-server ARMv8A devices, which generally fall into two categories:

  • Cellphones, tablets, and fixed-function devices (SAN, NAS, Routers)
  • Development boards / hackable devices

It is unlikely that fixed-function devices will boot with UEFI/ACPI, unless that becomes easier than using any other approach.

Development boards will probably initially ship with U-Boot based systems, but it is hoped that SBSA will become simple and straightforward enough that it will eventually gain a foothold in this area too. Note that there is a good chance that Cortex-A53/-A57 based small development boards will be available for under $100 in 2015.

Implementations of ARMv8

ARM licenses their technology at several different levels:

  • An architectural licensee has the right to develop their own implementation of a particular ARM architecture. Apple (A7+ CPU) and Applied Micro (X-Gene) fall into this category. These chips execute standard ARMv8-A software, but because the designs are prepared by the licensees, the performance profiles may differ from those of other manufacturers and from ARM's own designs - for example, branch prediction and pipelining may be different, and some instructions will be slower and others faster than on the corresponding ARM-designed devices. Therefore, optimizations may have different effects. To perform appropriate optimizations for a particular implementation, a compiler can use a "cost table" which contains information about the performance of specific instructions, enabling the compiler to pick the optimal combination of instructions for a particular operation.
  • A design licensee has the right to produce devices using one or more of ARM's chip designs. This requires far less expertise on the part of the licensee, and allows what is basically a cut-and-paste of the standard ARM core(s) into the chip design that the licensee is working on. This enables the licensee to focus on the other IP (intellectual property) blocks on the chip, such as GPUs, memory controllers, radios (cellular, wifi, bluetooth, GPS, zigbee, and so forth), accelerators, and various peripherals. Most ARM licensees fall into this category. Current standard ARM chip designs are designated "Cortex" - the Cortex-A5, A7, A9, A12, A15, and A17 are ARMv7A designs, and the Cortex-A53 and A57 are ARMv8A designs.
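The cost-table idea can be sketched as a toy model: given per-implementation cycle costs for candidate instruction sequences that compute the same result, the compiler picks the cheapest sequence for the target CPU. The core names, instruction names, and cycle counts below are all invented for illustration; real cost tables (for example, the per-CPU tuning structures in GCC) are far more detailed:

```python
# Toy model of a compiler cost table: cycle cost per instruction,
# per CPU implementation. All names and numbers are invented.
COST_TABLES = {
    "vendor-a-core": {"mul": 3, "shift": 1},
    "vendor-b-core": {"mul": 1, "shift": 2},
}

# Two candidate instruction sequences that both compute x * 8:
CANDIDATES = {
    "multiply": ["mul"],    # x * 8 via a multiply instruction
    "shift":    ["shift"],  # x << 3 via a shift instruction
}

def pick_cheapest(cpu: str) -> str:
    """Return the candidate sequence with the lowest total cycle cost."""
    table = COST_TABLES[cpu]
    return min(CANDIDATES, key=lambda name: sum(table[i] for i in CANDIDATES[name]))

print(pick_cheapest("vendor-a-core"))  # shift    (1 cycle beats 3)
print(pick_cheapest("vendor-b-core"))  # multiply (1 cycle beats 2)
```

The point is that the same source code compiles to different instruction choices depending on which implementation's cost table is in effect - which is why code tuned for one ARMv8 implementation may run suboptimally on another.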

Confusing Numbering Schemes

The ARM space is littered with really awful and conflicting numbering schemes.

  • Early ARM chips had numbers that were different from the corresponding architecture levels. For example, the ARM11 processor is an ARMv6 chip, which is much lower-performing than other parts with lower numbers, including the ARMv7-level Cortex-A5, -A7, -A8, and -A9 devices.
  • Cortex designations are not in order of release date, performance, features, or power consumption. Cortex-A8 and Cortex-A9 are some of the older designs in the series; Cortex-A15 chips add hardware virtualization support. Cortex-A12, Cortex-A7, and Cortex-A5 designs followed, with varying power/performance profiles. Cortex-A53 and -A57 chips are ARMv8.
  • Other companies have introduced chips with confusingly similar designations. The Apple A7 chip is not an ARM design and has nothing to do with the Cortex-A7 (or any other Cortex core); it is roughly in the same performance category as a dual-core Cortex-A53.

big.LITTLE

ARM cores may be combined in compatible groups of higher-performance/higher-consumption and lower-performance/lower-consumption cores. These configurations are called big.LITTLE.

Typical pairings are:

  • Cortex-A15 and Cortex-A7
  • Cortex-A17 and Cortex-A7
  • Cortex-A57 and Cortex-A53

The advantage to big.LITTLE lies in the ability to turn off cores that are not needed. Thus, when a device such as a cellphone is performing background tasks (screen off), one little core may be used; when the device is performing basic tasks, a couple of little cores or one big core may be used; and when very demanding tasks are performed, several big cores (or all of the cores) may be turned on.

Balancing power vs. performance can be very difficult - for example, will it require less battery to keep a little core on constantly, or to run a big core for a fraction of a second and then sleep all of the cores?
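That trade-off can be made concrete with back-of-the-envelope arithmetic. All power and time figures below are invented for illustration (real numbers depend heavily on the silicon, voltage, and workload):

```python
# Hypothetical figures: a little core draws 100 mW, a big core 750 mW,
# and a sleeping cluster draws 5 mW. The task takes the whole 1.0 s
# window on the little core but only 0.2 s on the big core.
LITTLE_POWER_MW = 100.0
BIG_POWER_MW = 750.0
SLEEP_POWER_MW = 5.0
WINDOW_S = 1.0          # time window we account energy over

def energy_little_always_on() -> float:
    """Energy (mJ) if the little core runs for the whole window."""
    return LITTLE_POWER_MW * WINDOW_S

def energy_big_race_to_idle(task_time_s: float = 0.2) -> float:
    """Energy (mJ) if the big core finishes fast, then everything sleeps."""
    return BIG_POWER_MW * task_time_s + SLEEP_POWER_MW * (WINDOW_S - task_time_s)

print(energy_little_always_on())   # 100.0 mJ
print(energy_big_race_to_idle())   # 750*0.2 + 5*0.8 = 154.0 mJ
```

With these made-up numbers the little core wins; a faster big core or a lower sleep power can flip the answer, which is exactly why the scheduling decision is hard.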

Wikipedia has a page on big.LITTLE that includes a list of known implementations.