Changes

Jump to: navigation, search

Winter 2020 SPO600 Weekly Schedule

2,923 bytes added, 13:56, 10 January 2020
Binary Representation of Data
* Compression techniques
** Huffman encoding / Adaptive arithmetic encoding
*** Instead of fixed-length numbers, variable-length numbers are used, with the most common values encoded in the smallest number of bits. This is an effective strategy if the distribution of values in the data set is uneven.
** Repeated sequence encoding (1D, 2D, 3D)
*** Run length encoding is an encoding scheme that records the number of repeated values. For example, fax messages are encoded as a series of numbers representing the number of white pixels, then the number of black pixels, the white pixels, then black pixels, alternating to the end of each line. These numbers are then represented with adaptive artithmetic encoding.
*** Text data can be compressed by building a dictionary of common sequences, which may represent words or complete phrases, where each entry in the dictionary is numbered. The compressed data contains the dictionary plus a sequence of numbers which represent the occurrence of the sequences in the original text. On standard text, this typically enables 10:1 compression.
** Decomposition
*** Compound audio wavforms can be decomposed into individual signals, which can then be modelled as repeated sequences. For example, a waveform consisting of two notes being played at different frequencies can be decomposed into those separate notes; since each note consists of a number of repetitions of a particular wave pattern, they can individually be represented in a more compact format by describing the frequence, waveform shape, and amplitude characteristics.
** Pallettization
*** Images often contain repeated colours, and rarely use all of the available colours in the original encoding scheme. For example, a 1920x1080 image contains about 2 million pixels, so if every pixel was a different colour, there would be a maximum of 2 million colours. But it's likely that many of the pixels in the image are the same colour, so there might only be (perhaps) 4000 colours in the image. If each pixel is encoded as a 24-bit value, there are potentially 16 million colours available, and there is no possibility that they are all used. Instead, a palette can be provided which specifies each of the 4000 colours used in the picture, and then each pixel can be encoded as a 12-bit number which selects one of the colours from the palette. The total storage requirement for the original 24-bit scheme is 1920*1080*3 bytes per pixel = 5.9 MB. Using a 12-bit pallette, the storage requirement is 3 * 4096 bytes for the palette plus 1920*1080*1.5 bytes for the image, for a total of 3 MB -- a reduction of almost 50%
** Psychoacoustic and psychovisual compression
** Much of the data in sound and images cannot be perceived by humans. Psychoacoustic and psychovisual compression remove artifacts which are least likely to be perceived. As a simple example, if two pixels on opposite sides of a large image are almost but not exactly the same, most people won't be able to tell the difference, so these can be encoded as the same colour if that saves space (for example, by reducing the size of the colour palette).
==== Computer Architecture Overview ====

Navigation menu