BTC640/Text

From CDOT Wiki
Revision as of 20:24, 21 November 2011 by Andrew

Lecture

Textbook chapter: 2

Some forms of information are best delivered using text. Describing a view in words is a poor substitute for an online image gallery, but trying to convey the latest news or a technical specification through drawings is just as challenging.

The reality is that presenting material well means using a combination of technologies, and text should not be discounted as old or obsolete: it is still very useful, important, and quite complex (there is a lot to learn).

Definitions: glyphs, fonts, styles, rasterizing, serif, sans serif, anti-aliasing, TrueType fonts (TTF), PostScript.

Font attributes: point size, weight, slope, width.
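Point size is a physical measure: in digital typography (PostScript, and CSS) one point is 1/72 of an inch, so the number of pixels a glyph gets depends on the resolution of the output device. A minimal sketch of the conversion, assuming the resolution is given in dots per inch; the function name `points_to_pixels` is hypothetical:

```python
def points_to_pixels(points: float, dpi: float) -> float:
    """Convert a font size in points to device pixels.

    One (PostScript/desktop-publishing) point is 1/72 of an inch,
    so a size in points spans points / 72 inches on the device.
    """
    # Multiply before dividing to keep the arithmetic exact for common sizes.
    return points * dpi / 72

# The same 12 pt text on a 72 dpi screen, at the 96 px per inch CSS assumes,
# and on a 300 dpi printer.
for dpi in (72, 96, 300):
    print(f"12 pt at {dpi} dpi = {points_to_pixels(12, dpi):g} px")
```

This is why 12 pt text fills 16 CSS pixels in a browser but 50 dots on a 300 dpi printer: the physical size stays the same while the number of pixels available to draw each glyph changes.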

Font availability

Almost every computer has a different set of fonts available. Each version of an operating system ships with its own set, and more fonts can be installed by applications and by the user.

For example, text in a rich text file may look very different on your computer at home than on a school computer, and it will almost certainly look different on another OS (Linux, OSX).

If the desired font is not available, applications have rules for choosing the closest font that is available.
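The exact substitution rules differ between applications and operating systems. A minimal sketch of the simplest strategy, a simplified version of what a CSS `font-family` list does: walk the author's list of preferred fonts in order, take the first one that is installed, and otherwise fall back to a generic family. The installed-font sets and the function name `choose_font` are hypothetical:

```python
def choose_font(preferred, installed, generic="serif"):
    """Return the first preferred font that is installed, else a generic family.

    preferred: font names in order of preference (like a CSS font-family list)
    installed: font names available on this (hypothetical) computer
    generic:   last-resort family the renderer always provides
    """
    # Family names are compared case-insensitively here, as CSS does.
    installed_lower = {name.lower() for name in installed}
    for name in preferred:
        if name.lower() in installed_lower:
            return name
    return generic

home_pc = {"Arial", "Calibri", "Times New Roman"}
school_pc = {"DejaVu Sans", "Liberation Serif"}
wanted = ["Calibri", "Helvetica", "Arial"]

print(choose_font(wanted, home_pc, "sans-serif"))    # Calibri
print(choose_font(wanted, school_pc, "sans-serif"))  # sans-serif
```

Real renderers go further (browsers, for example, fall back character by character when the chosen font lacks a glyph), which is one reason the same file can look different on two machines.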

PDF is one of the very few document formats that can embed the fonts in the file itself, so the text looks identical on all platforms.

Text can be rasterized (turned into a bitmap) before distribution, which ensures that it looks the same on all platforms. However, a bitmap of text cannot be selected, searched, or scaled cleanly, so it is usually not a good option except for titles and perhaps buttons.
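Rasterizing means deciding, for each pixel, how much of it a glyph's outline covers. A minimal sketch of the idea, assuming the "glyph" is just a hypothetical shape described by an `inside(x, y)` test: sampling only the pixel centre gives hard, jagged edges, while averaging several samples per pixel (supersampling, one simple way to anti-alias) gives edge pixels intermediate shades. Real font rasterizers work from TrueType or PostScript outlines and are considerably more sophisticated.

```python
def rasterize(inside, width, height, samples=1):
    """Return rows of per-pixel coverage values between 0.0 and 1.0.

    samples=1 tests only the pixel centre (aliased, all-or-nothing pixels);
    samples=n tests an n x n grid inside each pixel and averages the hits.
    """
    image = []
    for py in range(height):
        row = []
        for px in range(width):
            hits = 0
            for sy in range(samples):
                for sx in range(samples):
                    # Sample points are spread evenly inside the pixel.
                    x = px + (sx + 0.5) / samples
                    y = py + (sy + 0.5) / samples
                    if inside(x, y):
                        hits += 1
            row.append(hits / (samples * samples))
        image.append(row)
    return image

def diagonal_edge(x, y):
    """Hypothetical shape: the region below the line y = x (y grows downward)."""
    return y > x

aliased = rasterize(diagonal_edge, 4, 4, samples=1)
smooth = rasterize(diagonal_edge, 4, 4, samples=4)
for row in smooth:
    print(" ".join(f"{value:.2f}" for value in row))
```

In `aliased` every pixel is either fully on or fully off, so the diagonal becomes a staircase; in `smooth` the pixels the edge passes through get a partial value, which a display would draw as a grey shade.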

Consider also that on different platforms it may be desirable for text to look different, for example to match the system's default font.

Setting font properties

On the web, text can be associated with a font (and decorations) in several different ways:

  • Hard-coded tags such as <b>, <i>, <font>. This is not a good idea unless the look of the website will never change: changing the formatting means hand-editing it everywhere it is used, which is very time-consuming (i.e. expensive).
  • HTML tags that describe the type of text rather than how it looks, such as <h1>, <p>, <blockquote>. This is the best choice for simple websites; the look is decided by the browser and can be modified using CSS.
  • CSS classes and attributes. This is the most common way to associate text with how it looks. If a change is needed, only the CSS file has to be modified; there is no searching for <font> and <b> tags in the body. CSS also allows modification of colour, spacing, and other interesting properties. See p.30 in the textbook for some examples.
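To illustrate why separating content from presentation pays off, here is a minimal sketch in which the style table, the content list, and the functions `to_css` and `to_html` are all hypothetical: the CSS rules are generated from one table, and the content is marked up only with semantic tags. Restyling every heading means changing one entry in the table; the generated HTML never mentions fonts at all.

```python
import html

# Hypothetical style table: the single place that decides how each kind of text looks.
STYLES = {
    "h1": {"font-family": "Georgia, serif", "font-size": "24pt"},
    "p": {"font-family": "Arial, sans-serif", "font-size": "11pt"},
}

# Content only records what each piece of text *is*, never how it looks.
CONTENT = [
    ("h1", "Font availability"),
    ("p", "Almost every computer is set up differently."),
    ("h1", "Languages"),
]

def to_css(styles):
    """Turn the style table into CSS rules, one rule per selector."""
    rules = []
    for selector, props in styles.items():
        body = " ".join(f"{name}: {value};" for name, value in props.items())
        rules.append(f"{selector} {{ {body} }}")
    return "\n".join(rules)

def to_html(content):
    """Wrap each piece of text in its semantic tag; no <font> or <b> tags."""
    return "\n".join(f"<{tag}>{html.escape(text)}</{tag}>" for tag, text in content)

print(to_css(STYLES))
print(to_html(CONTENT))

# Restyling every heading is a one-line change to the table, not an edit to each <h1>.
STYLES["h1"]["font-size"] = "28pt"
print(to_css(STYLES))
```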

Font styles are used not only on the web but also in regular applications such as office suites. For example, you can set what the Heading 1 style looks like in your document, and later change how all the top-level headings look by modifying that style.

If you're writing a book (to be printed), you will most likely use LaTeX for the markup. Here too, the formatting details are usually defined separately from the text they apply to.

Languages

English is the primary language of computers, but it is far from being the only one that matters. Operating systems (Windows, OSX, Linux) and most software ship in many different languages, and many non-English-speaking users never need to learn English to use a computer or the Internet effectively.

To support different languages, we separate the representation of a piece of information (e.g. "Create" or "Créer") from the information itself (here: the text of a button the user can click to create a page). A list of these strings can then be extracted from the software or website and translated into other languages by people with no programming knowledge. Because the mapping from meaning to representation stays fixed (only the language of the representation changes), the translated table can simply be dropped back into the software, which is then available in one more language.
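A minimal sketch of the idea, with hypothetical message keys and a hand-written table (real projects usually keep these tables in separate files handled by a tool such as GNU gettext): the program refers to each message only by a key that carries its meaning, and the text shown to the user is looked up in the table for the current language, falling back to English when a translation is missing.

```python
# Hypothetical message tables: meaning (the key) -> representation (the text).
# Translators only ever edit the right-hand strings.
MESSAGES = {
    "en": {"create_page": "Create", "cancel": "Cancel"},
    "fr": {"create_page": "Créer", "cancel": "Annuler"},
    "de": {"create_page": "Erstellen"},  # "cancel" not translated yet
}

DEFAULT_LANGUAGE = "en"

def translate(key, language):
    """Return the text for `key` in `language`, falling back to the default language."""
    table = MESSAGES.get(language, {})
    if key in table:
        return table[key]
    return MESSAGES[DEFAULT_LANGUAGE][key]

for lang in ("en", "fr", "de"):
    print(lang, translate("create_page", lang), translate("cancel", lang))
```

Adding a language means adding one more table; none of the code that calls `translate` has to change.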

Technologies that can be used to this end:

The whole process is called localization or internationalization (l10n, i18n). The exact definitions of the two terms are slightly more complicated than that, but we don't need to go into them here.
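The abbreviations are numeronyms: the first letter, the number of letters in between, and the last letter. A small sketch (the function name `numeronym` is hypothetical) makes the counting explicit:

```python
def numeronym(word: str) -> str:
    """Abbreviate a word as first letter + count of middle letters + last letter."""
    if len(word) <= 3:
        return word  # too short to be worth abbreviating
    return f"{word[0]}{len(word) - 2}{word[-1]}"

print(numeronym("internationalization"))  # i18n
print(numeronym("localization"))          # l10n
```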

How a character is represented in memory depends on the software being used. The common options are ASCII and the Unicode encodings: UTF-8, UTF-16 (and its fixed-width predecessor UCS-2), and UTF-32 (also known as UCS-4).
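The difference is easy to see by encoding the same word in each scheme. A short sketch using Python's built-in `str.encode` (the `-le` codec variants are used so that no byte-order mark is added): ASCII cannot represent "é" at all, UTF-8 uses one byte for each ASCII letter and two for "é", and UTF-16 and UTF-32 use two and four bytes per character for this word.

```python
word = "Créer"  # five characters, one of them outside ASCII

# ASCII only covers 128 characters, so 'é' cannot be encoded.
try:
    word.encode("ascii")
except UnicodeEncodeError as error:
    print("ascii: cannot encode", repr(error.object[error.start:error.end]))

# Unicode encodings: same characters, different numbers of bytes in memory.
for encoding in ("utf-8", "utf-16-le", "utf-32-le"):
    data = word.encode(encoding)
    print(f"{encoding:10} {len(data):2} bytes  {data.hex(' ')}")
```

For characters outside the Basic Multilingual Plane (many emoji, for example), UTF-16 needs four bytes, a pair of 16-bit units; UCS-2 cannot represent those characters at all.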

Fonts for non-Latin character sets may not be available on a target system, but these days support is pretty good across the board.

Other

Menus, buttons, fields, dynamic layout in applications

Degree Students

Not entirely related to text is the concept of 'chunking'. People are capable of remembering

Lab