BTC640/Text

From CDOT Wiki
Revision as of 17:26, 20 February 2012 by Andrew (talk | contribs) (Font availability)

Lecture

Textbook chapter: 2

Some forms of information are best delivered using text. Text descriptions of each view would be a poor replacement for an online image gallery, but trying to draw the latest news or a technical specification is just as hard.

The reality is that presenting material well means using a combination of technologies, and text should not be discounted as old or obsolete - it is still very useful, important, and quite complex (lots to learn).

Definitions: glyphs, fonts, styles, rasterizing, serif, sans serif, anti-aliasing, TrueType fonts (TTF), PostScript.

Font attributes: point size, weight, slope, width.

Font availability

Almost every computer has a different set of fonts available. A different set of fonts comes with every version of the operating system, and more fonts can be installed by applications and by the user.

For example, text in a rich text file may look very different on your computer at home than the same file on a school computer. It will almost certainly look different on a different OS (Linux, OS X).

If the desired font is not available, applications have rules for choosing the closest font that is available.

PDF is one of the very few text formats where the file includes the fonts so that it looks identical on all platforms.

Text can be rasterised before distribution (turned into a bitmap), which will ensure that text looks the same on all platforms, but a bitmap of a text is usually not a good option except for titles and perhaps buttons.

Consider also that on different platforms it may be desirable to have the text look different, for example to match a default system font.

Setting font properties

On the web, text can be associated with a font (and decorations) in several different ways:

  • Hard-coded tags such as <b>, <i>, <font>. This is not a good idea unless the look of the website will never change. Hand-editing formatting everywhere it is used is very time consuming (i.e. expensive).
  • HTML tags that describe the type of the text rather than how it looks, such as <h1>, <p>, <blockquote>. This is the best choice for simple websites: the look will be decided by the browser and can be modified using CSS.
  • CSS classes and attributes. This is the most common way to associate text with how it looks. If a change is needed, only the CSS file needs to be modified, with no searching for <font> and <b> tags in the body. CSS also allows modification of colour, spacing, and other interesting properties. See p.30 in the textbook for some examples.

Font styles are used not only on the web but also in regular applications such as office suites. For example, you can set what the Heading 1 type looks like in your document, and later change how all the top-level headers look by modifying that style.

If you're writing a book (to be printed), you will most likely use LaTeX for the markup. Here too the formatting details are usually defined separately from the text they apply to.

Languages

English is the primary language of computers, but it's far from being the only one that matters. Operating systems (Windows, OS X, Linux) and most software ship in many different languages, and many non-English-speaking users never need to learn English to use a computer or the internet effectively.

To support different languages we separate the meaning of a message (for example, the label on the control that the user clicks to create a page) from its representation in a particular language ("Create" in English, "Créer" in French). A list of strings can then be extracted from the software or website and translated into other languages without any programming knowledge. Because the mapping from meaning to representation is static (only the language of the representation changes), the translated table can easily be dropped back into the software, which is then available in one more language.

Technologies that can be used to this end:

The whole process is called localization or internationalization (l10n, i18n). The exact definitions of the two terms are slightly more complicated than that, but we don't need to go into them.

The representation of a character in memory depends on the software being used. The common options are ASCII and Unicode (UTF-8, UTF-16/UCS-2, UTF-32/UCS-4).

Fonts for non-Latin character sets may not be available on a target system, but these days the support is pretty good across the board.

Other

Menus, buttons, fields, dynamic layout in applications

Degree Students

Not entirely related to text is the concept of 'chunking'. People are capable of holding seven plus or minus two pieces of data in their short-term memory. In other words, when someone sees more than about seven pieces of information in one place, they cannot keep it all in memory.

If you do a Google search for chunking usability or chunking design you will find lots of interesting information about this principle.

It was discovered nearly a century ago and originally was only considered in relation to physical tasks. It made sense when explaining learning - a set of procedures would become a single chunk after enough repetition, thus a more experienced person could do more advanced things because they had chunked collections of simpler things into single units.

The same applies to learning information, programming for example. Students spend time writing a function because they have to think of individual lines of code such as printf(), while(), return() but experienced programmers see the whole function as a unit and don't give a second thought to the syntax or basic functions used to implement it.

A couple of websites for reading: http://www.chambers.com.au/glossary/chunk.htm and http://www.interaction-design.org/encyclopedia/chunking.html - note how the concept applies to design of presentations.

Lab

This is a marked lab. Please submit it using Moodle (Lab1).

Part 1

You will need to do the following on two platforms (Windows, Linux, or Mac). Windows with IE and matrix with Firefox on the lab machines will work:

  1. Create an HTML page with:
    1. Simple headers in <h1>, <h2>, and <h3> tags.
    2. Some text in tags with CSS attributes that set the font, each of these separately:
      1. CSS font-family
      2. CSS font-style
      3. CSS font-size
  2. Make sure the whole thing fits into one page (no scrolling), and make a screenshot of it.
  3. Open the exact same page on the other platform (preferably in a different browser), and make a screenshot of that.
  4. List the differences you see, and explain why the two don't look exactly the same.
  5. The two screenshots, your CSS, your HTML file, and the explanation must be submitted to Moodle/Lab1.

Part 2

You'll need to use matrix (or your own Linux box) for this part. Here we're going to create a small program that prints hello world, and we'll translate that string into French. The point of the exercise is to see the entire process - from writing the code to running it.

Here is our program. Note that since we don't have permission to write to the system directories, we're going to put the translations in a local directory (locale).

#include <libintl.h>
#include <locale.h>
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
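  /* Pick up the locale from the environment (LC_ALL and friends). */
  setlocale( LC_ALL, "" );
  /* A system-wide install would look in /usr/share/locale instead: */
  //bindtextdomain( "hello", "/usr/share/locale" );
  bindtextdomain( "hello", "./locale" );
  textdomain( "hello" );

  /* Print the translation as data, never as the format string itself. */
  printf( "%s", gettext( "Hello, world!\n" ) );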

  return 0;
}

Compile this using cc hello.c -o hello (note that no special compile flags are needed).

Now extract the translatable strings from your program into a template (hello.pot) file:

xgettext -d hello -s -o hello.pot hello.c

Edit hello.pot (it's a text file) and replace the values for all the default-looking strings with your own. Delete the "fuzzy" line. Change the charset to UTF-8.

This template file is what you would send off to a French-speaking translator, who will create fr.po out of it. But since we don't have a French-speaking translator, we're just going to do it ourselves.

cp hello.pot fr.po

Edit the file fr.po and set the msgstr for the hello world string to something different. It doesn't need to be French, just type something different in there. You can try a string from another language with more exotic characters if you like.

Create the local locale directory I mentioned above. Normally you wouldn't do this; it's only for this exercise:

mkdir -p locale/fr_CA/LC_MESSAGES

And compile your new French translation file:

msgfmt -c -v -o locale/fr_CA/LC_MESSAGES/hello.mo fr.po

If at this point you get any warnings, fix them and rerun the msgfmt command.

Run
LC_ALL=fr_CA.UTF-8 ./hello
and if you did everything right you should see the translated string instead of the "Hello, world!" that you get from running ./hello without setting any variables.

Basically this means that your software will look different based on an environment variable (LC_ALL), so by changing one environment variable you can switch the language of all the software that's been translated into that language. Sadly our lab openSUSE installations don't have language packs installed for anything, so you can't see it in action.

Submit the following for this part:

  • Your French .po file with the translation filled in.
  • Run the following and make a screenshot of the terminal window:
msgfmt -c -v -o locale/fr_CA/LC_MESSAGES/hello.mo fr.po
./hello
LC_ALL=fr_CA.UTF-8 ./hello