
How to type accented and Unicode characters

Updated on February 27, 2010

These characters are the Latin vowels with various diacritical marks added to them. Most of these letters occur fairly often in various European languages.

á à ä â

é è ë ê

í ì ï î

ó ò ö ô

The group includes 16 letters. Each row consists of characters sharing the same basic letter, and each column consists of characters sharing the same diacritical mark. We already have the basic letters a, e, i, o; all we really need to do is add the diacritical marks.
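The "basic letter plus diacritical mark" idea can be tried out directly in Python. The sketch below is only an illustration: it assumes Unicode combining marks and the standard `unicodedata` module, neither of which is part of the original grid, and the names `MARKS`, `BASES` and `accent_grid` are made up for this example.

```python
import unicodedata

# Combining marks, listed in the same column order as the grid above.
MARKS = {
    "acute": "\u0301",
    "grave": "\u0300",
    "diaeresis": "\u0308",
    "circumflex": "\u0302",
}
BASES = "aeio"


def accent_grid(bases=BASES, marks=MARKS):
    """Return one row per basic letter and one column per diacritical mark."""
    return [
        # NFC normalization merges letter + mark into a single code point
        # whenever Unicode defines one.
        [unicodedata.normalize("NFC", base + mark) for mark in marks.values()]
        for base in bases
    ]


for row in accent_grid():
    print(" ".join(row))
```

Each printed row should correspond to one row of the grid, because every letter in it is just a basic letter with one mark added.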

On a teletype machine, the backspace control code caused the carriage to back up one space, so that the next character was printed over the top of the preceding one. An accented letter could therefore be sent as a letter, a backspace, and a diacritical mark. The carriage return code originally returned the carriage to the beginning of the line without advancing the platen, allowing you to overtype a whole line.

Unfortunately, this way of representing accented characters disappeared with the advent of the CRT terminal. Early CRTs weren't sophisticated enough to show two characters in the same display cell. Thus, if they encountered two characters separated by a backspace, the second character would just replace the first one on the screen. As a result, the diacritical marks in ASCII lost their meaning as diacritical marks and just turned into symbols, and the backspace fell into disuse. It was still the code transmitted when you hit the backspace key, but you no longer saw it used inside stored or transmitted text to glue characters together.
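The difference between the two devices can be imitated with a toy model. This is a rough sketch, not a description of any real terminal: the function names `render_teletype` and `render_crt` are hypothetical, and each "display cell" is modelled as a plain string.

```python
BACKSPACE = "\b"


def render_teletype(stream):
    """Overprint: a cell keeps every character struck at that position."""
    cells, pos = [], 0
    for ch in stream:
        if ch == BACKSPACE:
            pos = max(pos - 1, 0)  # back the carriage up one space
            continue
        if pos == len(cells):
            cells.append("")
        cells[pos] += ch  # ink is added on top of what is already there
        pos += 1
    return cells


def render_crt(stream):
    """Replace: a cell holds only the last character written to it."""
    cells, pos = [], 0
    for ch in stream:
        if ch == BACKSPACE:
            pos = max(pos - 1, 0)
            continue
        if pos == len(cells):
            cells.append(ch)
        else:
            cells[pos] = ch  # the earlier character is lost
        pos += 1
    return cells


text = "a" + BACKSPACE + "`"
print(render_teletype(text))  # ['a`']  letter and accent share one cell
print(render_crt(text))       # ['`']   only the accent survives
```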

For CRT display, it became necessary to have a separate character code for each combination of letter and diacritical mark. This caused problems: if you have an unusual combination of base letter and accent that is not encoded, you are out of luck.
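To see what "not encoded" means in practice, the sketch below uses ISO 8859-1 (Latin-1) as one example of a character set with a separate code for each letter-and-accent combination. Latin-1 is not named in this article; it is chosen here only for illustration, and `encodable` is a hypothetical helper.

```python
def encodable(ch, charset="latin-1"):
    """Report whether the character set has a code for this character."""
    try:
        ch.encode(charset)
        return True
    except UnicodeEncodeError:
        return False


# \u00e9 = e with acute, \u00e7 = c with cedilla,
# \u011d = g with circumflex, \u015f = s with cedilla
for ch in "\u00e9\u00e7\u011d\u015f":
    print(ch, encodable(ch))
```

The first two letters have codes in Latin-1, while the last two do not, so they cannot be stored in it.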

Now the prevailing method to add a diacritical mark to a letter is to use a dead key. A dead key or key combination does not generate a character when struck, but modifies the character generated by the key struck immediately after.

The US-International English keyboard

The US-International English keyboard has these features:

It uses the following intuitive methods which work with most (or all) Windows applications, while keeping the familiar QWERTY keyboard.

Press one of the five modifier keys ` ' " ~ ^ , then the letter to be modified. ( ' then a = á, " then u = ü, ' then c = ç, etc. )

Press the right Alt key + another key. Examples:

right Alt + , = ç (or ' then c)

right Alt + / = ¿

right Alt + 1 = ¡

right Alt + c = © (copyright symbol)

right Alt + 5 = € (euro currency symbol)

Note that this maintains the QWERTY layout. However, when one of the modifier keys ` ' " ~ ^ is intended as the symbol itself, it must sometimes be followed by the space bar. The system accepts words requiring an apostrophe, such as it's, without the space bar.
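The dead key behaviour described above can be sketched as a small state machine. This is a simplified model, not the real Windows layout table: the function name `type_keys` is hypothetical, and as a stand-in for the layout's actual list of allowed combinations the sketch assumes that an accented result is accepted only if it falls in the Latin-1 range.

```python
import unicodedata

# The five modifier (dead) keys and the combining mark each one stands for.
DEAD_KEYS = {
    "`": "\u0300",  # grave
    "'": "\u0301",  # acute
    '"': "\u0308",  # diaeresis
    "~": "\u0303",  # tilde
    "^": "\u0302",  # circumflex
}
# Special case mentioned in the text: ' then c gives c with cedilla.
SPECIAL = {("'", "c"): "\u00e7", ("'", "C"): "\u00c7"}


def type_keys(keys):
    """Turn a sequence of key presses into the text that appears."""
    out, pending = [], None
    for key in keys:
        if pending is None:
            if key in DEAD_KEYS:
                pending = key  # produce nothing yet, wait for the next key
            else:
                out.append(key)
            continue
        dead, pending = pending, None
        if key == " ":
            out.append(dead)  # dead key + space bar gives the symbol itself
        elif (dead, key) in SPECIAL:
            out.append(SPECIAL[(dead, key)])
        else:
            composed = unicodedata.normalize("NFC", key + DEAD_KEYS[dead])
            # Assumption: only single code points up to U+00FF are accepted.
            if len(composed) == 1 and ord(composed) <= 0xFF:
                out.append(composed)
            else:
                out.append(dead + key)  # no accented form: emit both keys
    if pending is not None:
        out.append(pending)
    return "".join(out)


print(type_keys("'a"))    # a with acute accent
print(type_keys('"u'))    # u with diaeresis
print(type_keys("it's"))  # it's, no space bar needed
```

In this model the apostrophe in it's comes out unchanged because there is no accepted accented form of s, so both keys are emitted as typed.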

United Kingdom extended keyboard

The grave accent key becomes a dead key which adds a grave accent to a subsequent a, e, i, o, u, w, y, A, E, I, O, U, W, or Y, generating à, è, etc.

a, e, i, o, u, w, y, A, E, I, O, U, W, Y with an acute accent (á, é, etc.) are generated either by pressing AltGr and the relevant character key simultaneously, or by pressing the AltGr + apostrophe dead key combination followed by the character.

AltGr together with apostrophe, 6, 2, or # acts as a dead key combination that adds an acute accent, circumflex, diaeresis, or tilde, respectively, to the character typed next.

AltGr + [ ' ^ " ~ ] acts as a dead key combination followed by a character:

AltGr + ' then [a,e,i,o,u,w,y,A,E,I,O,U,W,Y] = [áéíóúẃýÁÉÍÓÚẂÝ]

AltGr + ^ then [a,e,i,o,u,w,y,A,E,I,O,U,W,Y] = [âêîôûŵŷÂÊÎÔÛŴŶ]

AltGr + " then [a,e,i,o,u,w,y,A,E,I,O,U,W,Y] = [äëïöüẅÿÄËÏÖÜẄŸ]

AltGr + ~ then [a,n,o,A,N,O] = [ãñõÃÑÕ]

AltGr and c or C will generate the ç or Ç (cedilla) characters, respectively.

Unicode and Unicode keyboard

Unicode dispenses with the idea that a single code point always maps to a single "display cell" on the screen. A single display cell may correspond to one code point, or to two or more code points.

Unicode includes a class of characters known as "combining marks" or "non-spacing marks." The term "non-spacing mark" also comes from teletype machines—some European teletype machines were designed so that the carriage wouldn't advance to the next position when they received an accent mark, allowing the operator to send an accented letter without using a backspace. This practice goes back to the use of "dead keys" on European typewriters for the same purpose. A non-spacing mark doesn't display as a self-contained unit; instead, it combines typographically with another character. A sequence of code points consisting of a regular character and one or more non-spacing marks is called a combining character sequence.
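A combining character sequence can be inspected with Python's standard `unicodedata` module. The sketch below is only an approximation: `describe` and `display_cells` are hypothetical helpers, and counting code points that are not marks is a rough stand-in for the full rules a real text renderer follows.

```python
import unicodedata


def describe(text):
    """List each code point with its name and whether it is a non-spacing mark."""
    return [
        (f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.category(ch) == "Mn")
        for ch in text
    ]


def display_cells(text):
    """Rough count of display cells: skip non-spacing and enclosing marks."""
    return sum(1 for ch in text if unicodedata.category(ch) not in ("Mn", "Me"))


precomposed = "\u00e9"  # e with acute as one code point
sequence = "e\u0301"    # e followed by COMBINING ACUTE ACCENT

for s in (precomposed, sequence):
    print(len(s), display_cells(s), describe(s))

# The two spellings are canonically equivalent:
print(unicodedata.normalize("NFC", sequence) == precomposed)  # True
print(unicodedata.normalize("NFD", precomposed) == sequence)  # True
```

Both strings occupy one display cell, even though the second one is two code points long.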

A Unicode Keyboard should have the capability of simulating handwriting by typing the base character first and then the diacritic mark to produce a combination character. For example, character 'Ç' (C-cedilla) has 'C' as the base character and '¸' (cedilla) as the diacritic mark. In order to get such an accented character with Unicode Keyboard, you have to type two keystrokes: the first one for the base character and the second one for the diacritic mark.

But you can't achieve this on the US-International keyboard, which relies on dead keys. Dead keys are limited: you can't type an unusual combination of base letter and accent, such as ş. On a UK extended keyboard, you can use AltGr + ^ followed by a to produce â, but you cannot add a circumflex to a consonant (ĝ, ŝ, ĥ, ĉ, etc.). A Unicode keyboard should be designed so that WHAT YOU TYPE IS WHAT YOU GET, instead of relying on a lot of awkward keystroke combinations.
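The "base character first, then the diacritic mark" idea might look like the sketch below. It is not an existing keyboard driver: the key names in `MARK_KEYS` and the function `type_base_then_mark` are invented for illustration, and the standard `unicodedata` module is assumed for combining the two keystrokes.

```python
import unicodedata

# Hypothetical mark keys: each one adds a combining mark to the letter before it.
MARK_KEYS = {
    "<circumflex>": "\u0302",
    "<cedilla>": "\u0327",
    "<acute>": "\u0301",
}


def type_base_then_mark(keystrokes):
    """keystrokes is a list such as ["g", "<circumflex>"]."""
    raw = "".join(MARK_KEYS.get(key, key) for key in keystrokes)
    # NFC merges letter + mark into one code point when Unicode defines one;
    # otherwise the pair stays a combining character sequence.
    return unicodedata.normalize("NFC", raw)


for base in "gshc":
    print(type_base_then_mark([base, "<circumflex>"]))  # circumflex on a consonant

print(type_base_then_mark(["s", "<cedilla>"]))
print(type_base_then_mark(["C", "<cedilla>"]))
```

Because the mark is simply appended to whatever letter came before, no per-letter table is needed, and a combination without a single code point of its own, such as q with a circumflex, is still a valid two-code-point sequence.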

