Path: news.ifm.liu.se!liuida!sunic!uunet!spool.mu.edu!bloom-beacon.mit.edu!senator-bedfellow.mit.edu!faqserv
From: mike@vlsivie.tuwien.ac.at
Newsgroups: comp.unix.questions,comp.unix.admin,comp.std.internat,soc.culture.german,soc.culture.french,soc.culture.quebec,soc.culture.nordic,soc.culture.spain,soc.culture.portuguese,soc.culture.latin-american,soc.culture.brazil,soc.culture.argentina,soc.culture.mexico,comp.answers,soc.answers,news.answers
Subject: ISO 8859-1 National Character Set FAQ
Followup-To: comp.unix.questions,comp.unix.admin,soc.culture.german,soc.culture.french,soc.culture.quebec,soc.culture.nordic,soc.culture.spain,soc.culture.portuguese,soc.culture.latin-american,soc.culture.brazil,soc.culture.argentina,soc.culture.mexico
Date: 25 Oct 1994 14:22:34 GMT
Organization: TU Wien
Lines: 776
Approved: news-answers-request@MIT.EDU
Expires: 6 Dec 1994 14:19:26 GMT
Message-ID:
NNTP-Posting-Host: bloom-picayune.mit.edu
Summary: This FAQ discusses the use of the standardized ISO 8859-1 national character set (supports all (W-)European languages).
X-Last-Updated: 1994/10/24
Originator: faqserv@bloom-picayune.MIT.EDU
Xref: news.ifm.liu.se comp.unix.questions:53215 comp.unix.admin:24440 comp.std.internat:3043 soc.culture.german:46749 soc.culture.french:47307 soc.culture.quebec:2708 soc.culture.nordic:41273 soc.culture.spain:37031 soc.culture.portuguese:12000 soc.culture.brazil:19585 soc.culture.argentina:8018 comp.answers:7533 soc.answers:1846 news.answers:28855

Archive-name: internationalization/iso-8859-1-charset
Posting-Frequency: monthly

ISO 8859-1 National Character Set FAQ

DISCLAIMER: THE AUTHOR MAKES NO WARRANTY OF ANY KIND WITH REGARD TO THIS
MATERIAL, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF
MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE.

Note: Most of this was tested on a Sun 10, running SunOS 4.1.* - other
systems might differ slightly.

This FAQ discusses topics related to the use of ISO 8859-1 based 8 bit
character sets.
It discusses how to use European (Latin American) national character
sets on UNIX-based systems and the internet.

1. Which coding should I use for accented characters?

Use the internationally standardized ISO-8859-1 character set to type
accented characters. This character set contains all characters
necessary to type (West) European languages. This encoding is also the
preferred encoding on the Internet (where accepted - see below).

This character set is also used by AmigaDOS, MS-Windows (Actually,
MS-Windows uses UNICODE (ISO 10646) truncated to 8 bit, which gives an
equivalent encoding.), VMS (DEC MCS is practically equivalent to ISO
8859-1) and (practically all) UNIX implementations. MS-DOS normally
uses a different character set and is not compatible with this
character set. (It can, however, be translated to this format with
various tools. See section 7.)

Footnote: Supposedly, IBM code page 819 is fully ISO 8859-1 compliant.
(If you can confirm or deny this, please let me know!)

ISO 8859-1 supports the following languages: Afrikaans, Catalan (?),
Danish, Dutch, English, Faeroese, Finnish, French, German, Galician,
Irish, Icelandic, Italian, Norwegian, Portuguese, Spanish and Swedish.

(It has been called to my attention that Albanian can be written with
ISO 8859-1 also. However, from a standards point of view, ISO 8859-2 is
the appropriate character set for Balkan countries.)

ISO 8859-1 is just one part of the ISO-8859 standard, which specifies
several character sets, e.g.:

8859-1   Europe, Latin America
8859-2   Eastern Europe
8859-3   SE Europe/miscellaneous (Esperanto, Maltese, etc.)
8859-4   Scandinavia/Baltic (mostly covered by 8859-1 also)
8859-5   Cyrillic
8859-6   Arabic
8859-7   Greek
8859-8   Hebrew
8859-9   Latin5, same as 8859-1 except for Turkish instead of Icelandic
8859-10  Latin6, for Eskimo/Scandinavian languages

2. Getting your terminal to handle ISO characters.

Terminal drivers normally do not pass 8 bit characters.
To enable proper handling of ISO characters, add the following lines to
your .cshrc:

----------------------------------
tty -s
if ($status == 0) stty cs8 -istrip -parenb
----------------------------------

If you don't use csh, add equivalent code to your shell's startup file.

Note that tty checks stdin, but stty changes stdout. This is OK in
normal use, but if the .cshrc is executed in a pipe, you may get
spurious warnings :-(

Note that it is necessary to check whether your standard I/O streams
are connected to a terminal. Only then should you reconfigure the
terminal driver.

Footnote: If you use the Bourne Shell or descendants (sh, ksh, bash,
zsh), use this code in your startup (.profile) file:

----------------------------------
tty -s
if [ $? = 0 ]; then
    stty cs8 -istrip -parenb
fi
----------------------------------

3. Selecting the right font under X11 for xterm (and other applications)

To actually display accented characters, you need to select a font
which contains bitmaps for ISO 8859-1 characters in the correct
character positions. The names of these fonts normally have the suffix
"iso8859-1". Use the command

# xlsfonts

to list the fonts available on your system. You can preview a
particular font with the

# xfd -fn fontname

command, where fontname is one of the names listed by xlsfonts.

Add the appropriate font selection to your ~/.Xdefaults file, e.g.:

----------------------------------------------------------------------------
XTerm*Font: -adobe-courier-medium-r-normal--18-180-75-75-m-110-iso8859-1
Mosaic*XmLabel*fontList: -*-helvetica-bold-r-normal-*-14-*-*-*-*-*-iso8859-1
----------------------------------------------------------------------------

Footnote: The X11R5 distribution has some fonts which are labeled as ISO
fonts, but which do not contain the ISO characters.

4. Getting the locale setting right.

For the ctype macros (and by extension, applications you are running on
your system) to correctly identify accented characters, you may have to
set the ctype locale to an ISO 8859-1 conformant configuration. On SunOS
this may be done by placing

------------------------------------
setenv LANG C
setenv LC_CTYPE iso_8859_1
------------------------------------

in your .login script (if you use the csh). An equivalent statement
will adjust the ctype locale for non-csh users.

The process is the same for other operating systems, e.g. on HP-UX use
'setenv LANG german.iso88591'; on IRIX 5.2 use 'setenv LANG de'; on
Ultrix 4.3 use 'setenv LANG GER_DE.8859' and on OSF/1 use
'setenv LANG de_DE.88591'. The examples given here are for German.
Other languages work too, depending on your operating system. Check out
'man setlocale' on your system for more information.

Footnote on HP systems: As of 10.0, you can use either german.iso88591
or de_DE.iso88591 (a name more in line with other vendors and
developing standards for locale names). For a complete listing of
locale names, see the text file /usr/lib/nls/config. Or, on HP-UX 10.0,
execute

locale -a

This command will list all locales currently installed on your system.

5. Printing accented characters.

5.1 PostScript printers

If you want to print accented characters on a PostScript printer, you
may need a PS filter which can handle ISO characters. Our PostScript
filter of choice is a2ps, the more recent versions of which can handle
ISO 8859-1 characters with the -8 option. a2ps V4.3 is available via
anonymous ftp from imag.imag.fr under the file name
/archive/postscript/a2ps.V4.3.tar.Z.

If you use the pps PostScript filter, use the 'pps -ISO' option for pps
to handle ISO 8859-1 characters properly.

5.2 Other (non-PS) printers:

If you want to print to non-PS printers, your success rate depends on
the encoding the printer uses. Several alternatives are possible:

* Your printer accepts ISO 8859-1:
  You're lucky.
  No conversion is needed, just send your files to the printer.

* Your printer supports a PC-compatible font:
  You can use the recode tool to translate from ISO 8859-1 to this
  encoding. (If you are using a SunOS based computer, you can also use
  the unix2dos utility which is part of the standard distribution.)
  Just add the appropriate invocation as a built-in filter to your
  printer driver.

* Your printer uses a national ISO 646 variant (7 bit ASCII with some
  special characters replaced by national characters):
  You will have to use a translation tool; this tool would then be
  installed in the printer driver and translate character conventions
  before sending a file to the printer. The recode program supports
  many national ISO 646 norms. (If you add support for another
  variant, please submit it to the maintainers of recode, so that it
  can benefit everybody.)

  Unfortunately, you will not be able to print all characters with the
  built-in character set. Most printers have user-definable bitmap
  characters, which you can use to print all ISO characters. You just
  have to generate a bitmap for any particular character and send this
  bitmap to the printer. The syntax for these characters varies, but a
  few conventions have gained wide acceptance (e.g., many printers can
  process Epson-compatible escape sequences).

* Your printer supports a strange format:
  If your printer supports some other strange format (e.g. HP Roman8,
  DEC MCS, Atari, NeXTStep, EBCDIC or what have you), you have to add a
  filter which will translate ISO 8859-1 to this encoding before
  sending your data to the printer. 'recode' supports many of these
  character sets already. If you have to write your own conversion
  tool, consider this as a good starting base. (If you add support for
  any new character sets, please submit your code changes to the
  maintainers of recode).

  If your printer supports DEC MCS, this is nearly equivalent to ISO
  8859-1 (actually, it is a former ISO 8859-1 draft standard.
  The only characters which are missing are the Icelandic characters
  (eth and thorn) at locations 0xD0, 0xF0, 0xDE and 0xFE) - the
  difference is only a few characters. You could probably get by with
  just sending ISO 8859-1 to the printer.

* Your printer supports ASCII only:
  You have several options:
  + If your printer supports user-defined characters, you can print all
    ISO characters not supported by ASCII by sending the appropriate
    bitmaps.
  + Add a filter to the printer driver which will strip the accents
    and just print the unaccented characters.
  + Add a filter which will generate escape sequences (such as
    " BACKSPACE a for Umlaut-a (ä), etc.) to be printed. Recode
    supports this encoding under the name ascii-bs.

Footnote: For more information on character translation and the
'recode' tool, see section 7.

6. TeX and ISO 8859-1

If you want to write TeX without having to type {\"a}-style escape
sequences, you can either get a TeX version configured to read 8-bit
ISO characters, or you can translate between ISO and TeX codings.

The latter is arduous if done by hand, but can be automated if you use
emacs. If you use Emacs 19.23 or higher, simply add the following line
to your .emacs startup file. This mode will perform the necessary
translations for you automatically:

------------------
(require 'iso-cvt)
------------------

If you are using pre-19.23 versions of emacs, get the "gm-lingo.el"
lisp file via anonymous ftp from ftp.vlsivie.tuwien.ac.at in /pub/8bit.
Load gm-lingo from your .emacs startup file and this mode will perform
the necessary translations for you automatically.

If you want to configure TeX to read 8 bit characters, check out the
configuration files available via anonymous ftp from
ftp.vlsivie.tuwien.ac.at in /pub/8bit.

Newer LaTeX versions provide more comprehensive support for 8 bit
characters with the isolatin package: In LaTeX 2.09, use
\documentstyle[isolatin]{article} to include support for ISO latin1
characters.
In LaTeX 2e, the commands

\documentclass{article}
\usepackage{isolatin}

will do the job. isolatin.sty is available from all CTAN servers.

7. Translating between different international character sets.

While ISO 8859-1 is an international standard, not everybody uses this
encoding. Many computers use their own, vendor-specific character sets
(most notably Microsoft for MS-DOS). If you want to edit or view files
written in a different encoding, you will have to translate them to an
ISO 8859-1 based representation.

There are several PD/free character set translators available on the
Internet, the most notable being 'recode'. recode is available via
anonymous ftp from prep.ai.mit.edu and resides in the directory
/u2/emacs. recode is covered by FSF copyright and is freely
redistributable.

Under SunOS, the dos2unix and unix2dos programs (distributed with
SunOS) will translate between MS-DOS and ISO 8859-1 formats.

It is somewhat more difficult to convert German, 'Duden'-conformant
Ersatzdarstellung (ä = ae, ß = sz (or not so conformant 'ss') etc.)
into the ISO 8859-1 character set. The German dictionary available via
anonymous ftp from ftp.vlsivie.tuwien.ac.at in
/pub/8bit/dicts/deutsch.tar.gz also contains a UNIX shell script which
can handle all conversions except ones involving ß (German scharfes-s).
If your text contains 'sz' for ß, this is easy to handle (globally
change sz to ß); for 'ss' this change is more complicated.

A more sophisticated program to translate Duden Ersatzdarstellung to
ISO 8859-1 is Gustaf Neumann's diac program (version 1.3 or later)
which can translate all ASCII sequences to their respective ISO 8859-1
character set representation. 'diac' is available via anonymous ftp
from ftp.vlsivie.tuwien.ac.at in /pub/8bit/diac.

Translating ISO 8859-1 to ASCII can be performed with a little sed
script according to your needs. But be aware that

* No one-to-one mapping between Latin 1 and ASCII strings is possible.
* Text layout may be destroyed by multi-character substitutions,
  especially in tables.
* Different replacements may be in use for different languages, so no
  single standard replacement table will make everyone happy.
* Truncation or line wrapping might be necessary to fit textual data
  into fields of fixed width.
* Reversing this translation may be difficult or impossible.
* You may be introducing ambiguities into your data.

8. ISO 8859-1 and emacs

Emacs 19 (as opposed to Emacs 18) can automatically handle 8 bit
characters. (If you have a choice, upgrade to Emacs version 19.23,
which has the most complete ISO support.) Emacs 19 has extensive
support for ISO 8859-1. If your display supports ISO 8859-1 encoded
characters, add the following line to your .emacs startup file:

-----------------------------
(standard-display-european t)
-----------------------------

If you want to display ISO-8859-1 encoded files by using TeX-like
escape sequences (e.g. if your terminal supports only ASCII
characters), you should add the following line to your .emacs file
(DON'T DO THIS IF YOUR TERMINAL SUPPORTS ISO OR SOME OTHER ENCODING OF
NATIONAL CHARACTERS):

--------------------
(require 'iso-ascii)
--------------------

If your terminal supports a non-ISO 8859-1 encoding of national
characters (e.g. 7 bit national variant ISO 646 character sets, aka.
'national ASCII' variants), you should configure your own display
table. The standard emacs distribution contains a configuration
(iso-swed.el) for terminals which have ASCII in the G0 set and a
Swedish/Finnish version of ISO 646 in the G1 set. If you want to create
your own display table configuration, take a look at this sample
configuration and at disp-table.el for available support functions.

Emacs can also accept 8 bit ISO 8859-1 characters as input. These
character codes might either come from a national keyboard (and driver)
which generates ISO-compliant codes, or may have been entered by use of
a COMPOSE-character mechanism.
If you use such an input format, execute the following expression in
your .emacs startup file to enable Emacs to understand them:

-------------------------------------------------
(set-input-mode (car (current-input-mode))
                (nth 1 (current-input-mode))
                0)
-------------------------------------------------

In order to configure emacs to handle commands operating on words
properly (such as 'Beginning of word', etc.), you should also add the
following line to your .emacs startup file:

-------------------------------
(require 'iso-syntax)
-------------------------------

For further information on using ISO 8859-1 with emacs, also see the
Emacs manual section on "European Display" (available as hypertext
document by typing C-h i in emacs or as a printed version).

9. Typing ISO with US-style keyboards.

Many computer users use US-ASCII keyboards, which do not have keys for
national characters. You can use escape sequences to enter these
characters. For ASCII terminals (or PCs), check the documentation of
your terminal for particulars.

9.1 US-keyboards under X11

Under X Windows, the COMPOSE multi-language support key can be used to
enter accented characters. Thus, when running X11 on a SunOS-based
computer (or any other X11R5 server supporting COMPOSE characters), you
can type three character sequences such as

COMPOSE " a -> ä
COMPOSE s s -> ß
COMPOSE ` e -> è

to type accented characters.

Note that this COMPOSE capability has been removed as of X11R6, because
it does not adequately support all the languages in the world. Instead,
compose processing is supposed to be performed in the client using an
'input method'. (In the short term, this is a step backward, as few
clients support this type of processing at the moment.)

Input methods are controlled by the locale environment variables (LANG
and LC_xxx). The values for these variables are (or at least, should be
made equivalent by any sane vendor) equivalent to those expected by the
ANSI/POSIX locale library.
For a list of possible settings see section 4.

9.2 US-keyboards and emacs

There are several modes to enter Umlaut characters under emacs when
using a US-style keyboard. One such mode is iso-transl, which is
distributed with the standard emacs distribution. This mode uses the
Alt-key for entering diacritical marks (accents et al.). An extended
iso-transl mode (iso-transl+) which allows the definition of language
specific short cuts is available via anonymous ftp from
ftp.vlsivie.tuwien.ac.at in /pub/8bit/iso-transl+.shar. This file also
includes sample configurations for the German and Spanish languages.

An alternative to using Alt-sequences for entering diacritical marks is
the use of 'electric accents', such as used on old typewriters or under
many MS Windows programs. With this method, typing an accent character
will place this accent on the next character entered. One mode which
supports this entry method is the iso-acc minor mode which comes with
the standard emacs distribution. Just add

------------------
(require 'iso-acc)
------------------

to your emacs startup script, and you can turn the '`~/^" keys into
electric accents by typing 'M-x iso-accents-mode' in a specific buffer.
To type the ç (c with cedilla) and ß (German scharfes s) characters,
type ~c and "s, respectively.

10. File names with ISO characters

If your OS is 8 bit clean, you can use ISO characters in file names.
(This is possible under SunOS.)

11. Command names with ISO 8859-1

If your OS supports file names with ISO characters, and your shell is
8 bit clean, you can use command names containing ISO characters. If
your shell does not handle ISO characters correctly, use one of the
many PD shells which do (e.g. tcsh, an extended csh). These are
available from a multitude of ftp sites around the world. For tcsh,
versions 6.04 or higher are 8 bit clean (if compiled correctly); for
bash the relevant version is 1.14.1 or higher.

12. Spell checking

Ispell 3.1 has by far the best understanding of non-English languages
and can be configured to handle 8-bit characters (thus, it can handle
ISO-8859-1 encoded files). Ispell 3.1 now comes with hash tables for
several languages (English, German, French,...). It is available via
anonymous ftp from ftp.cs.ucla.edu in /pub.

Ispell also contains a list of international dictionaries and
information about their availability in the file
ispell/languages/Where.

The following sites also have dictionaries for ispell available via
anonymous ftp:

language    site                        file name
french      ireq-robot.hydro.qc.ca      /pub/ispell
french      ftp.inria.fr                /INRIA/Projects/algo/INDEX/iepelle
french      ftp.inria.fr                /gnu/ispell3.0-french.tar.gz
german      ftp.vlsivie.tuwien.ac.at    /pub/8bit/dicts/deutsch.tar.gz
(spanish    ftp.vlsivie.tuwien.ac.at    /pub/8bit/dicts/spanish.shar.gz)

Some spell checkers use strange encodings for accented characters. If
you have to use one of these spell checkers, you may have to run recode
before invoking the spell checker to generate a file using your spell
checker's coding conventions. After running the spell checker, you have
to translate the file back to ISO with recode.

Of course, this can be automated with a shell script (the recode
charset arguments depend on your spell checker and are not shown):

---------------------
recode $i tmp.file
spell_check
recode tmp.file $i
---------------------

Footnote: Ispell 4.* is not a superset of ispell 3.*. Ispell 4.* was
developed independently from a common ancestor; it DOES NOT support any
internationalization and is restricted to the English language.

13. TCP and ISO 8859-1

TCP was specified by US-Americans, for US-Americans. TCP still carries
this heritage: while the TCP/IP protocol itself *is* 8 bit clean, no
effort was made to support the transfer of non-English characters in
many application level protocols (mail, news, etc.). Some of these
protocols still only specify the transfer of 7-bit data, leaving
anything else implementation dependent.
Since the TCP/IP protocol itself transfers 8 bit data correctly,
writing applications based on TCP/IP does not lead to any loss of
encoding information.

13.1 FTP and ISO 8859-1

FTP has support for transferring 8 bit binary data. This mode should be
used when transferring ISO coded data between two hosts. This mode is
normally enabled by the command "binary".

Note, however, that use of the binary mode for text files will disable
translation between the line-ending conventions of different operating
systems. You might have to provide some filter to convert between the
LF-only convention of Unix and the CR-LF convention of VMS and MS
Windows when you copy from one of these systems to another.

13.2 Mail and ISO 8859-1

The original mail protocol specification (SMTP) in RFC 821 specified
the transfer of only 7 bit messages. Many sendmail implementations have
been made 8 bit transparent (see RFC 1428), but some SMTP handling
agents are still strictly conforming to the (somewhat outdated) RFC 821
and intentionally cut off the 8th bit. This behavior stymies all
efforts to transfer messages containing national characters. Thus, only
if all SMTP agents between mail originator and mail recipient are 8 bit
clean, will messages be transferred correctly. Otherwise, accented
characters are mapped to some ASCII character (e.g. Umlaut a -> 'd'),
but the rest of the message is still transferred correctly.

A new, enhanced (and compatible) SMTP standard, ESMTP, has been
released as RFC 1425. This standard provides the framework for 8 bit
extensions (the 8 bit transport extension itself is RFC 1426). This
should be the mail protocol of choice for newly shipped versions of
sendmail.

DEC Ultrix sendmail still implements the somewhat outdated RFC 821 to
the letter, and thus cuts off the eighth bit of all mail passing
through it. Thus ISO encoded mail will always lose the accent marks
when transferred through a DEC host.
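
The 'Umlaut a -> d' mapping mentioned above is simply what is left of
an ISO 8859-1 code once its eighth bit is cleared. The following Python
sketch illustrates this; the helper name strip_eighth_bit and the
sample text are made up for illustration and do not correspond to any
actual mailer code:

```python
def strip_eighth_bit(data: bytes) -> bytes:
    """Clear the top bit of every byte (illustrative helper, not a mailer API)."""
    return bytes(b & 0x7F for b in data)


text = "Käse für Größe"
wire = text.encode("iso-8859-1")   # ISO 8859-1: one byte per character
mangled = strip_eighth_bit(wire).decode("ascii")

print(mangled)                     # ä (0xE4) has become d (0x64)
```

Plain ASCII characters have the eighth bit clear already, so they pass
through such an agent unchanged. This is why only the national
characters in a message are damaged.
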

Much of the European and Latin American network infrastructure supports
the transfer of 8 bit mail messages; the success rate is somewhat lower
for the US.

The MIME standard defines a mail message format which can handle
different character sets and multimedia mail, independent of the
network infrastructure. This format should eventually solve problems
with 7-bit mailers etc. Unfortunately, no mail transfer agents (mail
routers) and few end user mail readers support this standard. Source
for supporting MIME (the `metamail' package) in various mail readers is
available via anonymous ftp from thumper.bellcore.com in /pub/nsb. MIME
is specified in RFC 1521 and RFC 1522 which are available from
ftp.uu.net. There is also a MIME FAQ which is available via anonymous
ftp from ftp.ics.uci.edu in /mh/contrib/multimedia/mime-faq.txt.gz.
(This file is in compressed format. You will need the GNU gunzip
program to decompress this file.)

PS: If your computer is running DEC Ultrix and you want it to handle
ISO characters properly, you can get the source for /usr/lib/sendmail
from its home at UCB and many other FTP sites. OR, you can simply call
DEC, complain that their standard mail system cannot handle
international 8 bit mail, encourage them to implement 8 bit transparent
SMTP, or (even better) ESMTP, and ask for the sendmail patch which
makes their current sendmail 8 bit transparent. (Reportedly, such a
patch is available from DEC for those who ask.)

Newer versions of sendmail support ESMTP negotiation and can pass 8 bit
data. However, they do not (yet?) support downgrading of 8 bit MIME
messages.

13.3 News and ISO 8859-1

Much as with mail, the Usenet news protocol specification is 7 bit
based, but a significant part of the infrastructure has recently been
upgraded to 8 bit service... Thus, accented characters are transferred
correctly between much of Europe (and Latin America), but accents
sometimes get lost in networks which run old news software (BNews).

ISO 8859-1 is _the_ standard for typing accented characters in most
newsgroups (may be different for MS-DOS centered newsgroups ;-), and is
preferred in most European newsgroup hierarchies, such as at.* or de.*

For those who speak French, there is an excellent FAQ on using ISO
8859-1 coded characters on Usenet by François Yergeau. This FAQ is
regularly posted in soc.culture.french and other relevant newsgroups.

13.4 WWW (and other information servers)

The WWW protocol can transfer 8 bit data without any problems and you
can advertise ISO-8859-1 encoded data from your client. The display of
data is dependent upon the user client. xmosaic (freely available from
the NCSA), which is available for most UNIX platforms, uses an
ISO-8859-1 compliant font by default and will display data correctly.

13.5 rlogin

For rlogin to pass 8 bit data correctly, invoke it with 'rlogin -8' or
'rlogin -L'.

14. Some applications and ISO 8859-1

14.1 bash

You need version 1.14.1 or higher, and you need to set the locale
correctly (see section 4). Bash is available from prep.ai.mit.edu in
/pub/gnu.

14.2 elm

Elm automatically supports the handling of national character sets,
provided the environment is configured correctly. When you compile elm
with MIME support, you have two options:

* you can compile elm to use 8 bit ISO-8859-1 as transport encoding:
  If you use this encoding, even people without MIME compliant mailers
  will be able to read your mail messages, if they use the same
  character set. The eighth bit may however be cut off by 7 bit MTAs
  (mail transfer agents), and mutilated mail might be received by the
  recipient, regardless of whether she uses MIME or not. (This problem
  should be eased when 8 bit mailers are upgraded to understand how to
  translate 8 bit mails to 7 bit encodings when they encounter a 7 bit
  mailer.)

* you can compile elm to use 7 bit US-ASCII as transport encoding:
  This encoding ensures that you can transfer your mail containing
  national characters without having to worry about 7 bit MTAs. A MIME
  compliant mail reader at the other end will translate your message
  back to your national character set. Recipients without MIME
  compliant mail readers will however see mutilated messages: national
  characters will have been replaced by sequences of the type '=FF'
  (with FF being the ISO code (in hexadecimal) of the national
  character being encoded).

14.3 less

Set the LESSCHARSET environment variable with
'setenv LESSCHARSET latin1'.

14.4 metamail

To configure the metamail package for ISO 8859-1 input/output, set the
MM_CHARSET environment variable with 'setenv MM_CHARSET ISO-8859-1'.
Also, set the MM_AUXCHARSETS variable with
'setenv MM_AUXCHARSETS iso-8859-1'.

14.5 nn

Add the line

-----------------
set data-bits 8
-----------------

to your ~/.nn/init file for nn to be able to process 8 bit characters.

14.6 nroff

The GNU replacement for nroff, groff, has an option to generate ISO
8859-1 coded output, instead of plain ASCII. Thus, you can preview
nroff documents with correctly displayed accented characters. Invoke
groff with the 'groff -Tlatin1' option to achieve this.

Groff is free software. It is available via anonymous ftp from
prep.ai.mit.edu in /pub/gnu and many other GNU archives around the
world.

14.7 sendmail

BSD Sendmail v8 has a flag in the configuration file, set to True or
False, which determines whether v8 passes any 8-bit data it encounters
(presumably to match the behavior of other 8-bit-transparent MTAs and
to meet the wants of non-ASCII users) or strips it to 7 bits to conform
to SMTP.

14.8 tcsh

You need version 6.04 or higher, and your locale has to be set properly
(see section 4). Tcsh also needs to be compiled with the national
language support feature, see the config.h file in the tcsh source
directory.
Tcsh is an extended csh and is available from tesla.ee.cornell.edu in
/pub/tcsh.

14.9 vi

Support for 8 bit character sets depends on the OS. It works under
SunOS 4.1.*, but on OSF/1 vi gets confused about the current cursor
position in the presence of 8 bit characters.

15. Terminals

15.1 X11 Terminal Emulators

15.1.1 xterm

If you are using X11 and xterm as your terminal emulator, you should
place the following line in ~/.Xdefaults (this seems to be required in
some releases of X11, not in all):

-------------------------
XTerm*EightBitInput: True
-------------------------

15.1.2 xrvt

xrvt is another terminal emulator for X11, used mostly under Linux.
Invoke xrvt with the 'xrvt -8' command line.

15.2 VT2xx, VT3xx

The character encoding used in VT2xx terminals is a preliminary version
of the ISO-8859-1 standard, so some characters (the more obscure ones)
differ slightly. However, these terminals can be used with ISO 8859-1
characters without problems.

The newer VT3xx terminals use the official ISO 8859-1 standard.

The international versions of the VT[23]xx terminals have a COMPOSE key
which can be used to enter accented characters, e.g. COMPOSE e ' will
give an e accent aigu (é).

15.3 Various UNIX terminals

Some terminals support down-loadable fonts. If characters sent to these
terminals can be 8 bit wide, you can down-load your own ISO character
set. To see how this can be achieved, take a look at
/pub/culture/russian/comp/cyril-term on nic.funet.fi.

15.4 MS-DOS PCs

MS-DOS PCs normally use a different encoding for accented characters,
so there are two options:

* you can use a terminal emulator which will translate between the
  different encodings. If you use the PROCOMM PLUS, TELEMATE and TELIX
  modem programs, you can down-load the translation tables via
  anonymous ftp from oak.oakland.edu as /pub/msdos/modem/xlate.zip

* you can reconfigure your MS-DOS PC to use an ISO-8859-1 code page.
  Check out the anonymous ftp archive ftp.uni-erlangen.de, which
  contains data on how to do this (and other ISO-related stuff) in
  /pub/doc/ISO/charsets. The README file contains an index of the files
  you need.

16. Programming applications which support the use of ISO 8859-1

For information on how to write applications with support for
localization (to the ISO 8859-1 and other character representations)
check out the file /pub/8bit/ISO-programming available via anonymous
ftp from ftp.vlsivie.tuwien.ac.at.

17. Other relevant i18n FAQs

This is a list of other FAQs on the net which might be of interest.

Topic                   Newsgroup(s)            Comments
Nordic graphemes        soc.culture.nordic      interesting stuff about
                                                handling nordic letters
accents sur Usenet      soc.culture.french,...  Accents on Usenet (French)
                                                + more
Programming for I18N    comp.unix.questions,... see section 16.
International fonts     ftp://ftp.vlsivie.tuwien.ac.at/pub/8bit/ISO-fonts
                                                Discusses international
                                                fonts and where to find
                                                them

18. Comments

This FAQ is somewhat Sun-centered, though I have tried to include other
machine types. If you have figured out how to configure your machine
type, please let me (mike@vlsivie.tuwien.ac.at) know so that I can
include it in future revisions of this FAQ.

19. Home location of this document

The most recent version of this document is available via anonymous ftp
from ftp.vlsivie.tuwien.ac.at under the file name
/pub/8bit/FAQ-ISO-8859-1

-----------------

Copyright © 1994 Michael Gschwind (mike@vlsivie.tuwien.ac.at)

This document may be copied for non-commercial purposes, provided this
copyright notice appears.

Dieses Dokument darf unter Angabe dieser urheberrechtlichen
Bestimmungen zum Zwecke der nicht-kommerziellen Nutzung beliebig
vervielfältigt werden.

Michael Gschwind, Institut f. Technische Informatik, TU Wien
snail: Treitlstrasse 3-182-2 || A-1040 Wien || Austria
email: mike@vlsivie.tuwien.ac.at   note: real time != real fast
phone: +(43)(1)58801 8156          fax: +(43)(1)586 9697