
Dealing With Particularly Troublesome Characters

Blame mutt for doing exactly what it's told to do. Character sets have very strict definitions of their character codes, and mutt takes those definitions literally, assuming other email software is telling the truth. But a lot of messages from Windows users either include Microsoft's extensions (like smart quotes and en dashes) or include characters cut and pasted from other character sets without declaring them. Some Microsoft email software, many web mail programs, and some news readers on various platforms declare that they are encoding characters in the ISO-8859-1 (Western Latin) character set but then proceed to use characters from the CP-1252 (Windows) character set. As a result, you see a lot of extraneous characters in messages, which mutt presents in numerical form, like 221, 222, and 223. These numbers are the octal codes of the characters, and they appear when mutt doesn't know what to do with them. The following table shows some of the characters you are most likely to encounter:

Figure 10: Special Character Octal Codes
Octal   Character
-----   ------------------
...
...     Right double quote
226     En dash
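
To see why these bytes cause trouble, the following Python sketch decodes a few of them both ways. It is only an illustration: the names SAMPLE_BYTES, describe, and printable_as_latin1 are made up for this example, and the byte values are the ones used in the tr command later in this section.

```python
import unicodedata

# Byte values that CP-1252 assigns to punctuation. ISO-8859-1 reserves
# the same values for C1 control codes, which have no printable glyph.
SAMPLE_BYTES = [0x91, 0x92, 0x93, 0x94, 0x96]

def describe(byte_value):
    """Return (octal code, Unicode name of the CP-1252 character)."""
    char = bytes([byte_value]).decode("cp1252")
    return format(byte_value, "o"), unicodedata.name(char)

def printable_as_latin1(byte_value):
    """True if ISO-8859-1 maps this byte to a printable character."""
    return bytes([byte_value]).decode("latin-1").isprintable()

for value in SAMPLE_BYTES:
    octal, name = describe(value)
    print(octal, name, "printable in ISO-8859-1:", printable_as_latin1(value))
```

Each byte has a well-defined meaning in CP-1252 but no printable one in ISO-8859-1, which is why a message that declares the wrong character set ends up showing raw octal codes.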

A relatively clean way of dealing with this problem is to add character set hooks to your .muttrc, as follows. Each hook tells mutt to treat one character set name as another: when an email message reports the name on the left, mutt uses the character set on the right.

charset-hook windows-1250 CP1250
charset-hook windows-1251 CP1251
charset-hook windows-1252 CP1252
charset-hook windows-1253 CP1253
charset-hook windows-1254 CP1254
charset-hook windows-1255 CP1255
charset-hook windows-1256 CP1256
charset-hook windows-1257 CP1257
charset-hook windows-1258 CP1258

Alain Bench recommends you follow up with the iconv program (read about it in the man pages), which converts text from one character set to another.
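
As a rough picture of what such a conversion does, here is a minimal Python sketch that re-encodes CP-1252 bytes as UTF-8. The function name convert_charset and the sample message are invented for this example, and it is not how iconv itself is implemented. On the command line, the comparable iconv invocation would be along the lines of iconv -f CP1252 -t UTF-8.

```python
def convert_charset(data, source="cp1252", target="utf-8"):
    """Re-encode bytes from one character set to another.

    Decoding interprets the raw bytes using the source character set;
    encoding writes the same characters out in the target one.
    """
    return data.decode(source).encode(target)

# Hypothetical message body containing Windows smart quotes and an en dash.
message = b"\x93Hello\x94 \x96 it\x92s me"
print(convert_charset(message))
```

The conversion only gives sensible results when the source character set is named correctly, which is exactly the information a mislabelled message gets wrong.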

Mutt can also pipe each message through an external command before displaying it, in what's called a display filter. This gives you a way to substitute one character for another. Add the following to your muttrc file to declare the characters you'd like to substitute:

set display_filter="tr '\\221\\222\\223\\224\\226' '\\047\\047\\042\\042\\055'"

This passes each message through the Unix `tr' (translate) command, which replaces the Windows quotes and dash with their plain ASCII equivalents. I've had trouble with it on some mutt installations. Another recommendation is to install John Walker's demoroniser Perl script (see section 7) on your machine, and pass messages through it as follows (for the record, I don't think this script works well if you are working in a UTF-8 encoded environment).

set display_filter="perl demoroniser.pl"

Take care to set the display filter to the actual path where you've installed the demoroniser Perl script. I saved it in my home directory, so I had to use

set display_filter="perl ~/demoroniser.pl"
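
If neither tr nor the Perl script behaves on your system, the same substitution as the tr command above can be sketched in Python. This is an untested-in-mutt illustration rather than a recommended tool: the name clean and the file name in the usage note are assumptions made for this example. Like tr, it works on raw bytes, so it would also alter UTF-8 multi-byte sequences that happen to contain these byte values.

```python
import sys

# Same mapping as the tr command: octal 221 and 222 become an apostrophe,
# 223 and 224 become a double quote, and 226 becomes a hyphen.
CP1252_TO_ASCII = bytes.maketrans(b"\x91\x92\x93\x94\x96", b"''\"\"-")

def clean(data):
    """Replace the five CP-1252 punctuation bytes with ASCII stand-ins."""
    return data.translate(CP1252_TO_ASCII)

# Act as a filter only when input is piped in, as a display filter would be.
if __name__ == "__main__" and not sys.stdin.isatty():
    sys.stdout.buffer.write(clean(sys.stdin.buffer.read()))
```

Saved under a hypothetical name such as ~/cp1252_filter.py, it could be wired in with set display_filter="python3 ~/cp1252_filter.py".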


Randall Wood 2008-03-05