
Dealing With Particularly Troublesome Characters

Blame mutt for doing exactly what it's told to do. Character sets have very strict definitions of their character codes, and mutt takes those definitions literally, assuming other email software is telling the truth. But a lot of messages from Windows users either include Microsoft's extensions (like smart quotes and en dashes) or include characters cut and pasted from other character sets without reporting them. Some Microsoft email software, many web mail programs, and some news readers on various platforms report that they are encoding characters in the ISO-8859-1 (Western Latin) character set but then proceed to use characters from the CP-1252 (Windows) character set. As a result you see a lot of extraneous characters in messages, which mutt presents to you in numerical form, like 221, 222, and 223. These numbers are the octal codes of the characters, and they appear when mutt doesn't know what to do with them. The following table shows some of the characters you are most likely to encounter:

Figure 11: Special Character Octal Codes
Octal | Character
------+--------------------
221   | Left s...
...
224   | Right double quote
226   | En dash
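
To see where these codes come from, it can help to decode them yourself. The following Python sketch is not part of mutt, and the function name decode_cp1252 is made up for illustration. It converts each octal code to its byte value and interprets that byte as CP-1252, using only the standard library:

```python
import unicodedata

# Octal codes that mutt shows for bytes it cannot display as ISO-8859-1.
OCTAL_CODES = ["221", "222", "223", "224", "226"]


def decode_cp1252(octal_code: str) -> str:
    """Interpret one octal byte value as a CP-1252 character."""
    byte = bytes([int(octal_code, 8)])
    return byte.decode("cp1252")


for code in OCTAL_CODES:
    char = decode_cp1252(code)
    # Print only ASCII (code point and Unicode name) so that the output
    # is readable even on a terminal with a limited character set.
    print(f"{code} = 0x{int(code, 8):02X} = U+{ord(char):04X} {unicodedata.name(char)}")
```

In ISO-8859-1 the same byte values fall in a range reserved for control codes, which is presumably why mutt has nothing printable to show for them.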

A relatively clean way of dealing with this problem is to add a character set hook to your .muttrc. This instructs mutt to present characters using one character set when an email message reports that it has been encoded in another:

charset-hook windows-1250 CP1250

Repeat for CP1252 through CP1258, for example.

Alain Bench recommends you follow up with the iconv program (read about it in the man pages), which converts text from one character set to another.
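
As a rough illustration of the kind of conversion iconv performs, the Python sketch below re-encodes CP-1252 bytes as UTF-8. It is not iconv itself, and the function name convert and its default character sets are assumptions chosen for this example. Bytes that the source character set leaves undefined are replaced rather than raising an error.

```python
def convert(data: bytes, source: str = "cp1252", target: str = "utf-8") -> bytes:
    """Re-encode bytes from one character set to another."""
    # errors="replace" substitutes U+FFFD for any byte that the source
    # character set does not define (CP-1252 has a few such gaps).
    text = data.decode(source, errors="replace")
    return text.encode(target)


# A sample line containing CP-1252 smart quotes and an en dash.
sample = b"It\x92s \x93quoted\x94 \x96 mostly"
print(convert(sample))
```

The result is printed as a bytes literal so that you can see the multi-byte UTF-8 sequences that replace each single CP-1252 byte.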

Mutt includes a mechanism for substituting one character for another in what's called a display filter. Add the following to your muttrc file to declare characters you'd like to substitute:

set display_filter="tr '\\221\\222\\223\\224\\226' '\\047\\047\\042\\042\\055'"

This passes the message through the Unix `tr' (translate) command, which replaces each of those characters with a plain ASCII apostrophe, double quote, or hyphen. I've had trouble with it on some mutt installations. Another recommendation is to install John Walker's demoroniser Perl script (see section 5) on your machine. For the record, I don't think this script works well if you are working in a UTF-8 encoded environment. To pass messages through the demoroniser, use:

set display_filter="perl demoroniser.pl"

Take care to point the display filter at the actual path where you've installed the demoroniser Perl script. I saved it in my home directory, so I had to use

set display_filter="perl ~/demoroniser.pl"
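
If the tr-based filter gives you trouble, the same byte substitution can be sketched in Python. This is only an illustration under stated assumptions: the names TABLE, substitute, and filter_stream are hypothetical, and the demo runs on an in-memory sample rather than on a real message.

```python
import io

# Same mapping as the tr command: octal 221, 222, 223, 224, 226
# become apostrophe, apostrophe, double quote, double quote, hyphen.
TABLE = bytes.maketrans(b"\221\222\223\224\226", b"\047\047\042\042\055")


def substitute(data: bytes) -> bytes:
    """Replace the troublesome bytes with ASCII equivalents."""
    return data.translate(TABLE)


def filter_stream(source, sink) -> None:
    """Copy one binary stream to another, substituting line by line."""
    for line in source:
        sink.write(substitute(line))


# Demo on an in-memory message instead of standard input.
incoming = io.BytesIO(b"\223Hello\224 \226 it\222s me\n")
outgoing = io.BytesIO()
filter_stream(incoming, outgoing)
print(outgoing.getvalue())
```

To try something like this as a real display filter, you would call filter_stream(sys.stdin.buffer, sys.stdout.buffer) in a script and point display_filter at that script. Like tr, it works on raw bytes, so it is likely to mangle messages that really are UTF-8, where the same byte values can occur inside multi-byte characters.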


Randall Wood 2009-12-02