ASCII and Unicode quotation marks (2007)

Article URL: https://www.cl.cam.ac.uk/~mgk25/ucs/quotes.html
Comments URL: https://news.ycombinator.com/item?id=47395147


Summary: Please do not use the ASCII grave accent (0x60) as
a left quotation mark together with the ASCII apostrophe (0x27) as the
corresponding right quotation mark (as in `quote'). Your text will
otherwise appear rather strange with most modern fonts (e.g., on
Windows and Mac systems).

Only old X Window System fonts and
some old video terminals show ASCII 0x60/0x27 as left and right
quotation marks, while most modern systems follow the ISO and Unicode
standards instead.

If you can use only ASCII’s typewriter characters,
then use the apostrophe character (0x27) as both the left and right
quotation mark (as in 'quote').

If you can use Unicode
characters, nice directional quotation marks are available in the form
of characters U+2018, U+2019, U+201C, and U+201D (as in ‘quote’ or “quote”).
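To check exactly which characters are meant, a minimal Python sketch using only the standard unicodedata module prints the code point and the official Unicode name of the two ASCII characters and the four directional quotation marks mentioned above. The helper name describe is hypothetical.

```python
import unicodedata

# The two ASCII characters discussed above, followed by the directional
# quotation marks recommended for Unicode text.
CHARS = ["\u0027", "\u0060", "\u2018", "\u2019", "\u201C", "\u201D"]

def describe(ch: str) -> str:
    """Return a label such as 'U+0027 APOSTROPHE' for a single character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"

if __name__ == "__main__":
    for ch in CHARS:
        print(ch, describe(ch))
```

For U+0060 this prints GRAVE ACCENT, not any kind of quotation mark, which is the whole point of the recommendation.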

The Unicode and ISO 10646
standards define the following characters:

ASCII and ISO 8859 were only designed to support the very
restricted typographic style available to typewriter users.

The two ASCII characters 0x22 (") and 0x27 (') are supposed to
represent the neutral (vertical) glyphs commonly used on typewriters.

They should not be used as
directional quotation marks.

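ISO 8859 and Unicode fonts are supposed to show the two accent
characters 0x60 (`) and 0xB4 (´) as mutually symmetric glyphs.

Unfortunately, the X Window System fonts for a long time contained
mutually symmetric glyphs for 0x60 and 0x27 instead, drawing them as
left and right single quotation marks.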
These shapes were even sanctioned by an early US version of the ISO
646 standard (ANSI X3.4, also known as ASCII), which defined 0x27 as
“apostrophe (closing single quotation mark; acute accent)”, but they
should already have been changed when the fonts were extended to cover
ISO 8859-1, which added a separate acute accent at 0xB4.

One obviously
cannot have both 0x27/0x60 and 0x60/0xB4 as mutually symmetric glyph
pairs and have at the same time a different shape for 0x27 and 0xB4.

Since 0x60/0xB4 were defined to be accents by the modern standards,
their symmetric shape got priority, except that this had not been
fixed in the X fonts until 2004 (somewhat earlier in the versions that
come with XFree86).

The old X fonts encouraged some authors of Unix software and
documentation to abuse 0x60 together with 0x27 as directional
quotation marks.

This practice looked somewhat acceptable if displayed with old X
fonts, but it looked rather ugly in most other modern display
environments (e.g., with the correctly designed Windows and Mac
TrueType fonts, but also on many classic 1970s/1980s video terminals,
such as those by Siemens/Nixdorf and many other manufacturers).

For example, under Windows NT 4.0 with the TrueType font Lucida
Console (size 14), 0x60 and 0x27 are rendered as a grave accent and a
straight apostrophe, which do not form a matching pair.

Unicode and ISO 10646 make a very clear distinction between the
undirected typewriter-style ASCII single quotation mark and apostrophe
U+0027, as in 'quote', and the typographic directed quotation marks
U+2018 and U+2019, as in ‘quote’. Unicode 2.1 explicitly says that
U+2019 is the preferred punctuation apostrophe, as in “We’ve been here
before.”
The Unicode standard also notes:
“For historical reasons, U+0027 is a particularly
overloaded character.

In ASCII it is used to represent a punctuation
mark (such as right single quotation mark, left single quotation mark,
apostrophe punctuation, vertical line, or prime) or a modifier letter
(such as apostrophe modifier or acute accent.) (Punctuation marks
generally break words; modifier letters generally are considered part
of a word.) In many systems it is always represented as a straight
vertical line and can never represent a curly apostrophe or right
quotation mark.”
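Because U+2019 is the preferred punctuation apostrophe, text that is already in Unicode can have its word-internal apostrophes upgraded. The following Python sketch treats an ASCII apostrophe with a letter on both sides as such an apostrophe and replaces it with U+2019. This is a heuristic assumption, not a rule from the standard: it leaves leading elisions such as 'tis untouched and cannot tell a possessive at the end of a word from a closing quotation mark. The function name curl_apostrophes is hypothetical.

```python
import re

# Heuristic: a U+0027 with a Unicode letter on both sides is taken to be a
# word-internal apostrophe.  [^\W\d_] matches exactly one letter.
_INTERNAL_APOSTROPHE = re.compile(r"(?<=[^\W\d_])'(?=[^\W\d_])")

def curl_apostrophes(text: str) -> str:
    """Replace word-internal ASCII apostrophes with U+2019."""
    return _INTERNAL_APOSTROPHE.sub("\u2019", text)

print(curl_apostrophes("We've been here before."))  # We’ve been here before.
```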

If you are the author of some Unix software, then please check
whether you use the ASCII character 0x60 (`) as a left
quotation mark, as in `quote'.

Change it so that you instead use
the character 0x27 (') on both sides, as in 'quote'.
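For simple messages this change can be automated. A minimal Python sketch, assuming that the quoted text itself contains no grave accent, apostrophe, or line break; the function name fix_backquote_quotes is hypothetical:

```python
import re

# A grave accent (0x60) opening a quotation that an apostrophe (0x27)
# closes, with no other grave accent, apostrophe, or newline in between.
_BACKQUOTE_PAIR = re.compile(r"`([^`'\n]*)'")

def fix_backquote_quotes(text: str) -> str:
    """Rewrite `quote' as 'quote', leaving everything else unchanged."""
    return _BACKQUOTE_PAIR.sub(r"'\1'", text)

print(fix_backquote_quotes("cannot open `foo.txt'"))  # cannot open 'foo.txt'
```

The pattern leaves a lone shell command substitution such as `date` alone, but it is not safe on arbitrary shell code: in echo `date` 'x' it matches from the closing backtick to the next apostrophe and produces echo `date' 'x'. Review every change it makes.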

If you work in an environment where the UTF-8
encoding is already used everywhere (e.g., Plan 9 and most modern
GNU/Linux installations), you could even decide to use proper
directional quotation marks, as in ‘quote’ or “quote”.
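A variant of the same substitution can produce directional quotation marks instead. In UTF-8, U+2018 and U+2019 each take three bytes (E2 80 98 and E2 80 99), which is why this is only advisable where every part of the toolchain already handles UTF-8. A minimal Python sketch, with the same assumption about the quoted text as before; the function name to_directional_quotes is hypothetical:

```python
import re

# `quote' pairs without an inner grave accent, apostrophe, or newline.
_BACKQUOTE_PAIR = re.compile(r"`([^`'\n]*)'")

def to_directional_quotes(text: str) -> str:
    """Rewrite `quote' as ‘quote’ (U+2018 ... U+2019)."""
    return _BACKQUOTE_PAIR.sub("\u2018\\1\u2019", text)

converted = to_directional_quotes("cannot open `foo.txt'")
print(converted)                  # cannot open ‘foo.txt’
print(converted.encode("utf-8"))  # b'cannot open \xe2\x80\x98foo.txt\xe2\x80\x99'
```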

Search your source code directories (for example with grep)
to find out where modifications are necessary.
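The original search command is not preserved in this copy. As a hypothetical stand-in, the following Python sketch walks a directory tree and reports every line containing a `quote'-style pair. The function name find_backquote_quotes and the exact pattern are assumptions of this example, and files are decoded as UTF-8 with undecodable bytes replaced.

```python
import os
import re
from typing import Iterator, Tuple

# Grave accent ... apostrophe on one line, as in `quote'.
_BACKQUOTE_PAIR = re.compile(r"`[^`'\n]*'")

def find_backquote_quotes(root: str) -> Iterator[Tuple[str, int, str]]:
    """Yield (path, line number, line) for each suspicious line under root."""
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # visit subdirectories in a stable order
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            try:
                with open(path, encoding="utf-8", errors="replace") as f:
                    for lineno, line in enumerate(f, start=1):
                        if _BACKQUOTE_PAIR.search(line):
                            yield path, lineno, line.rstrip("\n")
            except OSError:
                continue  # unreadable file: skip it
```

Usage: for path, lineno, line in find_backquote_quotes("src"): print(f"{path}:{lineno}: {line}"). The output uses the familiar file:line: text layout so that editors can jump to each hit.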

Then make the necessary substitutions automatically with a suitable
search-and-replace command (with proper care!), or make the edits
manually instead.
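The substitution command from the original is likewise not preserved. One careful way to apply the change is to rewrite each file in place while keeping a backup copy, so that the result can be compared before the backup is deleted. A minimal Python sketch; the function name rewrite_file and the .orig suffix are hypothetical choices:

```python
import re
import shutil

# `quote' pairs without an inner grave accent, apostrophe, or newline.
_BACKQUOTE_PAIR = re.compile(r"`([^`'\n]*)'")

def rewrite_file(path: str, backup_suffix: str = ".orig") -> int:
    """Rewrite `quote' as 'quote' in one file, keeping a backup copy.

    Returns the number of substitutions.  Files with nothing to change are
    left untouched and get no backup.  newline="" preserves line endings.
    """
    with open(path, encoding="utf-8", newline="") as f:
        text = f.read()
    new_text, count = _BACKQUOTE_PAIR.subn(r"'\1'", text)
    if count:
        shutil.copy2(path, path + backup_suffix)
        with open(path, "w", encoding="utf-8", newline="") as f:
            f.write(new_text)
    return count
```

Running diff on each file and its backup afterwards is a cheap way to catch false matches, for example in shell scripts.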

The use of 0x60 (grave accent) as a special control character in
the Unix shell (to denote command substitution, as in `command` or
better $(command)), in Perl, in Lisp, or in TeX/troff (to denote a
proper left single quotation mark) does not have to be changed and
remains unaffected.

Donald Knuth’s TeXbook (chapter 2, page 3, end of second paragraph)
has warned TeX users since as early as 1986 that the apostrophe and
grave accent shapes can show up as required by ISO and Unicode and not
as used in the rest of the TeXbook.

The Unix m4 macro processor is probably the only
widely used tool that uses the `quote' combination as part of its
input syntax; however, even that could be modified via changequote.

There are quite a number of reasons why the old X fonts had to be
fixed, and with them the associated ASCII backquote practice:

It can cause quite some confusion for users if the keycap labels
and the glyph shapes in the fonts disagree, as they did in the old X
fonts.

Updated X Window System core BDF fonts, in which the apostrophe and
grave accent are corrected along with a number of other bugs, have
been available since 1998.

They replaced the old fonts in XFree86 since version 4.0 and in the
X.Org sample implementation since X11R6.8.

PostScript has a somewhat complicated history of how it maps the
ASCII bytes to glyphs.

In PostScript fonts, each glyph is identified
not by a code position, but by a glyph name such as
“quotesingle”.

After the publication of the Unicode Standard, Adobe
released an official PostScript Glyph Name to Unicode Mapping
table.

When a PostScript interpreter
displays text, it uses an encoding vector to map the 8-bit
byte values found in text strings onto the glyph names found in fonts.

PostScript provides several predefined 8-bit encoding vectors.

Authors of printer drivers can easily add their own.

As the above table shows, the original PostScript standard encoding
followed a practice similar to the old X fonts, with all its problems,
namely it mapped the ASCII bytes 0x60 and 0x27 to curly opening and
closing quotation marks (“quoteleft” and “quoteright” in PostScript
glyph-name terminology, or U+2018 and U+2019 in Unicode).
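To make the encoding-vector mechanism concrete, here is a minimal Python model in which an encoding vector is simply a 256-entry list from byte values to glyph names. Only the two positions discussed here are filled in; every other entry is the PostScript placeholder name .notdef, so these are illustrative fragments, not copies of Adobe's actual vectors. The names make_vector, STANDARD, CORRECTED, and glyph_names are hypothetical.

```python
# Minimal model of a PostScript encoding vector: 256 glyph names indexed
# by byte value.  Only the positions discussed in the text are filled in.
NOTDEF = ".notdef"

def make_vector(overrides: dict) -> list:
    """Return a 256-entry encoding vector with the given byte -> name entries."""
    vector = [NOTDEF] * 256
    for byte, name in overrides.items():
        vector[byte] = name
    return vector

# Original StandardEncoding: curly quotation marks on the ASCII positions.
STANDARD = make_vector({0x27: "quoteright", 0x60: "quoteleft"})
# What ISO 8859-1 and Unicode require for the same two bytes.
CORRECTED = make_vector({0x27: "quotesingle", 0x60: "grave"})

def glyph_names(data: bytes, vector: list) -> list:
    """Map each byte of a string to a glyph name, as an interpreter would."""
    return [vector[b] for b in data]

print(glyph_names(b"`'", STANDARD))   # ['quoteleft', 'quoteright']
print(glyph_names(b"`'", CORRECTED))  # ['grave', 'quotesingle']
```

The same bytes thus come out as a matching pair of curly quotes or as an accent and a straight apostrophe, depending solely on the vector.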

When ISO 8859-1 emerged, Adobe added to PostScript another
predefined encoding vector called ISOLatin1Encoding.

This was meant to be ISO 8859-1 compatible, but it left 0x60 and
0x27 unchanged from the old StandardEncoding vector. Therefore, it
does not actually print the ISO 8859-1 characters 0x27 and 0x60
correctly, which correspond to Unicode characters U+0027 and U+0060
and should be represented by the PostScript glyphs “quotesingle” and
“grave”, respectively.

The authors of Adobe’s PostScript Language Reference, Third Edition
(Addison-Wesley, ISBN 0-201-37922-8) acknowledge this in section E.5,
footnote 3, page 783, where they note that the “ISOLatin1Encoding
encoding vector deviates from the ISO 8859-1 standard” and that an
application that wants to “conform exactly to the ISO standard should
create a modified encoding vector”.

The newer CE encoding vector (Central
European, matching Windows CP1250), which is now also described in the
PostScript Language Reference, correctly maps 0x27 to “quotesingle”
and 0x60 to “grave”.

If you write a PostScript driver, please use the official Unicode
to PostScript mapping table to map ASCII, ISO 8859 and ISO 10646
characters to PostScript glyphs, as the updated Type 1 renderer in
XFree86 4.0 does.
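The approach recommended here goes through Unicode: decode the bytes with the proper character set, then look up each code point in the glyph-name table. The Python sketch below uses a small excerpt of that mapping for the characters discussed in this article; a real driver would load the complete official table, and the names UNICODE_TO_GLYPH and glyphs_for are hypothetical.

```python
# Excerpt of the Unicode -> PostScript glyph-name mapping; a real driver
# would load the complete official table instead.
UNICODE_TO_GLYPH = {
    0x0022: "quotedbl",
    0x0027: "quotesingle",
    0x0060: "grave",
    0x00B4: "acute",
    0x2018: "quoteleft",
    0x2019: "quoteright",
    0x201C: "quotedblleft",
    0x201D: "quotedblright",
}

def glyphs_for(text: str) -> list:
    """Return the glyph name for each character; .notdef if not in the excerpt."""
    return [UNICODE_TO_GLYPH.get(ord(ch), ".notdef") for ch in text]

# ISO 8859-1 bytes decoded first (Python's codec name is "latin-1").
print(glyphs_for(b"`'".decode("latin-1")))  # ['grave', 'quotesingle']
print(glyphs_for("\u2018\u2019"))           # ['quoteleft', 'quoteright']
```

Because ISO 8859-1 bytes map one-to-one onto U+0000 to U+00FF, this route prints 0x27 and 0x60 correctly without any special-casing.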

Do not use the ISOLatin1Encoding encoding vector to print ISO 8859-1
text without first changing it to map 0x27 to “quotesingle” and 0x60
to “grave”.

(In addition, you may
also want to map 0x2D = HYPHEN-MINUS to the PostScript glyph “hyphen”
instead of the “minus” mapping used by ISOLatin1Encoding.)
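One way to follow this advice is to define a corrected copy of ISOLatin1Encoding at the start of the print job. The Python helper below only generates that PostScript fragment as text; the vector name ISOLatin1FixedEncoding and the helper name are hypothetical, and installing the new vector into a font is left to the driver and not shown.

```python
# Positions where ISOLatin1Encoding differs from ISO 8859-1 as discussed
# above: 0x27 and 0x60, plus the optional 0x2D hyphen fix.
FIXES = {0x27: "quotesingle", 0x60: "grave", 0x2D: "hyphen"}

def fixed_latin1_encoding_ps(name: str = "ISOLatin1FixedEncoding") -> str:
    """Return PostScript code defining a corrected copy of ISOLatin1Encoding."""
    # "256 array copy" makes a modifiable copy of the predefined vector.
    lines = [f"/{name} ISOLatin1Encoding 256 array copy def"]
    for byte, glyph in sorted(FIXES.items()):
        lines.append(f"{name} {byte} /{glyph} put")
    return "\n".join(lines)

print(fixed_latin1_encoding_ps())
```

The generated fragment copies the predefined vector rather than modifying it, so other code in the same job that relies on the unmodified ISOLatin1Encoding is not affected.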

The font cmtt10 in TeX’s Computer Modern family
follows the example of the PostScript standard encoding by providing a
straight double quotation mark and directional single quotation marks
on the ASCII positions 0x22, 0x60, and 0x27.

It also provides a
straight single quotation mark, grave accent, and acute accent on code
positions 0x0d, 0x12, and 0x13, respectively, but it lacks directional
double quotation marks.
Therefore, to demonstrate the result of abusing ASCII’s straight
quotation mark and grave accent as directional quotation marks in a
document written in LaTeX, you can write \texttt{\char"12
quote\char"0D}.

The non-typewriter fonts in Computer Modern
lack both single and double straight quotation marks.

Use LaTeX’s upquote package (\usepackage{upquote})
to map the ASCII characters 0x27 and 0x60 to the correct glyphs in
the verbatim modes.

created 1999-12-19 – last modified 2007-12-11 –
http://www.cl.cam.ac.uk/~mgk25/ucs/quotes.html
