Characters, Entities and Fonts (Mathematical Markup Language (MathML) Version 3.0 2nd Edition, Chapter 7)

7.1 Introduction

Notation and symbols have proved very important for mathematics. Mathematics has grown in part because its notation continually changes toward being succinct and suggestive. Many new signs have been developed for use in mathematical notation, and many have been adopted that were originally introduced elsewhere. The result is that mathematics makes use of a very large collection of symbols. It is difficult to write mathematics fluently if these characters are not available for use. It is difficult to read mathematics if corresponding glyphs are not available for presentation on specific display devices.

The W3C Math Working Group therefore took on the job of specifying part of the mechanism needed to proceed from notation to final presentation, and has collaborated with the Unicode Technical Committee (UTC) and the STIX Fonts Project in undertaking specification of the rest.

This chapter contains discussion of characters for use within MathML, recommendations for their use, and warnings concerning the correct form of the corresponding code points given in the Universal Multiple-Octet Coded Character Set (UCS) [ISO10646] as codified in Unicode [Unicode]. For simplicity we refer to this character set by the short name Unicode. Unless otherwise stated, the mappings discussed in this chapter and elsewhere in the MathML 3.0 recommendation are based on Unicode 5.2. Conformant MathML processors (see Section 2.3 Conformance) are free to use characters defined in Unicode 5.2 or later.

While a long process of review and adoption by UTC and ISO/IEC of the characters of special interest to mathematics and MathML is now complete, more characters may be added in the future. For the latest character tables and font information, see [Entities] and the Unicode Home Page, notably Unicode Work in Progress and Unicode Technical Report #25 “Unicode Support for Mathematics”.

A MathML token element (see Section 3.2 Token Elements, Section 4.2.1 Numbers <cn>, Section 4.2.2 Content Identifiers <ci>, Section 4.2.3 Content Symbols <csymbol>) takes as content a sequence of MathML characters or mglyph elements. The latter are used to represent characters that do not have a Unicode encoding, as described in Section 3.2.1.2 Using images to represent symbols <mglyph/>. The need for mglyph should be rare because Unicode 3.1 provided approximately one thousand alphabetic characters for mathematics, and Unicode 3.2 added over 900 more special mathematical symbols.

7.2 Unicode Character Data

Any character allowed by XML may be used in MathML. More precisely, the legal Unicode characters have the hexadecimal code numbers 09 (tab = U+0009), 0A (line feed = U+000A), 0D (carriage return = U+000D), 20-D7FF (U+0020..U+D7FF), E000-FFFD (U+E000..U+FFFD), and 10000-10FFFF (U+10000..U+10FFFF). The exclusions above code number D7FF are of the blocks used in surrogate pairs, and the two characters guaranteed not to be Unicode characters at all. U+FFFE is excluded to allow determination of byte order in certain encodings.
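The ranges listed above can be checked mechanically. The following is a minimal sketch in Python; the names XML_CHAR_RANGES and is_xml_char are hypothetical helpers introduced for illustration and are not part of MathML or of any library.

```python
# Code point ranges allowed by XML, copied from the paragraph above.
# XML_CHAR_RANGES and is_xml_char are illustrative names, not a real API.
XML_CHAR_RANGES = [
    (0x09, 0x09),          # tab
    (0x0A, 0x0A),          # line feed
    (0x0D, 0x0D),          # carriage return
    (0x20, 0xD7FF),        # everything up to the surrogate blocks
    (0xE000, 0xFFFD),      # after the surrogates, excluding U+FFFE and U+FFFF
    (0x10000, 0x10FFFF),   # Planes 1 to 16
]


def is_xml_char(code_point: int) -> bool:
    """Return True if the code point falls in one of the allowed ranges."""
    return any(low <= code_point <= high for low, high in XML_CHAR_RANGES)
```

For instance, is_xml_char(0x1D44E) accepts a Plane 1 mathematical letter, while is_xml_char(0xD800) rejects a surrogate code point.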

There are essentially three different ways of encoding character data in an XML document.

Using characters directly: For example, the 'é' (character U+00E9 [LATIN SMALL LETTER E WITH ACUTE]) may have been inserted. This option is only useful if the character encoding specified for the XML document includes the character intended. Note that if the document is, for example, encoded in Latin-1 (ISO-8859-1) then only the characters in that encoding are available directly; for instance character U+00E9 (eacute) is, but character U+03B1 (alpha) is not.

Using numeric XML character references: For example, 'é' may be represented as &#233; (decimal) or &#xE9; (hex), or &#0233; (decimal) or &#x00E9; (hex). Note that the numbers in the character references always refer to the Unicode encoding (and not to the character encoding used in the XML file). By using character references it is always possible to access the entire Unicode range.

Using entity references: The MathML DTD defines internal entities that expand to character data. Thus for example the entity reference &eacute; may be used rather than the character reference &#xE9;. An XML fragment that uses an entity reference which is not defined in a DTD is not well-formed; therefore it will be rejected by an XML parser. For this reason every fragment using entity references must use a DOCTYPE declaration which specifies the MathML DTD, or a DTD that at least declares any entity reference used in the MathML instance. The need to use a DOCTYPE complicates inclusion of MathML in some documents. However, entity references can be useful for small illustrative examples.
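The three options can be compared with an ordinary XML parser. The sketch below uses Python's standard xml.etree.ElementTree module; the helper name parses and the tiny one-entity internal DTD subset are assumptions made for illustration, standing in for the full MathML DTD.

```python
import xml.etree.ElementTree as ET

# 1. The character written directly, 2. a decimal reference, 3. a hex reference.
direct = ET.fromstring("<mi>\u00e9</mi>").text
decimal = ET.fromstring("<mi>&#233;</mi>").text
hexadecimal = ET.fromstring("<mi>&#xE9;</mi>").text


def parses(fragment: str) -> bool:
    """Return True if the fragment is accepted by the XML parser (hypothetical helper)."""
    try:
        ET.fromstring(fragment)
        return True
    except ET.ParseError:
        return False


# An entity reference without a declaration is rejected by the parser.
undeclared_ok = parses("<mi>&eacute;</mi>")

# With a DOCTYPE that declares the entity, the same reference is accepted.
# This internal subset is an illustrative stand-in for the MathML DTD.
declared = ET.fromstring(
    '<!DOCTYPE mi [<!ENTITY eacute "&#xE9;">]><mi>&eacute;</mi>'
).text
```

All of direct, decimal, hexadecimal and declared hold the same single character U+00E9, while undeclared_ok is False because the entity has no declaration.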

7.3 Entity Declarations

Earlier versions of this MathML specification included detailed listings of the entity definitions to be used with the MathML DTD. These entity definitions are of more general use, and have now been separated into an ancillary document, XML Entity Definitions for Characters [Entities]. The tables there list the entity names and the corresponding Unicode character references. That document describes several entity sets; not all of them are used in the MathML DTD. The MathML DTD references the combined HTML MathML entity set defined in [Entities].

7.4 Special Characters Not in Unicode

For special purposes, one may need a symbol which does not have a Unicode representation. In these cases one may use the mglyph element for direct access to a glyph as an image, or (in some systems) from a font that uses a non-Unicode encoding. All MathML token elements accept characters in their content and also accept an mglyph there. Beware, however, that use of mglyph to access a font is deprecated and the mechanism may not work in all systems. The mglyph element should always supply a useful alternative representation in its alt attribute.

7.5 Mathematical Alphanumeric Symbols

In mathematical and scientific writing, single letters often denote variables and constants in a given context. The increasing complexity of science has led to the use of certain common alphabet and font variations to provide enough special symbols of this letter-like type. These denotations are generally not letters that may be used to make up words with recognized meanings, but individual carriers of semantics themselves. Writing a string of such symbols is usually interpreted in terms of some composition law, for instance, multiplication. Many letter-like symbols may be quickly interpreted as of a certain mathematical type by specialists in a given area: for instance, bold symbols, whether based on Latin or Greek letters, as vectors in physics or engineering, or Fraktur symbols as Lie algebras in part of pure mathematics.

The additional Mathematical Alphanumeric Symbols provided in Unicode 3.1 have code points in the range U+1D400 to U+1D7FF in Plane 1, that is, in the first plane with Unicode values higher than 2¹⁶. This plane of characters is also known as the Supplementary Multilingual Plane (SMP), in contrast to the Basic Multilingual Plane (BMP) which was originally the entire extent of Unicode. Support for Plane 1 characters in currently deployed software is not always reliable, but it should be possible in multilingual operating systems, since Plane 2 has many Chinese characters that must be displayable in East Asian locales.
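To make the notion of a plane concrete, the sketch below computes the plane of a code point and shows that a Plane 1 character needs two UTF-16 code units (a surrogate pair), which is one reason older software handled such characters unreliably. The function names plane_of and utf16_code_units are hypothetical.

```python
def plane_of(code_point: int) -> int:
    """Each plane holds 2**16 code points, so the plane is the value above bit 16."""
    return code_point >> 16


def utf16_code_units(character: str) -> list:
    """Return the UTF-16 code units that encode a single character."""
    data = character.encode("utf-16-be")
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]
```

Here plane_of(0x1D44E) is 1 for MATHEMATICAL ITALIC SMALL A, and utf16_code_units("\U0001D44E") yields the surrogate pair 0xD835, 0xDC4E, whereas a BMP letter such as "a" occupies a single code unit.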

As discussed in Section 3.2.2 Mathematics style attributes common to token elements, MathML offers an alternative mechanism to specify mathematical alphanumeric characters. This alternative mechanism spans the gap between the specification of the mathematical alphanumeric symbols as Unicode code points, and the deployment of software and fonts that support them. Namely, one uses the mathvariant attribute on a token element such as mi to indicate that the character data in the token element selects a mathematical alphanumeric symbol.

In principle, any mathvariant value may be used with any character data to define a specific symbolic token. In practice, only certain combinations of character data and mathvariant values will be visually distinguished by a given renderer. In this section we explain the correspondence between certain characters in Plane 0 that, when modified by the mathvariant attribute, are considered equivalent to mathematical alphanumeric symbol characters.

The mathematical alphanumeric symbol characters in Plane 1 include alphabets for Latin upper-case and lower-case letters, including dotless i and j, Greek upper-case and lower-case letters, Greek symbols (also known as variants), including upper-case and lower-case digamma, and Latin digits. These alphabets provide Plane 1 Unicode code points that differ from corresponding Plane 0 characters only by a variation in font that carries mathematical semantics when used in a formula.

The mathvariant attribute uses exactly this correspondence to provide an alternate markup encoding that selects these Plane 1 characters. For example, the Mathematical Italic alphabet runs from U+1D434 ("A") to U+1D467 ("z"). Thus, a typical example of an identifier for a variable, marked up as

<mi>a</mi>

and rendered in a mathematical italic font (as described in Section 3.2.3 Identifier <mi>) could equivalently be marked up as

<mi>&#x1D44E;<!--MATHEMATICAL ITALIC SMALL A--></mi>

which invokes the Mathematical Italic lower-case a explicitly.

An important use of the mathematical alphanumeric symbols in Plane 1 is for identifiers normally printed in special mathematical fonts, such as Fraktur, Greek, Boldface, or Script. As another example, the Mathematical Fraktur alphabet runs from U+1D504 ("A") to U+1D537 ("z"). Thus, an identifier for a variable that uses Fraktur characters could be marked up as

<mi>&#x1D504;<!--MATHEMATICAL FRAKTUR CAPITAL A--></mi>

An alternative, equivalent markup for this example is to use the common upper-case A, modified by using the mathvariant attribute:

<mi mathvariant="fraktur">A</mi>

A MathML processor must treat a mathematical alphanumeric character (when it appears) as identical to the corresponding combination of the unstyled character and mathvariant attribute value. It is important to note that the mathvariant attribute specifies a semantic class of characters, each of which has a specific appearance that should be protected from document-wide style changes, so the intended meaning of the character may be preserved. The use of a mathematical alphanumeric character is also intended to preserve this specific appearance, and so these characters are also not to be affected by surrounding style changes.

Not all combinations of character data and mathvariant values have assigned Unicode code points. For example, sans-serif Greek alphabets are omitted, while bold sans-serif Greek alphabets are included, and bold digits are included, while bold-italic digits are excluded. A renderer should visually distinguish those combinations of character data and mathvariant attribute values that it can, subject to the availability of font characters. It is intended that renderers distinguish at least those combinations that have equivalent Unicode code points, and renderers are free to ignore those combinations that have no assigned Unicode code point or for which adequate font support is unavailable.

The exact correspondence between a mathematical alphabetic character and an unstyled character is complicated by the fact that certain characters that were already present in Unicode in Plane 0 are not in the 'expected' sequence in Plane 1. The table below shows the Plane 0 mathematical alphanumeric symbols, listing for each character its Unicode code point, its Unicode character name, its corresponding unstyled alphabetic character, and the code point in Plane 1 where one might naturally have sought this character.

Unicode code point	Unicode name	BMP code	Plane-1 code
U+210E	PLANCK CONSTANT	U+0068	U+1D455
U+212C	SCRIPT CAPITAL B	U+0042	U+1D49D
U+2130	SCRIPT CAPITAL E	U+0045	U+1D4A0
U+2131	SCRIPT CAPITAL F	U+0046	U+1D4A1
U+210B	SCRIPT CAPITAL H	U+0048	U+1D4A3
U+2110	SCRIPT CAPITAL I	U+0049	U+1D4A4
U+2112	SCRIPT CAPITAL L	U+004C	U+1D4A7
U+2133	SCRIPT CAPITAL M	U+004D	U+1D4A8
U+211B	SCRIPT CAPITAL R	U+0052	U+1D4AD
U+212F	SCRIPT SMALL E	U+0065	U+1D4BA
U+210A	SCRIPT SMALL G	U+0067	U+1D4BC
U+2134	SCRIPT SMALL O	U+006F	U+1D4C4
U+212D	BLACK-LETTER CAPITAL C	U+0043	U+1D506
U+210C	BLACK-LETTER CAPITAL H	U+0048	U+1D50B
U+2111	BLACK-LETTER CAPITAL I	U+0049	U+1D50C
U+211C	BLACK-LETTER CAPITAL R	U+0052	U+1D515
U+2128	BLACK-LETTER CAPITAL Z	U+005A	U+1D51D
U+2102	DOUBLE-STRUCK CAPITAL C	U+0043	U+1D53A
U+210D	DOUBLE-STRUCK CAPITAL H	U+0048	U+1D53F
U+2115	DOUBLE-STRUCK CAPITAL N	U+004E	U+1D545
U+2119	DOUBLE-STRUCK CAPITAL P	U+0050	U+1D547
U+211A	DOUBLE-STRUCK CAPITAL Q	U+0051	U+1D548
U+211D	DOUBLE-STRUCK CAPITAL R	U+0052	U+1D549
U+2124	DOUBLE-STRUCK CAPITAL Z	U+005A	U+1D551
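The correspondence between an unstyled letter plus a mathvariant value and a styled code point can be sketched as a lookup with the table above as its list of exceptions. This is only an illustration for Latin letters and four mathvariant values; the names PLANE1_BASES, PLANE0_EXCEPTIONS, styled_letter and unstyled are hypothetical, and using Unicode NFKC normalization for the reverse direction is an assumption of this sketch, not something MathML prescribes.

```python
import unicodedata

# First code points of each Plane 1 alphabet: (capital A, small a).
PLANE1_BASES = {
    "italic": (0x1D434, 0x1D44E),
    "script": (0x1D49C, 0x1D4B6),
    "fraktur": (0x1D504, 0x1D51E),
    "double-struck": (0x1D538, 0x1D552),
}

# Plane 0 characters from the table above, keyed by (mathvariant, unstyled letter).
PLANE0_EXCEPTIONS = {
    ("italic", "h"): "\u210E",
    ("script", "B"): "\u212C", ("script", "E"): "\u2130",
    ("script", "F"): "\u2131", ("script", "H"): "\u210B",
    ("script", "I"): "\u2110", ("script", "L"): "\u2112",
    ("script", "M"): "\u2133", ("script", "R"): "\u211B",
    ("script", "e"): "\u212F", ("script", "g"): "\u210A",
    ("script", "o"): "\u2134",
    ("fraktur", "C"): "\u212D", ("fraktur", "H"): "\u210C",
    ("fraktur", "I"): "\u2111", ("fraktur", "R"): "\u211C",
    ("fraktur", "Z"): "\u2128",
    ("double-struck", "C"): "\u2102", ("double-struck", "H"): "\u210D",
    ("double-struck", "N"): "\u2115", ("double-struck", "P"): "\u2119",
    ("double-struck", "Q"): "\u211A", ("double-struck", "R"): "\u211D",
    ("double-struck", "Z"): "\u2124",
}


def styled_letter(letter: str, mathvariant: str) -> str:
    """Map an unstyled Latin letter and a mathvariant value to the styled character."""
    exception = PLANE0_EXCEPTIONS.get((mathvariant, letter))
    if exception is not None:
        return exception  # the character lives in Plane 0, not in the Plane 1 run
    upper_base, lower_base = PLANE1_BASES[mathvariant]
    if "A" <= letter <= "Z":
        return chr(upper_base + ord(letter) - ord("A"))
    if "a" <= letter <= "z":
        return chr(lower_base + ord(letter) - ord("a"))
    raise ValueError("only unaccented Latin letters are handled in this sketch")


def unstyled(character: str) -> str:
    """Recover the unstyled letter via compatibility normalization (an assumption)."""
    return unicodedata.normalize("NFKC", character)
```

With this sketch, styled_letter("A", "fraktur") gives U+1D504, while styled_letter("C", "fraktur") gives the Plane 0 character U+212D rather than the unassigned U+1D506.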

Mathematical Alphanumeric Symbol characters should not be used for styled prose. For example, Mathematical Fraktur A must not be used to just select a blackletter font for an uppercase A as it would create problems for searching, restyling (e.g. for accessibility), and many other kinds of processing.

7.6 Non-Marking Characters

Some characters, although important for the quality of print or alternative rendering, do not have glyph marks that correspond directly to them. They are called here non-marking characters. Their roles are discussed in Chapter 3 Presentation Markup and Chapter 4 Content Markup.

In MathML, control of page composition, such as line-breaking, is effected by the use of the proper attributes on the mo and mspace elements.

The characters below are not simple spacers. They are especially important new additions to the UCS because they provide textual clues which can increase the quality of print rendering, permit correct audio rendering, and allow the unique recovery of mathematical semantics from text which is visually ambiguous.

Unicode code point	Unicode name	Description
U+2061	FUNCTION APPLICATION	character showing function application in presentation tagging (Section 3.2.5 Operator, Fence, Separator or Accent `<mo>`)
U+2062	INVISIBLE TIMES	marks multiplication when it is understood without a mark (Section 3.2.5 Operator, Fence, Separator or Accent `<mo>`)
U+2063	INVISIBLE SEPARATOR	used as a separator, e.g., in indices (Section 3.2.5 Operator, Fence, Separator or Accent `<mo>`)
U+2064	INVISIBLE PLUS	marks addition, especially in constructs such as 1½ (Section 3.2.5 Operator, Fence, Separator or Accent `<mo>`)
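Because these characters have no visible glyph, it can help to inspect them programmatically. The sketch below uses Python's standard unicodedata module to report each character's code point, name and general category; the names INVISIBLE_OPERATORS and describe are hypothetical.

```python
import unicodedata

# The four non-marking characters from the table above.
INVISIBLE_OPERATORS = ["\u2061", "\u2062", "\u2063", "\u2064"]


def describe(character: str) -> tuple:
    """Return (code point label, Unicode name, general category) for a character."""
    return (
        "U+{:04X}".format(ord(character)),
        unicodedata.name(character),
        unicodedata.category(character),
    )


descriptions = [describe(character) for character in INVISIBLE_OPERATORS]
```

Each entry reports the general category Cf (format character), which matches the description of these characters as carrying meaning without leaving a mark.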

7.7 Anomalous Mathematical Characters

Some characters which occur fairly often in mathematical texts, and have special significance there, are frequently confused with other similar characters in the UCS. In some cases, common keyboard characters have become entrenched as alternatives to the more appropriate mathematical characters. In others, characters have legitimate uses in both formulas and text, but conflicting rendering and font conventions. All these characters are called here anomalous characters.

7.7.1 Keyboard Characters

Typical Latin-1-based keyboards contain several characters that are visually similar to important mathematical characters. Consequently, these characters are frequently substituted, intentionally or unintentionally, for their more correct mathematical counterparts.

7.7.1.1 Minus

The most common ordinary text character which enjoys a special mathematical use is U+002D [HYPHEN-MINUS]. As its Unicode name suggests, it is used as a hyphen in prose contexts, and as a minus or negative sign in formulas. For text use, there is a specific code point U+2010 [HYPHEN] which is intended for prose contexts, and which should render as a hyphen or short dash. For mathematical use, there is another code point U+2212 [MINUS SIGN] which is intended for mathematical formulas, and which should render as a longer minus or negative sign. MathML renderers should treat U+002D [HYPHEN-MINUS] as equivalent to U+2212 [MINUS SIGN] in formula contexts such as mo, and as equivalent to U+2010 [HYPHEN] in text contexts such as mtext.
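The context-dependent treatment of U+002D can be sketched as a small function. This is an illustration only: the function name resolve_hyphen_minus is hypothetical, and handling only mo and mtext (leaving every other token element unchanged) is a simplifying assumption based on the two examples named above.

```python
HYPHEN_MINUS = "\u002D"
HYPHEN = "\u2010"
MINUS_SIGN = "\u2212"


def resolve_hyphen_minus(element_name: str, content: str) -> str:
    """Replace U+002D according to the kind of token element it appears in."""
    if element_name == "mo":       # formula context: treat as MINUS SIGN
        return content.replace(HYPHEN_MINUS, MINUS_SIGN)
    if element_name == "mtext":    # text context: treat as HYPHEN
        return content.replace(HYPHEN_MINUS, HYPHEN)
    return content                 # other elements are left alone in this sketch
```

A renderer could apply such a mapping when choosing a glyph, so that the same keyboard character appears as a minus sign in an operator and as a hyphen in running text.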

7.7.1.2 Apostrophes, Quotes and Primes

On a typical European keyboard there is a key available which is viewed as an apostrophe or a single quotation mark (an upright or right quotation mark). Thus one key is doing double duty for prose input to enter U+0027 [APOSTROPHE] and U+2019 [RIGHT SINGLE QUOTATION MARK]. In mathematical contexts it is also commonly used for the prime, which should be U+2032 [PRIME]. Unicode recognizes the overloading of this symbol and remarks that it can also signify the units of minutes or feet. In the unstructured printed text of normal prose the characters are placed next to one another. The U+0027 [APOSTROPHE] and U+2019 [RIGHT SINGLE QUOTATION MARK] are marked with glyphs that are small and raised with respect to the center line of the text. The fonts used provide small raised glyphs in the appropriate places indexed by the Unicode codes. The U+2032 [PRIME] of mathematics is similarly treated in fuller Unicode fonts.

MathML renderers are encouraged to treat U+0027 [APOSTROPHE] as U+2032 [PRIME] when appropriate in formula contexts.

A final remark is that a ‘prime’ is often used in transliteration of the Cyrillic character U+044C [CYRILLIC SMALL LETTER SOFT SIGN]. This different use of primes is not part of considerations for mathematical formulas.

7.7.1.3 Other Keyboard Substitutions

While the minus and prime characters are the most common and important keyboard characters with more precise mathematical counterparts, there are a number of other keyboard character substitutions that are sometimes used. For example, some may expect

<mo>''</mo>

to be treated as U+2033 [DOUBLE PRIME], and analogous substitutions could perhaps be made for U+2034 [TRIPLE PRIME] and U+2057 [QUADRUPLE PRIME]. Similarly, sometimes U+007C [VERTICAL LINE] is used for U+2223 [DIVIDES]. MathML regards these as application-specific authoring conventions, and recommends that authoring tools generate markup using the more precise mathematical characters for better interoperability.
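An authoring tool that wants to emit the more precise characters could normalize runs of apostrophes before writing markup. The sketch below is one possible convention, not a MathML requirement; PRIME_RUNS and apostrophes_to_prime are hypothetical names.

```python
# Runs of U+0027 and the single prime character each run could stand for.
PRIME_RUNS = {
    1: "\u2032",  # PRIME
    2: "\u2033",  # DOUBLE PRIME
    3: "\u2034",  # TRIPLE PRIME
    4: "\u2057",  # QUADRUPLE PRIME
}


def apostrophes_to_prime(content: str) -> str:
    """Map operator content made only of 1 to 4 apostrophes to a prime character."""
    if content and set(content) == {"'"} and len(content) in PRIME_RUNS:
        return PRIME_RUNS[len(content)]
    return content  # anything else is passed through unchanged
```

Content that mixes apostrophes with other characters, or that has more than four apostrophes, is deliberately left untouched here, since the intended meaning is not clear in those cases.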

7.7.2 Pseudo-scripts

There are a number of characters in the UCS that traditionally have been taken to have a natural ‘script’ aspect. The visual presentation of these characters is similar to a script, that is, raised from the baseline, and smaller than the base font size. The degree symbol and prime characters are examples. For use in text, such characters occur in sequence with the identifier they follow, and are typically rendered using the same font. These characters are called pseudo-scripts here.

In almost all mathematical contexts, pseudo-script characters should be associated with a base expression using explicit script markup in MathML. For example, the preferred encoding of "x prime" is

<msup><mi>x</mi><mo>&#x2032;<!--PRIME--></mo></msup>

and not

<mi>x'</mi>

or any other variants not using an explicit script construct. Note, however, that within text contexts such as mtext, pseudo-scripts may be used in sequence with other character data.

There are two reasons why explicit markup is preferable in mathematical contexts. First, a problem arises with typesetting, when pseudo-scripts are used with subscripted identifiers. Traditionally, subscripting of x' would be rendered stacked under the prime. This is easily accomplished with script markup, for example:

<mrow><msubsup><mi>x</mi><mn>0</mn><mo>&#x2032;<!--PRIME--></mo></msubsup></mrow>

By contrast,

<mrow><msub><mi>x'</mi><mn>0</mn></msub></mrow>

will render with staggered scripts.

Note this means that a renderer of MathML will have to treat pseudo-scripts differently from most other character codes it finds in a superscript position; in most fonts, the glyphs for pseudo-scripts are already shrunk and raised from the baseline.

The second reason that explicit script markup is preferable to juxtaposition of characters is that it generally better reflects the intended mathematical structure. For example,

<msup>
  <mrow><mo>(</mo><mrow><mi>f</mi><mo>+</mo><mi>g</mi></mrow><mo>)</mo></mrow>
  <mo>&#x2032;<!--PRIME--></mo>
</msup>

accurately reflects that the prime here is operating on an entire expression, and does not suggest that the prime is acting on the final right parenthesis.

However, the data model for all MathML token elements is Unicode text, so one cannot rule out the possibility of valid MathML markup containing constructions such as

<mrow><mi>x'</mi></mrow>

and

<mrow><mi>x</mi><mo>'</mo></mrow>

While the first form may, in some rare situations, legitimately be used to distinguish a multi-character identifier named x' from the derivative of a function x, such forms should generally be avoided. Authoring and validation tools are encouraged to generate the recommended script markup:

<mrow><msup><mi>x</mi><mo>&#x2032;<!--PRIME--></mo></msup></mrow>

The U+2032 [PRIME] character is perhaps the most common pseudo-script, but there are many others, as listed below:

Pseudo-script Characters
U+0022	QUOTATION MARK
U+0027	APOSTROPHE
U+002A	ASTERISK
U+0060	GRAVE ACCENT
U+00AA	FEMININE ORDINAL INDICATOR
U+00B0	DEGREE SIGN
U+00B2	SUPERSCRIPT TWO
U+00B3	SUPERSCRIPT THREE
U+00B4	ACUTE ACCENT
U+00B9	SUPERSCRIPT ONE
U+00BA	MASCULINE ORDINAL INDICATOR
U+2018	LEFT SINGLE QUOTATION MARK
U+2019	RIGHT SINGLE QUOTATION MARK
U+201A	SINGLE LOW-9 QUOTATION MARK
U+201B	SINGLE HIGH-REVERSED-9 QUOTATION MARK
U+201C	LEFT DOUBLE QUOTATION MARK
U+201D	RIGHT DOUBLE QUOTATION MARK
U+201E	DOUBLE LOW-9 QUOTATION MARK
U+201F	DOUBLE HIGH-REVERSED-9 QUOTATION MARK
U+2032	PRIME
U+2033	DOUBLE PRIME
U+2034	TRIPLE PRIME
U+2035	REVERSED PRIME
U+2036	REVERSED DOUBLE PRIME
U+2037	REVERSED TRIPLE PRIME
U+2057	QUADRUPLE PRIME

In addition, the characters in the Unicode Superscript and Subscript block (beginning at U+2070) should be treated as pseudo-scripts when they appear in mathematical formulas.
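A validation tool could flag token content that mixes a base with a pseudo-script, such as x', using the list above. The following is a minimal sketch; the names PSEUDO_SCRIPTS, is_pseudo_script and has_inline_pseudo_script are hypothetical, and taking the Superscripts and Subscripts block to span U+2070 to U+209F is stated here as this sketch's assumption about the block boundaries.

```python
# Code points from the pseudo-script list above.
PSEUDO_SCRIPTS = (
    {0x0022, 0x0027, 0x002A, 0x0060, 0x00AA, 0x00B0,
     0x00B2, 0x00B3, 0x00B4, 0x00B9, 0x00BA, 0x2057}
    | set(range(0x2018, 0x2020))   # U+2018 .. U+201F quotation marks
    | set(range(0x2032, 0x2038))   # U+2032 .. U+2037 primes
)

# Superscripts and Subscripts block, assumed to span U+2070 .. U+209F.
SUPER_SUB_BLOCK = range(0x2070, 0x20A0)


def is_pseudo_script(character: str) -> bool:
    """Return True if the character should be treated as a pseudo-script."""
    code_point = ord(character)
    return code_point in PSEUDO_SCRIPTS or code_point in SUPER_SUB_BLOCK


def has_inline_pseudo_script(content: str) -> bool:
    """Flag content such as "x'" that mixes a base with a pseudo-script."""
    flags = [is_pseudo_script(character) for character in content]
    return any(flags) and not all(flags)
```

Content consisting only of a pseudo-script, as in the recommended superscript operator, is not flagged; only mixed content such as x' is.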

Note that several of these characters are common on keyboards, including U+002A [ASTERISK], U+00B0 [DEGREE SIGN], U+2033 [DOUBLE PRIME], and U+2035 [REVERSED PRIME] also known as a back prime.

7.7.3 Combining Characters

In the UCS there are many combining characters that are intended to be used for the many accents of numerous different natural languages. Some of them may seem to provide markup needed for mathematical accents. They should not be used in mathematical markup. Superscript, subscript, underscript, and overscript constructions as just discussed above should be used for this purpose. Of course, combining characters may be used in multi-character identifiers as they are needed, or in text contexts.

There is one more case where combining characters turn up naturally in mathematical markup. Some relations have associated negations, such as U+226F [NOT GREATER-THAN] for the negation of U+003E [GREATER-THAN SIGN]. The glyph for U+226F [NOT GREATER-THAN] is usually just that for U+003E [GREATER-THAN SIGN] with a slash through it. Thus it could also be expressed by U+003E-0338 making use of the combining slash U+0338 [COMBINING LONG SOLIDUS OVERLAY]. That is true of 25 other characters in common enough mathematical use to merit their own Unicode code points. In the other direction there are 31 character entity names listed in [Entities] which are to be expressed using U+0338 [COMBINING LONG SOLIDUS OVERLAY].

In a similar way there are mathematical characters which have negations given by a vertical bar overlay U+20D2 [COMBINING LONG VERTICAL LINE OVERLAY]. Some are available in pre-composed forms, and some named character entities are given explicitly as combinations. In addition there are examples using U+0333 [COMBINING DOUBLE LOW LINE] and U+20E5 [COMBINING REVERSE SOLIDUS OVERLAY], and variants specified by use of the U+FE00 [VARIATION SELECTOR-1]. For a fuller listing of these cases see the listings in [Entities].

The general rule is that a base character followed by a string of combining characters should be treated just as though it were the pre-composed character that results from the combination, if such a character exists.
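One way to approximate this rule in software is Unicode canonical composition. The sketch below uses NFC normalization from Python's standard unicodedata module as a stand-in for "treat as the pre-composed character"; equating the rule with NFC is an assumption of this sketch, and the function name precompose is hypothetical.

```python
import unicodedata


def precompose(text: str) -> str:
    """Compose base + combining sequences where a pre-composed character exists."""
    return unicodedata.normalize("NFC", text)


# GREATER-THAN SIGN followed by COMBINING LONG SOLIDUS OVERLAY.
negated_greater = precompose(">\u0338")

# A base with no pre-composed negated form keeps its combining character.
negated_x = precompose("x\u0338")
```

Here negated_greater is the single character U+226F [NOT GREATER-THAN], while negated_x remains the two-character sequence because no pre-composed form exists for it.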
