文字に対するXML実体の定義(XML Entity Definitions for Characters 日本語訳)

1 導入
Introduction

表記方法と記号は, 特に科学文書において, 人間のコミュニケーションに大変重要であることが明らかになっています. 数学が発展してきた理由の一つは, その表記方法が, 簡潔で示唆に富むものになるよう, 絶えず変化してきたことにあります. 数学の表記で利用するために多くの新しい記号が開発され, 数学者は, もともと他の分野で導入された多くの記号を利用することもためらいませんでした. 結果として, 科学一般, とりわけ数学は, 非常に大きな記号の集合を使っています. それらの文字を使うことができなければ, すらすらと科学を記述することは困難です. 対応する字形が, 特定の表示装置で表示できなければ, 科学的な内容を読むことは困難です. 大半の場合, 文字を直接, ユニコード文字のデータとして, または数値によるXML文字参照として, 格納することが望ましいです.

Notation and symbols have proved very important for human communication, especially in scientific documents. Mathematics has grown in part because its notation continually changes toward being succinct and suggestive. There have been many new signs developed for use in mathematical notation, and mathematicians have not held back from making use of many symbols originally introduced elsewhere. The result is that science in general, and particularly mathematics, makes use of a very large collection of symbols. It is difficult to write science fluently if these characters are not available for use. It is difficult to read science if corresponding glyphs are not available for presentation on specific display devices. In the majority of cases it is preferable to store characters directly as Unicode character data or as XML numeric character references.

しかしながら, 環境によっては, XML実体参照として用意されたASCII文字による入力方法を利用することの方が, より便利です. 多くの実体名は広く一般に使われており, この仕様書は, ユニコードとそれら各々の実体名とを結び付ける標準の方法を用意することを目指しています. この仕様書は, 以前の仕様書で既に使われてきたもの以外の実体名は, 一切導入しません. これらの名前は, XML実体参照のような入力方法のために設計された, 覚えやすい短い名前で, ユニコード標準の一部を形づくっている長めの正式な名前ではないことに注意して下さい.

However, in some environments it is more convenient to use the ASCII input mechanism provided by XML entity references. Many entity names are in common use, and this specification aims to provide standard mappings to Unicode for each of these names. It introduces no names that have not already been used in earlier specifications. Note that these names are short mnemonic names designed for input methods such as XML entity references, not the longer formal names that form part of the Unicode standard.
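As a small illustration of the two storage options and the ASCII input mechanism, the sketch below resolves one character written as a named entity reference, as a hexadecimal and a decimal numeric character reference, and as a literal. It uses Python's standard `html.unescape`, which resolves names against the HTML5 entity table rather than the DTD files of this specification, so it should be read as an approximation of what an XML processor with the entity declarations loaded would do.

```python
import html

# The same character written three ways in markup, plus the literal itself.
named = html.unescape("&alpha;")        # named entity reference
hexadecimal = html.unescape("&#x3B1;")  # hexadecimal numeric character reference
decimal = html.unescape("&#945;")       # decimal numeric character reference
literal = "\u03b1"                      # GREEK SMALL LETTER ALPHA stored directly

for label, value in [("named", named), ("hex", hexadecimal), ("decimal", decimal)]:
    print(f"{label:8} U+{ord(value):04X} matches literal: {value == literal}")
```

All three references resolve to the same code point; only the named form depends on an entity table being available.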

具体的には, "iso"という文字で始まる集合の実体名は, 最初にSGML ([SGML])の中で標準化され, [ISO9573-13-1991]の中で更新されています. W3C数学作業部会は, 大本の標準化委員会(ISO/IEC JTC1 SC34)から, それらの集合の維持と開発を引き継ぐように促されてきました. "mml"で始まる集合は, 最初にMathML [MathML2]で標準化され, "xhtml"で始まるものはHTML [HTML4]で標準化されました.

Specifically, the entity names in the sets starting with the letters "iso" were first standardized in SGML ([SGML]) and updated in [ISO9573-13-1991]. The W3C Math Working Group has been invited to take over the maintenance and development of these sets by the original standards committee (ISO/IEC JTC1 SC34). The sets with names starting "mml" were first standardized in MathML [MathML2] and those starting with "xhtml" were first standardized in HTML [HTML4].

この文書は, ウェブで実体名を用いてきた長年の集大成です. HTMLでは, 特別な文字に対して使われる名付けられた実体が少しだけありましたが, 数学記号とともに新しい名前の氾濫が起こりました. そして, この文書は, MathML 2.0 [MathML2]勧告の第6章の拡張や最終の改正と見なされることになります. 今, この文書は, XMLの世界やユニコードのすみずみに至る, 文字実体参照の既存の利用方法を調和させた完全な一覧を示します.

This document is the result of years of employing entity names on the Web. There were always a few named entities used for special characters in HTML, but a flood of new names came with the symbols of mathematics. This means that this document can be viewed as an extension and final revision of Chapter 6 of the MathML 2.0 [MathML2] recommendation. Now it presents a completed listing harmonizing the known uses of character entity names throughout the XML world and Unicode.

文字実体名は非常に多く, それらを定めたファイルは頻繁に参照される可能性のある資源なので, カタログファイルのひな形も提供されています. この仕様書とともに提供された一覧は, 当分の間, 変更する必要が生じないと見込まれるので, 利用者は, 関係のある実体名の表をローカルな環境にキャッシュするよう実装を設計することを強く推奨されています.

Since there are so many character entity names, and the files specifying them are resources that may be subject to frequent lookup, a template catalog file has also been provided. Users are strongly encouraged to design their implementations so that relevant entity name tables are cached locally, since it is not expected that the listings provided with this specification will need changing for some long time.

2 実体名の集合
Sets of names

この仕様書は, 以前の仕様書で定義された, たくさんの実体名の集合の, ユニコードへの割り当て方を定義します.

This specification defines mappings to Unicode of many sets of names that have been defined by earlier specifications.

まず始めに, 全ての集合を結合して一覧にした2つの表を示します. 最初がユニコード順で, 次がアルファベット順です.

We first present two tables listing all the sets combined, first in Unicode order and then in alphabetic order:

ユニコード順の全一覧表
All in Unicode order
アルファベット順の全一覧表
All in alphabetic order.

次に, それぞれの実体の集合を記した表を示します. それぞれの集合の表は, 対応する実体の集合のDTD実体定義とリンクしており, また, 実体名をユニコードのコードポイントから逆引きするために実装されたXSLT2スタイルシート(もちろん, このスタイルシートは, 1つの実体名に対して1つのユニコードのコードポイントを結び付けることのみ可能)ともリンクしています.

Then there come tables documenting each of the entity sets. Each set has a link to the DTD entity declaration for the corresponding entity set, and also a link to an XSLT2 stylesheet that will implement a reverse mapping from characters to entity names (this is, of course, only possible for entity names that map to a single Unicode code point).

isobox 罫線
Box and Line Drawing
isocyr1 ロシアのキリル文字
Russian Cyrillic
isocyr2 ロシア以外のキリル文字
Non-Russian Cyrillic
isodia ダイアクリティカルマーク(発音区別符号)
Diacritical Marks
isolat1 ラテン文字拡張1
Added Latin 1
isolat2 ラテン文字拡張2
Added Latin 2
isonum 数字と特殊記号
Numeric and Special Graphic
isopub 出版
Publishing
isoamsa 数学記号拡張:矢印関係記号
Added Math Symbols: Arrow Relations
isoamsb 数学記号拡張:二項演算子
Added Math Symbols: Binary Operators
isoamsc 数学記号拡張:区切り記号
Added Math Symbols: Delimiters
isoamsn 数学記号拡張:否定された関係記号
Added Math Symbols: Negated Relations
isoamso 数学記号拡張:通常の記号
Added Math Symbols: Ordinary
isoamsr 数学記号拡張:関係記号
Added Math Symbols: Relations
isogrk1 ギリシア文字
Greek Letters
isogrk2 モノトニコス(単一アクセント方式)のギリシア文字
Monotoniko Greek
isogrk3 ギリシア文字記号
Greek Symbols
isogrk4 代用のギリシア文字記号
Alternative Greek Symbols
isomfrk 数学用アルファベット:フラクトゥール
Math Alphabets: Fraktur
isomopf 数学用アルファベット:オープンフェイス
Math Alphabets: Open Face
isomscr 数学用アルファベット:スクリプト
Math Alphabets: Script
isotech 一般技術記号
General Technical
mmlextra 追加のMathML記号
Additional MathML Symbols
mmlalias MathML別名
MathML Aliases
xhtml1-lat1 HTMLラテン文字
Latin for HTML
xhtml1-special HTML特殊文字
Special for HTML
xhtml1-symbol HTML記号
Symbol for HTML
html5-uppercase HTML別名(大文字)
uppercase aliases for HTML
predefined XML定義済実体
Predefined XML
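The reverse mapping described above (from characters back to entity names) can be sketched as follows. This is an illustration in Python, not the XSLT2 stylesheets the specification links to, and it uses the HTML5 entity table from the standard library (`html.entities.html5`) as a stand-in for the entity sets listed above. The function name `build_reverse_map` is hypothetical. Entities whose replacement text is more than one code point are skipped, which mirrors the restriction noted in the text.

```python
from html.entities import html5


def build_reverse_map(table):
    """Map each single character to the sorted entity names that expand to it."""
    reverse = {}
    for name, text in table.items():
        if not name.endswith(";"):
            continue  # skip HTML's legacy forms without a trailing semicolon
        if len(text) != 1:
            continue  # expansions of two or more code points are not reversible this way
        reverse.setdefault(text, []).append(name[:-1])
    for names in reverse.values():
        names.sort()
    return reverse


reverse_map = build_reverse_map(html5)
print(reverse_map["\u03f5"])  # every name in the table that expands to U+03F5
```

Because several names can share one character, the reverse direction yields a list of candidates; a serializer has to pick one of them by some policy of its own.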

個々の実体の集合それぞれに対応するスタイルシートや実体定義ファイル(訳注:DTD形式で実体を定義したファイル, 一般に拡張子は"ent")に加えて, 結合されたスタイルシートが1つと, 結合された実体の集合が2つ, それぞれ2つの形式で提供されています. 1つ目の形式は, 上記の個々の実体の集合を参照している小さめのファイルで, 2つ目の形式は, 個々の実体の定義を直接含み, 定義が実体名順に並べられ, 重複が取り除かれた, 大きめのファイルです. 最初の集合w3centitiesは, この仕様書で定義されている全ての実体の集合を含んでいます. 2番目のhtmlmathmlは, HTMLやMathMLに含まれる実体の集合だけから構成される, 若干小さめの集合です. これらは, ウェブの環境の中で利用する場合に, (キャッシュされたDTDまたは組み込みの実体対応によって)動作する可能性が最も高い実体です.

In addition to the stylesheets and entity files corresponding to each individual entity set, a combined stylesheet is provided, as well as two combined entity sets, each in two formats: first as a small file that references the individual entity sets listed above, and then as a larger file that directly contains a definition of each entity, with the definitions sorted in order of entity names and duplicates removed. The first set, w3centities, includes all the entity sets defined in this specification. The second, htmlmathml, is a slightly smaller set comprising just the entity sets included in HTML and MathML. These are the entities most likely to work (due to a cached DTD or built-in entity support) if used in a web context.

w3centities W3C実体集合, 上記の全ての実体の集合を参照しています.
W3C entities collection; referencing all entity sets listed above
w3centities-f 同じ実体定義の集合を, 重複を取り除いて, 単一のファイルに展開したものです.
the same set of entity definitions, expanded into a single file, with duplicates removed
htmlmathml htmlmathml集合, HTMLまたはMathMLで使われる実体の集合を参照しています.
htmlmathml collection; referencing the entity sets used in HTML or MathML
htmlmathml-f HTMLとMathMLの実体定義をまとめた集合.
the expanded set of HTML and MathML entity definitions
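The expanded "-f" files are described as sorted by entity name with duplicates removed. A minimal sketch of that kind of merge is shown below, assuming each set is available as a name-to-code-point table. The function `combine_entity_sets`, the set name "demo-brackets", and the choice of sample entries are hypothetical; the code points themselves are the ones quoted elsewhere in this document. It is not the stylesheet actually used to build the files.

```python
def combine_entity_sets(entity_sets):
    """Merge several name -> code point tables into one sorted, duplicate-free list."""
    merged = {}
    for set_name, entities in entity_sets.items():
        for name, code_points in entities.items():
            previous = merged.get(name)
            if previous is not None and previous != code_points:
                raise ValueError(f"{name} is defined inconsistently in {set_name}")
            # Names are compared case-sensitively, so Lang and lang both survive.
            merged[name] = code_points
    return sorted(merged.items())


sample_sets = {
    "isogrk3": {"phi": "U+03C6", "phiv": "U+03D5"},
    "xhtml1-symbol": {"phi": "U+03C6"},  # same name and value as in isogrk3
    "demo-brackets": {"lang": "U+27E8", "Lang": "U+27EA"},  # hypothetical grouping
}

combined = combine_entity_sets(sample_sets)
for name, code_points in combined:
    print(name, code_points)
```

Keeping the comparison case-sensitive matters here, since entity names that differ only by case denote different characters.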

3 科学文書におけるユニコード文字の範囲
Unicode Character Ranges for Scientific Documents

ある種の文字は, 科学文書を作成することと特定の関連があります. 下記の表は, 数学で最もよく利用される文字を含むユニコードの範囲を示しています.

Certain characters are of particular relevance to scientific document production. The following tables display Unicode ranges containing the characters that are most used in mathematics.

この節でリンクしている表はそれぞれ, 256の画像を含んでおり, 画像がローカルの環境にキャッシュされていない場合, 読み込みに時間がかかるかも知れないことに注意して下さい.

Note that each of the tables linked from this section contains 256 images and may take a while to load if the images have not been cached locally.

000 C0制御文字と基本ラテン文字, C1制御文字と追加ラテン1
C0 Controls and Basic Latin, C1 Controls and Latin-1 Supplement
001 ラテン文字拡張A, ラテン文字拡張B
Latin Extended-A, Latin Extended-B
002 IPA拡張, 前進を伴う修飾文字
IPA Extensions, Spacing Modifier Letters
003 合成用発音記号, ギリシア文字とコプト語の文字
Combining Diacritical Marks, Greek and Coptic
004 キリル文字
Cyrillic
020 一般的な句読点, 上付き文字, 下付き文字, 通貨記号, 合成用記号用発音記号
General Punctuation, Superscripts and Subscripts, Currency Symbols, Combining Diacritical Marks for Symbols
021 文字の様な記号, 数字に準じるもの, 矢印
Letterlike Symbols, Number Forms, Arrows
022 数学演算子
Mathematical Operators
023 その他技術記号
Miscellaneous Technical
024 制御用文字記号, 光学文字認識, 囲み文字
Control Pictures, Optical Character Recognition, Enclosed Alphanumerics
025 罫線記号, ブロック要素, 幾何学的図形
Box Drawing, Block Elements, Geometric Shapes
026 その他記号
Miscellaneous Symbols
027 飾り文字, その他数学記号A, 追加矢印A
Dingbats, Miscellaneous Mathematical Symbols-A, Supplemental Arrows-A
029 追加矢印B, その他数学記号B
Supplemental Arrows-B, Miscellaneous Mathematical Symbols-B
02A 追加数学演算子
Supplemental Mathematical Operators
02B その他記号/矢印
Miscellaneous Symbols and Arrows
0FB アルファベット表示形, アラビア文字表示形A
Alphabetic Presentation Forms, Arabic Presentation Forms-A
0FE 異体字選択用文字, 縦表示形, 合成用半記号, CJK互換形, 小字形, アラビア文字表示形B
Variation Selectors, Vertical Forms, Combining Half Marks, CJK Compatibility Forms, Small Form Variants, Arabic Presentation Forms-B
1D4 数学用英数字記号
Mathematical Alphanumeric Symbols
1D5 数学用英数字記号(つづき)
Mathematical Alphanumeric Symbols (continued)
1D6 数学用英数字記号(つづき)
Mathematical Alphanumeric Symbols (continued)
1D7 数学用英数字記号(つづき)
Mathematical Alphanumeric Symbols (continued)

4 数学用英数字記号
Mathematical Alphanumeric Characters

この仕様書により定義されている実体の多くは, ユニコード第0面の文字様記号の区画, またはユニコード第1面の数学用英数字記号の区画に含まれる, 数学用英数字に関連しています. 下記の表は, これらの記号を全て一覧にしたもので, 第1面にないものを強調表示し, 該当する場合には実体名を示しています.

Many of the entities defined by this specification relate to the mathematical alphanumeric characters contained in the letter-like symbols block of Unicode Plane 0, or in the Mathematical Alphanumeric Symbols block in Unicode Plane 1. The following tables list all these symbols, highlighting those that are not in Plane 1, and giving entity names where appropriate.

太字(明朝体)
Bold (Serif)
斜体
Italic or Slanted
太字の斜体
Bold Italic or Slanted
二重線(オープンフェイス, 黒板における太字)
Double Struck (Open Face, Blackboard Bold)
スクリプト(または, 筆体)
Script (or Calligraphic)
太字のスクリプト
Bold Script
フラクトゥール
Fraktur
太字のフラクトゥール
Bold Fraktur
ゴシック体
Sans Serif
太字のゴシック体
Bold Sans Serif
斜体のゴシック体
Slanted Sans Serif
斜体で太字のゴシック体
Slanted Bold Sans Serif
等幅フォント
Monospace
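The split between Plane 0 and Plane 1 mentioned above can be seen by computing the script capitals programmatically. The sketch below assumes the script capital alphabet starts at U+1D49C and falls back to the Letterlike Symbols character whenever the Plane 1 position is unassigned. The helper name `script_capital` is hypothetical, and the result depends on the Unicode data shipped with the Python interpreter.

```python
import unicodedata

SCRIPT_CAPITAL_A = 0x1D49C  # MATHEMATICAL SCRIPT CAPITAL A in Plane 1


def script_capital(letter):
    """Return the script form of an uppercase ASCII letter."""
    candidate = chr(SCRIPT_CAPITAL_A + ord(letter) - ord("A"))
    if unicodedata.name(candidate, None) is not None:
        return candidate  # assigned in the Mathematical Alphanumeric Symbols block
    # The Plane 1 slot is a reserved hole; use the Letterlike Symbols character.
    return unicodedata.lookup(f"SCRIPT CAPITAL {letter}")


for letter in "ABC":
    ch = script_capital(letter)
    plane = ord(ch) >> 16
    print(f"{letter}: U+{ord(ch):04X} (plane {plane})")
```

Software that derives styled letters by adding an offset to the ASCII letter needs this kind of fallback, since some positions in the Plane 1 block are deliberately left empty.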

5 打ち消された文字や異体字に対する実体
Entities for Negated and Variant Characters

仕様書の大部分では, それぞれの実体の定義は, 単一のユニコード文字に展開されます. この節では, 2文字以上の文字列に展開される定義について述べます.

Each of the entity definitions in a majority of the specification expands to a single Unicode character. The definitions that expand to a sequence of two or more characters are outlined in this section.

5.1 打ち消された数学用文字
Negated Mathematical Characters

これまでに一覧にしたユニコード文字に加えて, 打ち消された形や抹消された形の文字を作るために, 合成用の文字U+0338 (/), U+20D2 (|), U+20E5 (\)を使うことができます. 合成用の文字は, アクセントを合成する場合と同様に, "基となる"文字のすぐ後に, 間にマークアップや空白を置かずに, 置かれるべきです.

In addition to the Unicode Characters so far listed, one may use the combining characters U+0338 (/), U+20D2 (|) and U+20E5 (\) to produce negated or canceled forms of characters. A combining character should be placed immediately after its "base" character, with no intervening markup or space, just as is the case for combining accents.

原則として, 打ち消しの文字は任意のユニコード文字に適用できますが, 数学用にデザインされたフォントは, 一般に, 合成済の打ち消された字形をいくつか持っています. そのような場合, MathML表示ソフトウェアは, これらの合成済の字形を利用できるべきです. 合成した文字の文字コードは, U+2260に相当するU+003D U+0338のように, 既に存在するUCS文字を表すこともあれば, U+2202 U+0338のように, そうでないこともあります. 後者の打ち消された文字のうち, 一般的なものとして特定されているものを, 表に一覧にしてあります.

In principle, the negation characters may be applied to any Unicode character, although fonts designed for mathematics typically have some negated glyphs ready composed. A MathML renderer should be able to use these pre-composed glyphs in these cases. A compound character code either represents a UCS character that is already available, as in the case of U+003D U+0338 which amounts to U+2260, or it does not, as is the case for U+2202 U+0338. The common cases of negations, of the latter type, that have been identified are listed in the tables.

文字に合成する長い斜線
combining long solidus overlay
文字に合成する長い縦線
combining long vertical line overlay
文字に合成する逆の斜線
combining reverse solidus overlay
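The two cases distinguished above (a negated form that already exists as a single character, and one that does not) can be checked with Unicode normalization. The sketch below appends U+0338 to a base character and applies Normalization Form C using Python's standard `unicodedata` module; the helper name `negate` is hypothetical.

```python
import unicodedata

LONG_SOLIDUS_OVERLAY = "\u0338"  # COMBINING LONG SOLIDUS OVERLAY


def negate(base):
    """Append the combining solidus directly after the base and normalize to NFC."""
    return unicodedata.normalize("NFC", base + LONG_SOLIDUS_OVERLAY)


not_equal = negate("=")         # U+003D U+0338 composes to the single character U+2260
not_partial = negate("\u2202")  # U+2202 U+0338 has no precomposed form

for text in (not_equal, not_partial):
    print(" ".join(f"U+{ord(ch):04X}" for ch in text))
```

The first result is one code point and the second remains a base character followed by the combining overlay, which is why entities of the second kind expand to two characters.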

単一の文字が, 文字を合成することによって表せるものについて既に定義されているなら, 単一の文字を, ばらばらの文字による表現の代わりに用いるべきというのが, W3Cとユニコードの方針であることに注意して下さい. また, 既に存在する合成文字として表されるものについて, 新しい単一の文字が取り入れられることはないということです. このことについて, さらに詳しい情報は, Unicode Standard Annex 15, Unicode Normalization Forms [Unicode15](訳注:"ユニコード標準付録15 ユニコード正規化形式"という意味), 特にNormalization Form C(訳注:"正規化形式C"という意味)の議論を見て下さい.

Note that it is the policy of the W3C and of Unicode that if a single character is already defined for what can be achieved with a combining character, that character must be used instead of the decomposed form. It is also intended that no new single characters representing what can be done with existing compositions will be introduced. For further information on these matters see Unicode Standard Annex 15, Unicode Normalization Forms [Unicode15], especially the discussion of Normalization Form C.

5.2 数学用異体字
Variant Mathematical Characters

ユニコードは, 単純なフォントの異体字に, 複数の文字コードを割り振ることを避けようとしています. コードポイントが割り当てられるには, 記録すべき字形の違いが, 微妙な差異を超えるものであるべきです. 注目に値する異体字を記録するため, ユニコード3.2には, 後置修飾子としてふるまう特別な文字U+FE00 (異体字選択用文字1)があります. しかしながら, この異体字選択用文字との組み合わせとして正式に認められているものは, ユニコードの一部として記録されている一覧のものに制限されています. 異体字選択用文字1は, そこに一覧にされている文字にのみ適用できます. 出来上がった組み合わせは, ユニコードでは, 別個の文字とは見なされず, 基となる文字の異体字と見なされます. ユニコードに対応したシステムは, 利用可能なフォントが異体字の字形を提供していない場合, 組み合わせを基となる文字として描くかも知れません.

Unicode attempts to avoid having several character codes for simple font variants. For a code point to be assigned there should be more than a nuance in glyphs to be recorded. To record variants worth noting there is a special character in Unicode 3.2, U+FE00 (VARIATION SELECTOR-1), which acts as a postfix modifier. However the legally allowed combinations with this variation selector are restricted to a list recorded as part of Unicode. The VARIATION SELECTOR-1 character may only be applied to the characters listed here. The resulting combination is not regarded by Unicode as a separate character, but a variation on the base character. Unicode aware systems may render the combination as the base if the available fonts do not support the variant glyph shape.

異体字選択用文字1
variation selector-1
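A short sketch of how a variation sequence behaves as data: the selector is a separate code point that follows its base, normalization leaves the pair untouched, and removing the selector recovers the base character, which is the fallback rendering mentioned above. The choice of U+2229 as the base is an assumption made for illustration; whether a given base is permitted must be checked against the list published with Unicode. The helper `strip_variation_selector` is hypothetical.

```python
import unicodedata

VS1 = "\ufe00"     # VARIATION SELECTOR-1
base = "\u2229"    # INTERSECTION, assumed here to be an allowed base
variant = base + VS1  # the selector follows the base as a postfix modifier


def strip_variation_selector(text):
    """Drop VS1 so that only base characters remain (fallback rendering)."""
    return text.replace(VS1, "")


print(unicodedata.name(VS1))
print(len(variant), "code points")
print(unicodedata.normalize("NFC", variant) == variant)
print(strip_variation_selector(variant) == base)
```

Since the sequence is two code points, any entity defined as a variant form expands to more than one character, just like the negated forms.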

A 特別に考慮すべき点
Special Considerations

A.1 イプシロン
Epsilon

歴史上, 小文字のイプシロンの異体字の形には, たくさんの混乱や同意不足がありました.

Historically there has been much confusion and lack of agreement over variant forms for lower case epsilon.

この仕様書は, 下記の定義を用いています. 実体名epsilonは, ギリシア語の文章で用いる文字(U+03B5)に使われ, varepsilonは, 数学でより一般的に用いるイプシロン記号の文字(U+03F5)に使われることに注意して下さい. また, この使用方法は, 似た文字の組(例えば, thetaとvartheta)の名前のつけ方とは互換性がありますが, TeXやMathML2, およびISO実体の集合のユニコードへの以前の割り当てのいくつかで使われている名付け方の慣習とは互換性がないことに注意して下さい.

This specification uses the definitions below. Note that the name epsilon is used for the character used in textual Greek (U+03B5) and varepsilon used for the epsilon symbol character more commonly used in mathematics (U+03F5). Note that this usage is compatible with the naming of similar pairs of characters (for example theta, vartheta) but incompatible with the naming convention used in TeX, MathML2 and some earlier mappings of the ISO entity sets to Unicode.

実体名 Entity	集合名 Set	説明 Description	ユニコード文字 Unicode Character
eacgr	isogrk2	=小文字イプシロン, アクセント, ギリシア語 =small epsilon, accent, Greek	U+03AD	ギリシア文字のアクセント記号付き小文字イプシロン GREEK SMALL LETTER EPSILON WITH TONOS
egr	isogrk1	=小文字イプシロン, ギリシア語 =small epsilon, Greek	U+03B5	ギリシア文字の小文字イプシロン GREEK SMALL LETTER EPSILON
epsi	isogrk3	/epsilon /epsilon	U+03B5	ギリシア文字の小文字イプシロン GREEK SMALL LETTER EPSILON
epsilon	xhtml1-symbol		U+03B5	ギリシア文字の小文字イプシロン GREEK SMALL LETTER EPSILON
epsiv	isogrk3	/straightepsilon, 小文字イプシロン, ギリシア語 /straightepsilon, small epsilon, Greek	U+03F5	ギリシア文字の三日月状のイプシロン記号 GREEK LUNATE EPSILON SYMBOL
straightepsilon	mmlalias	ISOGRK3 epsivの別名 alias ISOGRK3 epsiv	U+03F5	ギリシア文字の三日月状のイプシロン記号 GREEK LUNATE EPSILON SYMBOL
varepsilon	mmlalias	ISOGRK3 epsivの別名 alias ISOGRK3 epsiv	U+03F5	ギリシア文字の三日月状のイプシロン記号 GREEK LUNATE EPSILON SYMBOL
bepsi	isoamsr	/backepsilon R: ~であるような /backepsilon R: such that	U+03F6	ギリシア文字の反転した三日月状のイプシロン記号 GREEK REVERSED LUNATE EPSILON SYMBOL
backepsilon	mmlalias	ISOAMSR bepsiの別名 alias ISOAMSR bepsi	U+03F6	ギリシア文字の反転した三日月状のイプシロン記号 GREEK REVERSED LUNATE EPSILON SYMBOL
b.epsi	isogrk4	小文字イプシロン, ギリシア語 small epsilon, Greek	U+1D6C6	数学用太字の小文字イプシロン MATHEMATICAL BOLD SMALL EPSILON
b.epsiv	isogrk4	イプシロンの異体字 variant epsilon	U+1D6DC	数学用太字のイプシロン記号 MATHEMATICAL BOLD EPSILON SYMBOL

A.2 ファイ
Phi

ファイの状況は, イプシロンととても似ていますが, ユニコードの初期のバージョンでは, U+03C6とU+03D5に対する字形の例が, 現在の用法とは入れ替わっていたという, より複雑な事情があります. また, 現在も使われている古いフォントの中には, その古い慣習に従っているものがあります. この仕様書で使われている定義は, 下記の一覧の通りです.

The situation for phi is very similar to that of epsilon, although with the further complication that early versions of Unicode had the sample glyphs for U+03C6 and U+03D5 swapped from the current usage, and some older fonts still in use follow that older convention. The definitions used in this specification are as listed below.

実体名 Entity	集合名 Set	説明 Description	ユニコード文字 Unicode Character
phi	isogrk3	/phi - 小文字ファイ, ギリシア語 /phi - small phi, Greek	U+03C6	ギリシア文字の小文字ファイ GREEK SMALL LETTER PHI
phi	xhtml1-symbol	ギリシア文字の小文字ファイ greek small letter phi	U+03C6	ギリシア文字の小文字ファイ GREEK SMALL LETTER PHI
phgr	isogrk1	=小文字ファイ, ギリシア語 =small phi, Greek	U+03C6	ギリシア文字の小文字ファイ GREEK SMALL LETTER PHI
straightphi	mmlalias	ISOGRK3 phivの別名 alias ISOGRK3 phiv	U+03D5	ギリシア文字のファイ記号 GREEK PHI SYMBOL
phiv	isogrk3	/varphi - 直立のファイ /varphi - straight phi	U+03D5	ギリシア文字のファイ記号 GREEK PHI SYMBOL
varphi	mmlalias	ISOGRK3 phivの別名 alias ISOGRK3 phiv	U+03D5	ギリシア文字のファイ記号 GREEK PHI SYMBOL
b.phi	isogrk4	小文字ファイ, ギリシア語 small phi, Greek	U+1D6D7	数学用太字の小文字ファイ MATHEMATICAL BOLD SMALL PHI
b.phiv	isogrk4	ファイの異体字 variant phi	U+1D6DF	数学用太字のファイ記号 MATHEMATICAL BOLD PHI SYMBOL
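The epsilon and phi assignments in the two tables above can be spot-checked against a table derived from this specification. The sketch below uses Python's `html.unescape`, which resolves names from the HTML5 entity table, so only the names that HTML carries are covered. Names such as egr, phgr, b.epsi, and b.phi are not part of that table and are left out. The helper `code_points` is hypothetical.

```python
import html


def code_points(entity_name):
    """Resolve a named reference and format the result as U+XXXX."""
    text = html.unescape(f"&{entity_name};")
    return " ".join(f"U+{ord(ch):04X}" for ch in text)


names = [
    "epsi", "epsilon",                          # textual Greek epsilon
    "epsiv", "straightepsilon", "varepsilon",   # lunate epsilon symbol
    "bepsi", "backepsilon",                     # reversed lunate epsilon
    "phi",                                      # textual Greek phi
    "phiv", "straightphi", "varphi",            # phi symbol
]

resolved = {name: code_points(name) for name in names}
for name, value in resolved.items():
    print(f"{name:16} {value}")
```

Grouping the names by resolved value shows the pattern described in the text: the plain name goes to the letter used in running Greek, and the "var" and "v" forms go to the symbol character.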

A.3 複数文字の実体
Multiple Character Entities

前節の一覧で示した合成や, 異体字の組み合わせに加えて, 下表は, 1文字より長い文字列を置きかえる残りの実体を一覧にしたものです.

In addition to the combining and variant character combinations listed in the previous sections, the following table lists the remaining entity replacement texts that consist of more than one character.

実体名 Entity	集合名 Set	説明 Description	ユニコード文字 Unicode Character
fjlig	isopub	小文字fj連字 small fj ligature	U+0066 U+006A	(fj連字) fj ligature
ThickSpace	mmlextra	5/18em幅の空白 space of width 5/18 em	U+205F U+200A	(5/18em幅の空白) space of width 5/18 em
race	isoamsb	反転した相似, 下線 reverse most positive, line below	U+223D U+0331	(下線付きの)反転したチルダ REVERSED TILDE with underline
acE	isoamsb	相似, 二重下線 most positive, two lines below	U+223E U+0333	(二重下線付きの)ひっくり返ったゆったりしたS INVERTED LAZY S with double underline
DownBreve	mmlextra	(空白を伴わない)ひっくり返った短音記号 breve, inverted (non-spacing)	U+0020 U+0311	合成用のひっくり返った短音記号 COMBINING INVERTED BREVE
tdot	isotech	文字の上の3つの点 three dots above	U+0020 U+20DB	文字の上に合成する3つの点 COMBINING THREE DOTS ABOVE
TripleDot	mmlalias	ISOTECH tdotの別名 alias ISOTECH tdot	U+0020 U+20DB	文字の上に合成する3つの点 COMBINING THREE DOTS ABOVE
DotDot	isotech	文字の上の4つの点 four dots above	U+0020 U+20DC	文字の上に合成する4つの点 COMBINING FOUR DOTS ABOVE

ユニコードには, fi(U+FB01)のような他の一般的なfの連字がアルファベット表示形の区画に含まれているにもかかわらず, fjという文字はありません. fjlig実体は, "fj"という文字の組に割り当てられています. 現代の組版エンジンは, フォントがfjの連字を提供しているなら, この組み合わせに対し, 自動的にその連字を用いるはずです.

Unicode does not have an fj character, although the other common f ligatures such as fi (U+FB01) are contained in the Alphabetic Presentation Forms block. The fjlig entity is mapped to the pair of characters "fj"; modern typesetting engines should automatically use the fj ligature for this combination if the font supplies such a ligature.

ユニコードは, (5/18emを除く, 6/18em以下の全ての1/18emの倍数の幅を含む)一連の空白文字を持っており, そのため, ThickSpace実体は, 空白文字の組に割り当てられています. U+2005(1/4em)を使うという選択肢もありましたが, 1/4emは5/18emと等しくないため, 組版される大半のフォントの大きさでは, その差が目に見えて分かる可能性は低いとはいえ, 上記の定義が選ばれました.

Unicode has a range of space characters (including all multiples of 1/18 em up to 6/18, except for 5/18 em) thus the ThickSpace entity is mapped to a pair of space characters. An alternative would have been to use U+2005 (1/4 em), but 1/4 em is not equal to 5/18 em, so the above definition was chosen, despite the fact that the difference is unlikely to be visibly noticeable at most typeset font sizes.
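The arithmetic behind this choice is small enough to check exactly. The sketch below assumes the nominal widths given in this document's change list, 4/18 em for U+205F and 1/18 em for U+200A, and compares their sum with the 1/4 em of U+2005. The 10 pt font size is an arbitrary example value.

```python
from fractions import Fraction

MEDIUM_MATHEMATICAL_SPACE = Fraction(4, 18)  # U+205F, nominal width in em
HAIR_SPACE = Fraction(1, 18)                 # U+200A, nominal width in em
FOUR_PER_EM_SPACE = Fraction(1, 4)           # U+2005, the rejected alternative

thick_space = MEDIUM_MATHEMATICAL_SPACE + HAIR_SPACE
difference = thick_space - FOUR_PER_EM_SPACE

font_size_pt = 10  # example size only
print(f"ThickSpace = {thick_space} em")
print(f"difference from 1/4 em = {difference} em")
print(f"at {font_size_pt} pt that is {float(difference * font_size_pt):.3f} pt")
```

The pair sums to exactly 5/18 em, while 1/4 em falls short by 1/36 em, a gap that is real but well below what is usually visible at text sizes.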

実体raceとacEは, ユニコードがコードポイントを持っていない下線付きの文字を表しており, そのため, 打ち消された演算子に合成用の線を使うのと類似した方法で, 合成用の下線の文字が使われています.

The entities race and acE denote underlined characters for which Unicode does not have codepoints, thus combining underline characters have been used, in a way analogous to the use of combining strokes for negated operators.

[Charmod-norm]でより詳しく説明されている理由により, 実体の置換テキストを合成用の文字で始めることは勧められません. 実体の展開とユニコードの正規化が行われる順番によって, 異なった結果が生じる可能性があるためです. この仕様書は, 可能な限り合成用でない文字を使うようにしていますが, tdotやTripleDotやDotDotの場合, ユニコードには, 合成用の形のアクセントしかなく, そのため, 展開した実体が, その前の文字列と合成されるのを避けるため, 実体の置換テキストは空白で始まります.

For reasons explained further in [Charmod-norm], it is not advisable to start the replacement text of an entity with a combining character, as potentially different results may then be produced depending on the order in which entity expansion and Unicode normalisation are performed. As far as possible this specification uses non-combining characters; however, in the cases of tdot, TripleDot and DotDot, Unicode only has combining forms of the accents, and so the entity replacement text starts with a space, to avoid the possibility that the expansion of the entity combines with preceding text.
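The order dependence described above can be reproduced with a toy expander. In the sketch below the entity name `strike` is hypothetical and is not defined by this specification; it is given U+0338 as replacement text because that character composes with "=" under NFC, which makes the effect visible. The `expand` helper is a plain string substitution, not an XML parser.

```python
import unicodedata


def expand(text, entities):
    """Replace each &name; reference with its replacement text."""
    for name, replacement in entities.items():
        text = text.replace(f"&{name};", replacement)
    return text


def nfc(text):
    return unicodedata.normalize("NFC", text)


source = "=&strike;"            # hypothetical entity, used for illustration only
bare = {"strike": "\u0338"}     # replacement text starts with a combining character
spaced = {"strike": " \u0338"}  # replacement text starts with a space

expand_first = nfc(expand(source, bare))     # "=" and U+0338 meet, then compose
normalize_first = expand(nfc(source), bare)  # nothing to compose before expansion

print(len(expand_first), len(normalize_first))
print(nfc(expand(source, spaced)) == expand(nfc(source), spaced))
```

With the bare combining character the two processing orders give strings of different length. With the leading space the combining character attaches to the space and both orders agree, which is the motivation for the space in tdot, TripleDot and DotDot.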

B 変更点
Changes

B.1 2010年2月11日以降の変更点
Changes since 2010-02-11

例の画像がいくつか改良され, ユニコードが参照している画像に, より一致したものが用いられるようになりました.

Several example images improved, bringing them more in line with the Unicode reference images.

B.2 2009年11月17日から2010年2月11日の間の変更点
Changes between 2010-02-11 and 2009-11-17

ユニコードを内部IDを用いたU01234という形で表示する代わりに, 一貫してU+1234という記述法を使ったこと等, 様々な編集の改良が行われました.

Various editorial improvements, including using Unicode U+1234 notation more consistently rather than displaying the internal IDs of the form U01234.

2009年11月17日のドラフト版で配布された結合された実体ファイルには, 2つの実体名が大文字と小文字の違いのみで異なっているときに, 片方しか含まれないという間違いがありました. これは修正されました.

The combined entities file distributed with the 2009-11-17 draft introduced an error that if two entity names differed only by case, only one was included. This has been corrected.

HTMLやMathMLで利用可能な実体に対応する, 結合された実体の集合htmlmathmlが, きちんと提供されるようになりました. XMLで定義済の実体に対応する(以前内部的に使われていた)定義済の集合が, 文書化されました.

The combined entity set htmlmathml corresponding to the entities usable in HTML and MathML is now explicitly provided. The predefined set, corresponding to the entities predefined in XML is now documented (it was previously used internally).

実体xveeとxwedgeは, 正しいユニコード(U+22C1とU+22C0)に割り当てられていましたが, 実体の説明が入れ替わっていました. xveeは論理和で, xwedgeは論理積です. [ISO9573-13-1991]のこの間違いは, 1999年に報告されましたが(技術的な正誤表の提案), これまで修正されていませんでした. 実体ファイルは, この変更の影響を受けません.

The entities xvee and xwedge had the correct Unicode assignments (U+22C1 and U+22C0), but the entity descriptions had been swapped: xvee is logical or and xwedge is logical and. This error in [ISO9573-13-1991] was reported in 1999 (Proposed Technical Corrigendum) but not previously fixed. The entity files are unaffected by this change.

実体NotGreaterFullEqualが, 間違って打ち消された小なり記号(U+2266 U+0338)に割り当てられており, うち消された大なり記号(U+2267 U+0338)に訂正されました.

The entity NotGreaterFullEqual which had been erroneously assigned to a negated less than operator (U+2266 U+0338) has been corrected to be the negated greater than operator (U+2267 U+0338).

カタログファイルの例catalogが, 実体ファイルへの参照を, W3Cのサーバーではなく, ローカルな環境のコピーへ切り替えるために, 提供されるようになりました.

A sample catalog is now provided to redirect references to the entity files to copies on the local machine rather than the W3C server.

B.3 2008年7月21日から2009年11月17日の間の変更点
Changes between 2009-11-17 and 2008-07-21

html5-uppercase集合が文書化されました

The html5-uppercase set is now documented.

正規化形式Cに合わせるため, 実体ohmとangstをU+03A9とU+00C5に変更しました. W3Cバグジラの記録を見て下さい.

The entities ohm and angst have changed to U+03A9 and U+00C5 to match NFC. See the W3C Bugzilla entry.

誤ってU+29DAが割り当てられていた実体raceに, U+223D U+0331の組が割り当てられるようになりました. (U+223Dは, 大本のISO文書で示されている形, つまり回転させたチルダではなく回転させたSの形とは, 完全には一致しませんが, ユニコード5.2の中では最も近い文字と思われます.)

The entity race, which had been erroneously assigned U+29DA, is now assigned the combination U+223D U+0331. (U+223D isn't quite the shape shown in the original ISO document which is a rotated S rather than a rotated tilde, but this appears to be the closest character in Unicode 5.2.)

実体bsolhsubとsuphsolは, 以前は2つの文字の合成U+005C U+2282とU+2283 U+002Fに割り当てられていましたが, これらの実体に対応するため, 明確に文字コードが加えられたユニコード5の, U+27C8とU+27C9に割り当てられるようになりました.

The entities bsolhsub and suphsol which were previously mapped to two-character combinations U+005C U+2282 and U+2283 U+002F are now mapped to the Unicode 5 characters that were added specifically to support these entities, U+27C8 and U+27C9.

ユニコード5.2に対応するようにソースファイルを全て更新しました.

The source files have all been updated to match Unicode 5.2.

実体ThickSpaceが, 3文字の組み合わせU+2009 U+200A U+200Aでなく, 2文字の組み合わせU+205F U+200Aに割り当てられるようになりました. つまり, (3/18 + 1/18 + 1/18)emではなく, (4/18 + 1/18)emに割り当てられるようになりました.

The entity ThickSpace now maps to the pair U+205F U+200A rather than the triple U+2009 U+200A U+200A, that is, (4/18 + 1/18)em rather than (3/18 + 1/18 + 1/18)em.

実体UnderBarが, 合成用の文字U+0332でなく, 間隔を取る文字_に割り当てられました.

The entity UnderBar maps to the spacing character _ rather than the combining character U+0332.

実体OverBarが, 長音記号U+00AFでなく, (XHTMLの実体olineに似た)間隔を取るための文字U+203Eに割り当てられました.

The entity OverBar maps to the spacing character U+203E (like the XHTML entity oline) rather than the macron character U+00AF.

実体epsivとvarepsilonが, 実体epsilon(U+03B5)の別名ではなく, イプシロン記号U+03F5に割り当てられるようになりました.

The entities epsiv and varepsilon are now mapped to the epsilon symbol U+03F5 rather than being aliases for the entity epsilon, U+03B5.

実体phivとvarphiが, 実体phi(U+03C6)の別名ではなく, ファイ記号U+03D5に割り当てられるようになりました.

The entities phiv and varphi are now mapped to the phi symbol U+03D5 rather than being aliases for the entity phi, U+03C6.
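The ohm and angst change listed in this section follows directly from Normalization Form C. The sketch below assumes that the earlier targets were the letterlike characters OHM SIGN (U+2126) and ANGSTROM SIGN (U+212B), which the text above does not state explicitly, and shows that NFC replaces both with the code points now used.

```python
import unicodedata

OHM_SIGN = "\u2126"       # assumed earlier target of the ohm entity
ANGSTROM_SIGN = "\u212b"  # assumed earlier target of the angst entity

ohm_nfc = unicodedata.normalize("NFC", OHM_SIGN)
angst_nfc = unicodedata.normalize("NFC", ANGSTROM_SIGN)

for label, before, after in [("ohm", OHM_SIGN, ohm_nfc), ("angst", ANGSTROM_SIGN, angst_nfc)]:
    print(f"{label}: U+{ord(before):04X} -> U+{ord(after):04X}")
```

Mapping the entities to the normalized characters means that expanding them never produces text that a later NFC pass would alter.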

B.4 2007年12月14日から2008年7月21日の間の変更点
Changes between 2008-07-21 and 2007-12-14

このドラフト版で, 次の実体の定義が変更されました.

The following entity definitions have changed at this draft:

phi, lang, rang, OverParenthesis, UnderParenthesis, OverBrace, UnderBrace, lbbrk, rbbrk.

C この仕様書の実体と以前のW3CのDTDとの間の違い
Differences between these entities and earlier W3C DTDs

C.1 XHTML1.0との違い
Differences from XHTML 1.0

ここで述べているXHTMLの実体の定義と, XHTML 1.0 DTDで述べられている実体の集合との違いは, 次のとおりです.

The differences between the XHTML entity definitions described here and the entity set described in the XHTML 1.0 DTD are listed below.

langとrang
lang and rang: U+27E8とU+27E9. XHTML1.0では, (U+3008とU+3009への正当な分解のできる)U+2329とU+232Aが使われていました.
U+27E8 and U+27E9; XHTML 1.0 used U+2329 and U+232A (which have canonical decomposition to U+3008 and U+3009).

注意:
Note:

[HTML5]の現在のドラフト版は, この仕様書に由来する実体の定義を用いています.

The current drafts of [HTML5] use entity definitions derived from this specification.

C.2 MathML2.0(第2版)との違い
Differences from MathML 2.0 (second edition)

MathML2との違いと現在の実体の定義は次のとおりです.

The differences between MathML 2 and the current entity definitions are listed below.

fjlig
fjlig: ISOPUB(とMathML1)は, fjの連字と定義していました. ユニコードは, 特定の文字を割り当てておらず, MathML2からこの実体は抜け落ちていました. [SGML]との最大限の互換性を確保するために再び定められました.
ISOPUB (and MathML 1) defined an fj ligature; Unicode does not have a specific character and the entity was dropped from MathML2. It is re-instated here for maximum compatibility with [SGML].
phi
phi: U+03C6 ギリシア語の小文字のファイ(HTML4で使われていた定義). MathML2は, U+03D5 ギリシア文字のファイ記号.
U+03C6 GREEK SMALL LETTER PHI (the definition used in HTML4); MathML2 used U+03D5 GREEK PHI SYMBOL.
epsiv, varepsilon, phiv, varphi
epsiv, varepsilon, phiv, varphi: (varthetaのようなvarを頭に付ける他の利用方法と合わせるため) 記号文字に割り当てられるよう変更されました.
these have been changed to map to the symbol character (to match other uses of the var prefix such as vartheta).
jmath
jmath: U+0237. MathML2では, ユニコード4.1以前は点のないjがなかったため, U+006A(j)を使っていました.
U+0237; MathML 2 used U+006A (j) as there was no dotless j before Unicode 4.1.
trpezium, elinters
trpezium, elinters: U+23E2とU+23E7. これらの文字は, これらの実体に対応するためにユニコード5.0で特別に加えられたものであるため, MathML2では, U+FFFD(置換用の文字)が使われていました.
U+23E2 and U+23E7; MathML 2 used U+FFFD (REPLACEMENT CHARACTER) as these characters were added at Unicode 5.0 specifically to support these entities.
ohm, angst
ohm, angst: 前に述べたとおり, これらの実体の定義は変更されたので, 正規化形式Cの正規形にある文字を定義として用いるようにしています.
As noted above, the definitions of these entities have been changed so that the definitions use characters that are in NFC normal form.
bsolhsubとsuphsol
bsolhsub and suphsol: U+27C8とU+27C9. MathML2では, U+005C U+2282とU+2283 U+002Fが使われていました.
U+27C8 and U+27C9; MathML2 used U+005C U+2282 and U+2283 U+002F.
NotGreaterFullEqual
NotGreaterFullEqual: U+2267 U+0338. MathML2では, 間違った定義U+2266 U+0338が使われていました.
U+2267 U+0338 ; MathML2 used the erroneous definition U+2266 U+0338.

次のかっこ記号は, ユニコードのバージョン3.1と5.1の間で, 数学記号の集まりに加えられました. MathML2では, CJK句読点を書くことを意図する似た文字が使われていました.

The following bracket symbols have been added to the Mathematical symbols block in Unicode versions between 3.1 and 5.1. MathML2 used similar characters intended for CJK punctuation.

lang, langle, LeftAngleBracketとrang, rangle, RightAngleBracket
lang, langle, LeftAngleBracket and rang, rangle, RightAngleBracket: U+27E8とU+27E9. MathML2では, (U+3008とU+3009への正当な分解のできる)U+2329とU+232Aが使われていました.
U+27E8 and U+27E9; MathML2 used U+2329 and U+232A (which have canonical decomposition to U+3008 and U+3009).
LangとRang
Lang and Rang: U+27EAとU+27EB. MathML2では, U+300AとU+300Bが使われていました.
U+27EA and U+27EB; MathML2 used U+300A and U+300B.
lbbrkとrbbrk
lbbrk and rbbrk: U+2772とU+2773. MathML2では, U+3014とU+3015が使われていました.
U+2772 and U+2773; MathML2 used U+3014 and U+3015.
loangとroang
loang and roang: U+27ECとU+27ED. MathML2では, U+3018とU+3019が使われていました.
U+27EC and U+27ED; MathML2 used U+3018 and U+3019.
lobrkとrobrk
lobrk and robrk: U+27E6とU+27E7. MathML2では, U+301AとU+301Bが使われていました.
U+27E6 and U+27E7; MathML2 used U+301A and U+301B.
OverBraceとUnderBrace
OverBrace and UnderBrace: U+23DEとU+23DF. MathML2では, U+FE37とU+FE38が使われていました.
U+23DE and U+23DF; MathML2 used U+FE37 and U+FE38.
OverParenthesisとUnderParenthesis
OverParenthesis and UnderParenthesis: U+23DCとU+23DD. MathML2では, U+FE35とU+FE36が使われていました.
U+23DC and U+23DD; MathML2 used U+FE35 and U+FE36.
LeftDoubleBracketとRightDoubleBracket
LeftDoubleBracket and RightDoubleBracket: U+27E6とU+27E7. MathML2では, U+301AとU+301Bが使われていました.
U+27E6 and U+27E7; MathML2 used U+301A and U+301B.
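The remark about canonical decomposition for lang and rang can be verified with a normalizer. The sketch below applies NFC to the code points MathML2 and XHTML 1.0 used and to the ones this specification uses, and also resolves `&lang;` through Python's HTML5 entity table as a cross-check; that table is a stand-in for the DTD files here.

```python
import html
import unicodedata

old_lang, old_rang = "\u2329", "\u232a"  # used by XHTML 1.0 and MathML2
new_lang, new_rang = "\u27e8", "\u27e9"  # used by this specification


def nfc(text):
    return unicodedata.normalize("NFC", text)


for label, ch in [("old lang", old_lang), ("old rang", old_rang),
                  ("new lang", new_lang), ("new rang", new_rang)]:
    print(f"{label}: U+{ord(ch):04X} -> NFC U+{ord(nfc(ch)):04X}")

print(html.unescape("&lang;") == new_lang)
```

The older code points turn into CJK punctuation under normalization, whereas the mathematical brackets are stable, which is the practical reason for the move.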

注意:
Note:

[MathML3]は, この仕様書で定義された実体の集合を使っており, そのため, MathML3が完成すれば, MathMLとここで定義された実体との違いはなくなります.

[MathML3] uses the entity sets defined by this specification, so there will be no differences between MathML and the entities defined here once MathML3 is finalized.

D ソースファイル
Source Files

実体宣言を構築する全てのデータファイル, XSLTによる文字の割り当て, この文書から参照されるHTMLの表は, http://www.w3.org/2003/entities/2007xml/から利用可能です.

All data files used to construct the entity declarations, XSLT character maps, and HTML tables referenced from this document are available from http://www.w3.org/2003/entities/2007xml/.

unicode.xml 様々な実体の集合やアプリケーションソフトウェアでの名前, TeXでの対応する表記, その他のデータとともに, 全てのユニコード文字について詳細に記述したマスターファイル. このファイルは長年にわたって管理されており, 元々はSebastian Rahtz氏によりjadetexディストリビューションの一部として, 1999年ごろからはDavid Carlisle氏によりMathML仕様書のソースファイルの一部として, 管理されてきました. 現在の版は, ユニコード5.2の全ての文字のデータを収録しています. 注意: unicode.xmlは5MBを超える大きさであり, ブラウザで直接見るには, あまり適していないかも知れません. ブラウザで上記のunicode.xmlへのリンクをたどるより, ファイルを保存した方がよいでしょう.
master file detailing all Unicode characters with names in various entity sets and applications, TeX equivalents and other data. This file has been maintained for many years, originally by Sebastian Rahtz as part of the jadetex distribution and since around 1999 as part of the MathML specification sources by David Carlisle. The current version encodes data for all characters in Unicode 5.2. Note: unicode.xml is over 5MB in size and may not really be suitable for direct viewing in a browser. You may prefer to save the file rather than follow the above link to unicode.xml in a browser.
charlist.rnc unicode.xmlに対するRELAX NG スキーマ.
RELAX NG schema for unicode.xml.
unicode.xsl unicode.xmlをHTMLの表として描くためのXSLTスタイルシート.
XSLT stylesheet that renders unicode.xml as an HTML table.
character-set.xml この文書のソースファイル.
the source file for this document.
xmlspec.xsl 標準のxmlspecスタイルシートのコピー.
a copy of the standard xmlspec stylesheet.
run この文書の集合を作るための小さなスクリプトファイル.
small script file that builds this collection.
xhtml1.xml XHTML1.0実体定義の記録.
record of XHTML 1.0 entity definitions.
mml2.xml MathML2.0(第2版)実体定義の記録.
record of MathML 2.0 (second edition) entity definitions.
unicodedata.xsl unicode.xmlの新しいコピーを作成するスタイルシートで, ユニコードのデータファイルからデータを取り込みます. ユニコードの新しいバージョンが公開されたときに, unicode.xmlを更新するのに利用されます.
stylesheet that generates a new copy of unicode.xml, incorporating data from the Unicode data file, used to update unicode.xml as new versions of Unicode are released.
entities.xsl 実体に対するDTD宣言を作成するスタイルシート.
stylesheet to generate the DTD declarations for the entities.
charmap.xsl XSLTによる文字の割り当てを作成するスタイルシート.
stylesheet to generate the XSLT character maps.
characters.xsl 参照されているHTMLの表を含め, この文書を作成するスタイルシート.
stylesheet to generate this document, including the referenced HTML tables.
schemas.xml XML文書と適切なRELAX NG スキーマを関連付けるファイル.
file associating XML documents with the appropriate RELAX NG schema.
catalog http://www.w3.org/2003/entities/2007/にある実体やスタイルシートのファイルへの参照を, ローカルファイルシステム上の/etc/xml/w3c-entitiesへ転送するOASIS XML カタログファイルの例. このファイルは, ファイルのローカルコピーの場所を参照するように編集する必要があります. 多くのXML処理プログラムはこのカタログ形式を読み込むよう設定できますが, 具体的なオプションは, 使用している処理プログラムに依存します.
Sample OASIS XML catalog that redirects references to the entity or stylesheet files at http://www.w3.org/2003/entities/2007/ to the local file system at /etc/xml/w3c-entities. It should be edited to refer to the location of a local copy of the files. Many XML parsers may be configured to read this catalog format, but the specific options depend on the parser being used.
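
The redirection that the sample catalog describes can be pictured as a simple prefix rewrite. The Python sketch below only mimics that idea for illustration: it is not a catalog parser and does not reflect how any particular XML parser implements catalog resolution. The function name `resolve`, the `file://` form of the local prefix, and the file name `example.ent` are assumptions made for this example; the two locations are the ones named in the catalog description above.

```python
# Remote location named in the catalog description.
REMOTE_PREFIX = "http://www.w3.org/2003/entities/2007/"
# Local directory named in the catalog description, written here as a
# file: URI (an assumption); edit it to match the local copy of the files.
LOCAL_PREFIX = "file:///etc/xml/w3c-entities/"


def resolve(uri: str) -> str:
    """Map a URI under REMOTE_PREFIX to the local copy.

    Any other URI is returned unchanged.
    """
    if uri.startswith(REMOTE_PREFIX):
        return LOCAL_PREFIX + uri[len(REMOTE_PREFIX):]
    return uri


if __name__ == "__main__":
    # "example.ent" is a hypothetical file name used only for illustration.
    print(resolve(REMOTE_PREFIX + "example.ent"))
    print(resolve("http://example.org/other.dtd"))
```

In practice the rewrite is performed by the XML parser once it has been configured to read the catalog, so no application code of this kind is needed.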

E 参考文献
References

SGML: ISO/IEC 8879:1986, Information processing — Text and office systems — Standard Generalized Markup Language (SGML)
ISO9573-13-1991: ISO/IEC TR 9573-13:1991, Information technology — SGML support facilities — Techniques for using SGML — Part 13: Public entity sets for mathematics and science
Unicode: The Unicode Consortium. The Unicode Standard, Version 5.2.0, defined by: The Unicode Standard, Version 5.2 (Mountain View, CA: The Unicode Consortium, 2009. ISBN 978-1-936213-00-9). (http://www.unicode.org/versions/Unicode5.2.0/)
Unicode15: Unicode Standard Annex 15, Version 5.2.0; Unicode Normalization Forms, The Unicode Consortium, 2009-09-03. (http://www.unicode.org/reports/tr15/tr15-31.html)
Unicode25: Barbara Beeton, Asmus Freytag, Murray Sargent III, Unicode Support for Mathematics, Unicode Technical Report #25 2008-08-14. (http://www.unicode.org/unicode/reports/tr25/)
MathML2: David Carlisle, Patrick Ion, Robert Miner, Nico Poppelier, Mathematical Markup Language (MathML) Version 2.0 (Second Edition) W3C Recommendation 21 October 2003 (http://www.w3.org/TR/2003/REC-MathML2-20031021/)
MathML3: David Carlisle, Patrick Ion, Robert Miner, Mathematical Markup Language (MathML) Version 3.0 W3C Candidate Recommendation 15 December 2009 (http://www.w3.org/TR/2009/CR-MathML3-20091215/)
HTML4: Dave Raggett, Arnaud Le Hors, Ian Jacobs, HTML 4.01 Specification W3C Recommendation 24 December 1999 (http://www.w3.org/TR/1999/REC-html401-19991224)
HTML5: Ian Hickson, David Hyatt, HTML 5, A vocabulary and associated APIs for HTML and XHTML W3C Working Draft 25 August 2009 (http://www.w3.org/TR/html5/)
Charmod-norm: François Yergeau, Martin J. Dürst, Richard Ishida, Addison Phillips, Misha Wolf, Tex Texin, Character Model for the World Wide Web 1.0: Normalization W3C Working Draft 27 October 2005 (http://www.w3.org/TR/charmod-norm/)

文字に対するXML実体の定義
XML Entity Definitions for Characters

W3C勧告 2010年4月1日
W3C Recommendation 01 April 2010
