Template:Unichar/doc
This template produces a formatted description of a Unicode character, to be used in-line with regular text.
- The character <code>{{unichar|a9|Copyright sign}} is about intellectual property.</code>
Usage
The template takes the character's hexadecimal Unicode value and its name as input, as in {{unichar|00A9|Copyright sign}}.
The output follows the standard Unicode presentation of a character: the code point is written in the pattern "U+00A9", and the name is set in small caps, which accommodates Unicode's all-caps naming convention (undesired in Wikipedia text). Only the hexadecimal value (e.g. A9) is required; all other input is optional. The glyph is taken from a font that covers Unicode well, and the font choice can be narrowed, for example to a language-specific or IPA-specialized font. The font glyph can also be overridden with an image. A wikilink to an article and another to Unicode can be created, and a bracketed note can add the calculated decimal value, the HTML notations, and a free-text note.
Some special code points are given extra care, like controls and space characters. These special code points are automatically detected by the unichar/gc sub-template.
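The presentation described above can be sketched outside the wiki. The following Python snippet is only an illustration, not the template's implementation: the function name format_unichar is hypothetical, and upper-casing the name merely stands in for the small-caps styling that the template applies.

```python
def format_unichar(hex_value: str, name: str = "") -> str:
    """Return an inline description such as 'U+00A9 © COPYRIGHT SIGN'."""
    code_point = int(hex_value.strip(), 16)  # 'a9', 'A9' and '00A9' all give 169
    prefix = f"U+{code_point:04X}"           # at least four upper-case hex digits
    glyph = chr(code_point)
    # Upper-casing only approximates small caps, which is a styling effect.
    parts = [prefix, glyph, name.strip().upper()]
    return " ".join(part for part in parts if part)


print(format_unichar("a9", "Copyright sign"))  # prints: U+00A9 © COPYRIGHT SIGN
```

Only the hexadecimal value is mandatory in this sketch, mirroring the template: format_unichar("00A9") returns the prefix and glyph without a name.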
Examples
- {{unichar|00A9|Copyright sign}}
- {{unichar|00A9|Copyright sign|nlink=Copyright symbol}}
- {{unichar|00A9|Copyright sign|nlink=Copyright symbol|note=See also [[Copyleft]] symbol}}
- {{unichar|00A9|Copyright sign|nlink=Copyright symbol|dec=|html=}}
- {{unichar|00A0|No-break space|note=NBSP}}
- {{unichar|0007|nlink=Bell character}}
Parameters
The blank template, with all parameters, is as follows:
{{unichar
 | <!-- hex value, code point (do not add the "U+") -->
 | <!-- Unicode name -->
 | ulink =
 | image =
 | cwith =
 | size =
 | use =
 | use2 =
 | nlink =
 | dec =
 | html =
 | note =
}}
Inline version:
{{unichar| <!-- hex value (do not add "U+") -->| <!-- Unicode name -->|ulink= |image= |cwith= |size= |use= |use2= |nlink= |dec= |html= |note= }}
- First parameter, 1= Unnamed, required (prefix "1=" may be omitted). The hexadecimal value of the code point, e.g. 00A9.
- Notes: The parameter accepts input like A9, a9 and 00A9 as hexadecimal value. Decimal values are not detected as decimal and will give unexpected results (see Possible errors, below).
- Second parameter, 2= Unnamed (prefix "2=" may be omitted). The Unicode name of the character. The template always displays this name in small caps, whatever the capitalisation of the input. The name may differ from the title of the corresponding Wikipedia article (see nlink=, below).
- nlink= Optional. Name of the Wikipedia page to link to. If used, the Unicode name (second parameter) becomes a wikilink to that article. When the article name and the Unicode name are the same, a bare "nlink=" without a page name is enough.
- For control characters using the nlink parameter, the Unicode name parameter is not used: the nlink parameter is displayed instead, without small caps.
- Note: the name of the page is case-sensitive, as with all wiki pages.
{{unichar|00A9|Copyright sign|nlink=Copyright symbol}}
{{unichar|00A3|Pound sign|nlink=}}
- ulink= Optional. Creates a wikilink on the "U+" prefix. When used without a page name (ulink=), the default target, the Unicode page, is used.
- dec= Optional. Adds the decimal value to the text, in the bracketed note.
- html= Optional. Adds the HTML character reference to the text, like &#160;, in the bracketed note. If a named character reference exists, like &nbsp;, that is added too.
- use= Optional. Sets the font-hinting template used to render the glyph, since the character may not be present in a regular browser font. Default is {{unicode}}; other options are {{IPA}}, {{lang}} and {{script}}. When setting "use=lang" or "use=script", use2 should be used to set the language ("use2=fr") or the script ("use2=Cyrs"). A glyph may still not show as expected due to browser effects. For a detailed description, see the documentation of those templates.
- {{unichar|0485|COMBINING CYRILLIC DASIA PNEUMATA|cwith=|use=script|use2=Cyrs}}
- image= Optional. Uses a file (graphic image) to represent the glyph, overriding the font completely. The filename should include the extension, like ".svg", but not the "File:" prefix.
- cwith= Optional. Useful when the Unicode character is combining. Using cwith= adds a space before the character, allowing the combining effect. When used with a character, like cwith=a, the character will be combined with the letter "a". In Unicode, a general glyph used to carry a combining character is U+25CC ◌ DOTTED CIRCLE.
- without cwith=:
{{unichar|0485|COMBINING CYRILLIC DASIA PNEUMATA}}
- cwith= without parameter:
{{unichar|0485|COMBINING CYRILLIC DASIA PNEUMATA|cwith=}}
- cwith= with dotted circle:
{{unichar|0485|COMBINING CYRILLIC DASIA PNEUMATA|cwith=◌}}
- size= Optional. Can be used to set the size of the glyph. By default "size=125%". For the font, all font-size style inputs are accepted: "7px", "150%", "2em", "larger".
{{unichar|0041|LATIN CAPITAL LETTER A|size=2em}}
- When using an image (file) instead of a font, the size must be given in px, like "12px". The default for images is "10px".
A call that combines several parameters:
{{unichar | A9 | Copyright sign | ulink = Universal Character Set characters | image = | size = 150% | nlink = Copyright symbol | note = Example }}
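As a rough illustration of what dec=, html= and note= contribute to the bracketed note, the values can be derived from the hexadecimal input. This is a Python sketch, not the template code: bracketed_note is a hypothetical helper, the separator and the word "decimal" are assumptions about layout, and Python's html.entities.codepoint2name table (HTML 4 entity names) stands in for the name lookup that the template performs.

```python
from html.entities import codepoint2name


def bracketed_note(hex_value: str, note: str = "") -> str:
    """Build a note like '(169 decimal, &#169;, &copy;)' for a hex code point."""
    code_point = int(hex_value, 16)
    items = [f"{code_point} decimal", f"&#{code_point};"]
    entity = codepoint2name.get(code_point)  # None when no named reference exists
    if entity:
        items.append(f"&{entity};")
    if note:
        items.append(note)
    return "(" + ", ".join(items) + ")"


print(bracketed_note("00A9"))               # (169 decimal, &#169;, &copy;)
print(bracketed_note("00A0", note="NBSP"))  # (160 decimal, &#160;, &nbsp;, NBSP)
```

A code point without a named reference, such as 0485, simply yields the decimal value and the numeric reference.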
Presentation effects
Since this template is aimed at presenting a formatted, inline description, some effects are introduced to support that aim.
- Showing space characters: All space characters (those with General Category: Zs) are presented with a light-blue background, to show their actual presence and width.
- Incidentally, the regular space is replaced with &#xA0; (NBSP) to prevent the wiki markup from deleting it as a repeated space.
- Removing formatting characters: Formatting characters (those with General Category: Cf, Zl and Zp) are removed from the output. By definition, formatting characters have no glyph. By removing them they cannot have a formatting effect.
Exception: five Arabic Cf (formatting) number markings, U+0600..U+0603 and U+06DD, are shown. While Cf formatting characters usually have no glyph, these five do. By internally adding "(visible)" to the category, these characters are shown.
- Removing whitespace: The template removes formatting code and surrounding whitespace from the input. A <Return> in the name input (possibly unintended) would otherwise break the expected in-line behaviour.
- Showing a label like <control-0007>: Unicode states that a code point has no name when it is one of these: a control character, a private-use character, a surrogate, an unassigned (reserved) code point, or a noncharacter. Such code points should instead be referred to by a "Code Point Label", such as <private-use> or <private-use-E000>. In this situation, this template replaces the glyph with that label, so that correct presentation takes precedence over following Unicode usage to the letter.
- "Control" general category=Cc: <control> or <control-0007>
- "Surrogate" general category=Cs: <surrogate> or <surrogate-D800>
- "Private Use": general category=Co: <private-use> or <private-use-FFA0>
- "Not a character" (minus the reserved code points, see below): general category=Cn: <not-a-character>, <non-character> or <not-a-character-FFA0>
The second parameter (Unicode name) is not presented, since it cannot exist. It is possible to create a link to an article.
- Note: A <reserved> (unassigned) code point cannot be detected yet, and so is not presented with this label. These code points too are given Cn category.
- (Background on <>-labels: a Name can never contain <>-brackets. This rule prevents a name from being mixed up with an actual control character, so a bell will not ring when a page is opened that contains the Name of U+0007.)
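The label rules above can be approximated with a general-category lookup. The sketch below is in Python and is not how the unichar/gc subtemplate is written: code_point_label and is_noncharacter are hypothetical names, the label spellings are taken from the list above, and the category data comes from Python's unicodedata module, so results depend on the Unicode version bundled with the interpreter.

```python
import unicodedata

# General categories that always get a code point label instead of a glyph.
LABELS = {"Cc": "control", "Cs": "surrogate", "Co": "private-use"}


def is_noncharacter(code_point: int) -> bool:
    # U+FDD0..U+FDEF, plus the last two code points of every plane.
    return 0xFDD0 <= code_point <= 0xFDEF or (code_point & 0xFFFE) == 0xFFFE


def code_point_label(hex_value: str):
    """Return a label such as '<control-0007>', or None if no label applies."""
    code_point = int(hex_value, 16)
    category = unicodedata.category(chr(code_point))
    if category in LABELS:
        kind = LABELS[category]
    elif category == "Cn" and is_noncharacter(code_point):
        kind = "not-a-character"
    else:
        # Ordinary characters land here, and so do unassigned (reserved)
        # code points: they share category Cn with the noncharacters.
        return None
    return f"<{kind}-{code_point:04X}>"


print(code_point_label("0007"))  # <control-0007>
print(code_point_label("00A9"))  # None
```

The range check in is_noncharacter is what separates noncharacters from the other Cn code points; the category alone cannot make that distinction.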
Possible errors
- The template produces an error message when parameter #1 (hex value) is missing.
- A non-hexadecimal input like 00G9 produces an error (because G/g is not a hexadecimal digit).
- Do not add the U+-prefix like U+00A9. It will not be recognised.
- If the template only shows the code point number, like "2038", you're probably using the wrong template {{unicode}}, instead of {{unichar}}.
- The glyph may be overruled and replaced by a label like <control-0007>. These characters have no Unicode name. An nlink must name the article directly (as in "nlink=Bell signal"). A blank "nlink=" (which links fine for regular characters like Pound sign) cannot work for <labeled> characters, because there is no character name to turn into a link. This produces an error.
- A decimal value input like 1=98 will be read as the hexadecimal value 0098. The template cannot detect that you intended to enter decimal 98 (98₁₀ = 62₁₆). No warning is issued, and the wrong character, U+0098, will be shown (not U+0062).
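The input errors listed above can be reproduced with a strict hexadecimal parser. This Python sketch is an illustration under assumptions, not the template's own check: parse_hex and the six-digit limit are hypothetical, and a regular expression is used because Python's int(text, 16) alone would also accept forms such as "0x62".

```python
import re

HEX_RE = re.compile(r"[0-9A-Fa-f]{1,6}")


def parse_hex(value: str) -> int:
    """Parse a code point given in hexadecimal, without the 'U+' prefix."""
    text = value.strip()  # surrounding whitespace is ignored
    if not text:
        raise ValueError("missing hex value")
    if not HEX_RE.fullmatch(text):
        raise ValueError(f"not a hexadecimal value: {text!r}")
    return int(text, 16)


for raw in ("00A9", "a9", "00G9", "U+00A9", "98"):
    try:
        print(f"{raw!r} -> U+{parse_hex(raw):04X}")
    except ValueError as err:
        print(f"{raw!r} -> error: {err}")

# A decimal 98 has to be converted by the editor before it is entered:
print(format(98, "04X"))  # 0062
```

The input "98" parses without complaint as U+0098, which is exactly the silent misreading described in the last item; only the editor knows that decimal 98, i.e. 0062, was meant.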
Technical notes
The word "unichar" is used only in this English Wikipedia, as a name for this template. It has no meaning outside.
The template uses these subtemplates:
- {{unichar/main}} Accepts all the input from {{unichar}}. Calls several subtemplates to produce the text strings, and then strings them together. Also checks for the non-hex input error.
- {{unichar/ulink}} Creates a piped link for the "U+" prefix.
- {{unichar/gc}} determines the Unicode general category, when this category is special (like, for control characters).
- {{unichar/glyph}} for rendering the glyph by font. Accepts "image=" that overrules the font. Uses also "use", "use2", "size", "cwith".
- {{unichar/na}} Produces the formatted name of the character in smallcaps. Accepts the "nlink=" to create a piped wikilink to an article. When the general category (gc) is special, the name will change into a <label-hhhh>.
- {{unichar/notes}} Produces the three optional notes in brackets: the decimal value, the HTML character reference (both decimal and by name, like &nbsp;, if that exists, using {{numcr2namecr}}) and the free-text note provided by the editor. Also produces the brackets themselves.
- The main template serves as an easy input layer: it does few calculations (actually only two hex2dec conversions) and allows default values to be set without going deep into the subtemplates.
- The value "<#salted#>" is used internally to pass through a non-defined input parameter. This value is safe for the Name, because a Unicode name cannot contain the characters <##>, and so "salted" is the right word (meaning uninhabitable). For ease of code maintenance it is used in other places as well.
Issues
- Unassigned code points, to be labelled <reserved>, cannot be detected.
- When using use=script, use2 needs lowercase (e.g. 0485, Cyrs or cyrs)
- When used for one of the RTL formatting marks, the mark's effect may break out of the template (the text that follows goes RTL too). As it stands, preventing this requires extra code.
See also
- {{unicode}} - Produces rare characters, using fonts that cover Unicode more widely.
External links
Useful links for researching Unicode characters:
- Unicode charts, gives the chart (in PDF) on which the U+value is located.
- Fileformat.com search, to search by name (whole or partial), by U+ hex value or decimal value, or by the font symbol (copy-paste it). Gives extra information per character. One character only.
- [1] a multi-character converter.