ReadingUnicodeChars

Overview

This specification details the process of reading Unicode characters into CLforJava. There are 2 possible methods for ingesting Unicode characters: through the Lisp Reader and through the file system.

TheReader

The Lisp Reader has 2 methods defined for entering characters. The first is the simple act of reading a character from the input stream. This works correctly for all Unicode characters in the Basic Multilingual Plane (BMP), that is, those characters whose code points fit into 16 bits. The second is the #\ reader macro. This reader macro reads the next element in the stream and interprets it as the name of a character. If the element is a single character, that character is simply quoted. If the element is longer than 1 character, it is treated as a name and used as the index into a table of Unicode characters. Every Unicode character has a unique name, though some characters also have additional names.
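The #\ interpretation described above can be sketched in Java as follows. This is only an illustration, not the CLforJava implementation: the class and method names are hypothetical, the JDK's built-in name table (Character.codePointOf, available from Java 9) stands in for the table of Unicode characters, and Common Lisp names such as Space or Newline are not handled.

```java
// Hypothetical sketch; not part of CLforJava.
final class CharNameSketch {
    private CharNameSketch() {}

    /**
     * Interprets the element read after "#\".
     * A single character stands for itself; a longer element is looked up
     * as a Unicode character name.
     */
    static int interpret(String element) {
        if (element.isEmpty()) {
            throw new IllegalArgumentException("no character after #\\");
        }
        // Count code points, not chars, so a supplementary character
        // (2 Java chars) still counts as a single character.
        if (element.codePointCount(0, element.length()) == 1) {
            return element.codePointAt(0);
        }
        // Assumption: the JDK name table (Java 9+) plays the role of the
        // Unicode name table. Throws IllegalArgumentException if unknown.
        return Character.codePointOf(element);
    }
}
```

For example, interpret("a") yields the code point of the letter a itself, while interpret("LATIN SMALL LETTER A") reaches the same code point through the name table.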

A further option is to provide an additional reader macro that lets the user specify the Unicode code point of the character. This involves appropriating one of the unspecified # dispatch characters. In this case, we use the u or U characters. There are 2 variants of the syntax:

  1. #uXXXX - where the 'u' is followed by exactly 4 hexadecimal digits. This variant can specify all characters from the BMP.
  2. #u+XXXXXX - where the '+' is followed by 2 to 6 hexadecimal digits. This variant can specify all code points in the Unicode repertoire (up to 10FFFF). Note that code points above FFFF are represented in Java as a surrogate pair, that is, an array of 2 characters (char[2]).
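Since the specification of this macro is still TBS, the following is only a minimal sketch of how the two variants might be parsed in Java. The class and method names are hypothetical, and the input is assumed to be the text that follows the # dispatch character. Character.toChars is the standard JDK call that produces 1 char for a BMP code point and 2 chars for a code point above FFFF.

```java
// Hypothetical sketch; not part of CLforJava.
final class HashUSketch {
    private HashUSketch() {}

    /** Parses "uXXXX" or "u+XX".."u+XXXXXX" (u or U) into a code point. */
    static int parseCodePoint(String token) {
        if (token.isEmpty() || Character.toLowerCase(token.charAt(0)) != 'u') {
            throw new IllegalArgumentException("expected u or U: " + token);
        }
        final String hex;
        if (token.length() > 1 && token.charAt(1) == '+') {
            hex = token.substring(2);               // variant 2: 2 to 6 digits
            if (hex.length() < 2 || hex.length() > 6) {
                throw new IllegalArgumentException("need 2 to 6 hex digits: " + token);
            }
        } else {
            hex = token.substring(1);               // variant 1: exactly 4 digits
            if (hex.length() != 4) {
                throw new IllegalArgumentException("need 4 hex digits: " + token);
            }
        }
        // Reject anything that is not a hex digit (including a stray sign).
        for (int i = 0; i < hex.length(); i++) {
            if (Character.digit(hex.charAt(i), 16) < 0) {
                throw new IllegalArgumentException("not a hex digit: " + token);
            }
        }
        int codePoint = Integer.parseInt(hex, 16);
        if (!Character.isValidCodePoint(codePoint)) {
            throw new IllegalArgumentException("beyond 10FFFF: " + token);
        }
        return codePoint;
    }

    /** Returns char[1] for a BMP code point, char[2] (surrogate pair) otherwise. */
    static char[] toJavaChars(int codePoint) {
        return Character.toChars(codePoint);
    }
}
```

Under these assumptions, u0041 maps to a single char, while u+1D11E maps to a char[2] holding a high and a low surrogate.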

Specification of the #\ Reader Macro

TBS

Specification of the #u/#u+ Reader Macro

TBS

FileSystem

References

The Unicode Organization has detailed information on Unicode characters and how to manipulate them.

HyperSpec: Character Chapter
CLtL: Character Data Type

Implementation

Details of implementation

Core Java Classes Javadoc Links

Discussions

Links to Blog issues

Current Status of ReadingUnicodeChars

Status:

Release Level:

Open bug count:

Test Suites

Links to JUnit results

-- JerryBoetje - 12 Jul 2003

Topic revision: r3 - 2009-02-11 - 18:52:38 - MeganLusher
 