Overview
The CommonLisp system provides an extensible subsystem (TheReader) that maps strings of characters into various Lisp objects. The supported extension method is the ReadTable, which allows characters to be redefined as MacroCharacters that trigger the evaluation of a Lisp function by TheReader. There is, however, an underlying mechanism defined in the CommonLisp specification that drives the basic state machine in TheReader: the dual concepts of CharacterAttributes and SyntaxType. While the specification's description is very detailed and self-contained, it provides no mechanism for directly affecting the mapping of a character to a specific CharacterAttributes or SyntaxType value.
Aside from giving a slight nod to the existence of other character schemes, the existing specification deals mostly with the lower 128 characters (the ASCII range). However, CLforJava is designed to encompass the entire Unicode system (see CharacterSystem). The Lisp specification gives no specific definitions of the Attribute or Syntax Type for this vast number of characters. Since we could no longer use simplistic methods such as an array lookup (there are approximately 15,100 characters defined for CLforJava), the project required a CharacterSystem that could handle the large set of characters and have a well-encapsulated design. Having done that, we decided to expose this mechanism to both Java and Lisp programmers as an "official" extension to CommonLisp.
References
Implementation
The first implementation of TheReader included nested classes that defined CharacterAttribute and SyntaxType. In the new version, these classes are extracted into extension classes with a public API. Furthermore, they are not defined as final, allowing Java programmers to extend these classes to handle parsing of radically different languages while still using the underlying reader and ReadTable mechanisms.
Both classes extend the Java 1.4 Character subsetting mechanism (java.lang.Character.Subset) and provide a standard of() method that returns the value for a given character. Each of them also provides a set of type-safe enum constants for attributes and syntax types.
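The pattern described above can be sketched as follows. This is a minimal illustration, not the CLforJava source: the class name SyntaxTypeSketch is hypothetical, an int code point stands in for lisp.common.type.Character, the constant names follow the syntax types of standard syntax in the Common Lisp specification rather than the project's own constants, and treating every character outside the listed ASCII cases as CONSTITUENT is an assumption made only to keep the example short.

```java
// Hypothetical sketch of a type-safe constant class built on Character.Subset.
// Not the CLforJava implementation; names and the default rule are assumptions.
class SyntaxTypeSketch extends Character.Subset {

    // Type-safe constants: one shared instance per syntax type.
    static final SyntaxTypeSketch CONSTITUENT = new SyntaxTypeSketch("CONSTITUENT");
    static final SyntaxTypeSketch WHITESPACE = new SyntaxTypeSketch("WHITESPACE");
    static final SyntaxTypeSketch TERMINATING_MACRO = new SyntaxTypeSketch("TERMINATING_MACRO");
    static final SyntaxTypeSketch NON_TERMINATING_MACRO = new SyntaxTypeSketch("NON_TERMINATING_MACRO");
    static final SyntaxTypeSketch SINGLE_ESCAPE = new SyntaxTypeSketch("SINGLE_ESCAPE");
    static final SyntaxTypeSketch MULTIPLE_ESCAPE = new SyntaxTypeSketch("MULTIPLE_ESCAPE");

    // Character.Subset exposes only a protected constructor taking the subset name.
    private SyntaxTypeSketch(String name) {
        super(name);
    }

    // Maps a Unicode code point to its syntax type.
    static SyntaxTypeSketch of(int codePoint) {
        switch (codePoint) {
            case '\t': case '\n': case '\f': case '\r': case ' ':
                return WHITESPACE;
            case '"': case '\'': case '(': case ')': case ',': case ';': case '`':
                return TERMINATING_MACRO;
            case '#':
                return NON_TERMINATING_MACRO;
            case '\\':
                return SINGLE_ESCAPE;
            case '|':
                return MULTIPLE_ESCAPE;
            default:
                // Assumption for this sketch: everything else is a constituent.
                return CONSTITUENT;
        }
    }
}
```

Because java.lang.Character.Subset declares equals and hashCode as final and identity-based, constants built this way can be compared with ==, for example SyntaxTypeSketch.of('(') == SyntaxTypeSketch.TERMINATING_MACRO.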
Signatures
public class lisp.extension.character.Attribute extends java.lang.Character.Subset {
public static final lisp.extension.character.Attribute CONSTITUENT = new lisp.extension.character.Attribute("CONSTITUENT");
... etc for the rest of the attributes ...
public static lisp.extension.character.Attribute of(lisp.common.type.Character character);
private lisp.extension.character.Attribute(String name) {}
}
public class lisp.extension.character.SyntaxType extends java.lang.Character.Subset {
public static final lisp.extension.character.SyntaxType ALPHADIGIT = new lisp.extension.character.SyntaxType("ALPHADIGIT");
... etc for the rest of the syntax types ...
public static lisp.extension.character.SyntaxType of(lisp.common.type.Character character);
private lisp.extension.character.SyntaxType(String name) {}
}
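Extending these classes from another class requires a constructor that subclasses can reach. The signatures above declare the constructors private, so the sketch below assumes a protected constructor instead. All names in it (BaseSyntaxType, InfixSyntaxType, OPERATOR) are hypothetical, and an int code point stands in for lisp.common.type.Character. It shows one way a Java programmer might add a syntax type for a different language while falling back to the base classification; it is not taken from CLforJava.

```java
// Hypothetical base class standing in for lisp.extension.character.SyntaxType.
class BaseSyntaxType extends Character.Subset {
    static final BaseSyntaxType CONSTITUENT = new BaseSyntaxType("CONSTITUENT");
    static final BaseSyntaxType WHITESPACE = new BaseSyntaxType("WHITESPACE");

    // Assumption: protected (not private) so that subclasses can add constants.
    protected BaseSyntaxType(String name) {
        super(name);
    }

    // Simplified base classification, for illustration only.
    static BaseSyntaxType of(int codePoint) {
        return Character.isWhitespace(codePoint) ? WHITESPACE : CONSTITUENT;
    }
}

// Hypothetical extension for a language with infix operators.
class InfixSyntaxType extends BaseSyntaxType {
    static final InfixSyntaxType OPERATOR = new InfixSyntaxType("OPERATOR");

    protected InfixSyntaxType(String name) {
        super(name);
    }

    // Static methods are hidden rather than overridden, so callers must
    // invoke InfixSyntaxType.of explicitly to get the extended mapping.
    static BaseSyntaxType of(int codePoint) {
        switch (codePoint) {
            case '+': case '-': case '*': case '/':
                return OPERATOR;
            default:
                return BaseSyntaxType.of(codePoint);
        }
    }
}
```

Since of is static, a reader that should honor the extended mapping would need to be handed the extended lookup explicitly; how CLforJava wires this into TheReader is not specified here.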
TBD