The Revised Maclisp Manual | Page A-11 |
Characters | Concept | Representation Issues |
Maclisp has no primitive character datatype of the kind some other languages provide. When we speak of characters in Maclisp, we always mean one of two things: a fixnum holding the character's ASCII code, or a symbol whose printname is that single character.
Because using symbols as characters incurs various kinds of computational overhead, users are encouraged to use the fixnum representation of characters wherever possible. The main reason symbols are ever chosen is their readability when debugging, but this is generally not regarded as sufficient justification for using them.
Another disadvantage of symbols arises when multiple obarrays are in use. Converting a fixnum to a symbol typically calls INTERN, so there is a high risk that the symbol will accidentally end up on the wrong obarray. Multiple-obarray errors are among the hardest to debug, which is another reason to prefer fixnums over symbols as a character representation.
Hence, while functions for manipulating both representations exist, the user is strongly encouraged to use, for example, TYI rather than READCH, EXPLODEN rather than EXPLODEC, GETCHARN rather than GETCHAR, etc.
Further, for the sake of transportability, code should rely as little as possible on characters being fixnums or on their actual codes. ASCII is not the only encoding scheme. Your code may someday need to move to another Lisp where the character codes are different, or where characters are not fixnums at all but a datatype in their own right.
Rather than writing 101 (octal) or 65. (decimal) to mean the letter "A", write #/A instead. The sequence #/ may be followed by any character, and READ treats it exactly as if the fixnum giving that character's ASCII code had been typed. Note also that #/ is case-sensitive: #/A is different from #/a.
Some characters are hard to type into an interactive Lisp or look very poor after a #/ sequence. These include whitespace characters such as Space and Return, and characters that have interrupt effects, such as Control-G. Two mechanisms exist for handling these. First, #^ (sharpsign uparrow) followed by a character has the same effect as typing #/ followed by the corresponding control character. For example, the code for Control-G can be written as #^G. Second, #\ can be followed by the symbolic name of a character. Common usages are #\SPACE, #\TAB, #\RETURN, #\LINE, and #\RUBOUT, which mean the obvious thing (i.e., the ASCII codes for Space, Tab, etc.). To find the preferred input format for a character c, use
(FORMAT T "~@C" c).
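As a rough illustration outside Maclisp itself, the Python sketch below computes the fixnums the reader would produce for #/A, #^G, and a few #\ names. It assumes ASCII codes. It also assumes that #^ forms a control character by keeping only the low five bits of the code (code & 0o37). The names CHAR_NAMES, sharp_slash, sharp_uparrow, and sharp_backslash are hypothetical. This models the resulting codes, not the Maclisp reader.

```python
# Hypothetical model of the fixnums produced by Maclisp's #/, #^ and #\ syntax.
# Assumes ASCII codes; this is not the Maclisp reader, only the codes it yields.

# A few #\ names and their ASCII codes (illustrative subset, not exhaustive).
CHAR_NAMES = {
    "SPACE": 32,
    "TAB": 9,
    "RETURN": 13,
    "RUBOUT": 127,
}


def sharp_slash(ch: str) -> int:
    """#/c: the ASCII code of the single character c (case-sensitive)."""
    if len(ch) != 1:
        raise ValueError("#/ takes exactly one character")
    return ord(ch)


def sharp_uparrow(ch: str) -> int:
    """#^c: assumed to be the control character formed from c, i.e. code & 0o37."""
    return sharp_slash(ch) & 0o37


def sharp_backslash(name: str) -> int:
    """#\\NAME: the code for a named character (subset only)."""
    return CHAR_NAMES[name.upper()]


print(sharp_slash("A"), oct(sharp_slash("A")))  # 65 0o101
print(sharp_slash("a"))                         # 97, not 65: #/ is case-sensitive
print(sharp_uparrow("G"))                       # 7, Control-G
print(sharp_backslash("SPACE"))                 # 32
```

Writing ord("A") rather than 65 in Python plays the same role as writing #/A rather than 101 in Maclisp. The intent stays visible, and the code does not hard-wire the numeric value.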
Ascii Encoding/Decoding |
ASCII | Function | (ASCII i) |
Returns the interned one-character symbol whose character has the code i; i.e., the symbol for which (GETCHARN symbol 1) would return the given fixnum i.
Examples:
(ASCII #o101) => A ;discouraged input syntax
(ASCII #/A) => A ;preferred input syntax
(ASCII #\SPACE) => | |
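The relationship between ASCII and GETCHARN can be modeled in Python by treating an obarray as a dictionary from printnames to symbol objects. This is a hypothetical sketch. Symbol, intern, getcharn, and ascii_sym are illustrative names, and Maclisp's real obarray is not a Python dictionary.

```python
# Hypothetical Python model of ASCII: fixnum code -> interned one-character symbol.


class Symbol:
    """Minimal stand-in for a Maclisp symbol: just a printname."""
    def __init__(self, pname: str):
        self.pname = pname

    def __repr__(self):
        return f"Symbol({self.pname!r})"


def intern(pname: str, obarray: dict) -> Symbol:
    """Return the symbol named pname on obarray, creating it if needed."""
    if pname not in obarray:
        obarray[pname] = Symbol(pname)
    return obarray[pname]


def getcharn(sym: Symbol, i: int) -> int:
    """1-based character code of sym's printname; this model returns 0 outside 1..length."""
    return ord(sym.pname[i - 1]) if 1 <= i <= len(sym.pname) else 0


def ascii_sym(code: int, obarray: dict) -> Symbol:
    """Model of (ASCII code): the interned symbol whose only character has this code."""
    return intern(chr(code), obarray)


obarray = {}
a = ascii_sym(0o101, obarray)
print(a, getcharn(a, 1))            # Symbol('A') 65
print(a is ascii_sym(65, obarray))  # True: the same interned symbol
print(ascii_sym(32, obarray))       # Symbol(' '), which Maclisp prints as | |
```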
GETCHAR | Function | (GETCHAR sym i) |
Returns the ith character of sym's pname, 1-based; i.e., (GETCHAR sym 1) selects sym's first character. The character is returned as a symbol. NIL is returned if i is out of bounds. sym must be a symbol (or "fake string"); i must be a fixnum.
Use of this primitive is stylistically discouraged. GETCHARN is preferred in cases where a choice exists.
Note: The character position is that in the actual internal representation (the one shown by PRINC), not the position shown in READable displays (using PRIN1).
Examples:
(GETCHAR "FOO BAR" 4) => | |
(GETCHAR "FOO" 1) => F
(GETCHAR "0123456789" (1+ 3)) => /3
(GETCHAR 'A/bC/d 3) => C
(GETCHAR '|/|+//X*| 4) => X
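The indexing rules can be mirrored in a few lines of Python that work directly on the internal printname (the PRINC form) rather than on a symbol object. Here getchar is a hypothetical model. It returns a one-character string where Maclisp returns a symbol, and None where Maclisp returns NIL. The printname of |/|+//X*| is the five characters |+/X*, because the slashes in the READ form are escapes.

```python
# Hypothetical model of (GETCHAR sym i) over a printname string.
from typing import Optional


def getchar(pname: str, i: int) -> Optional[str]:
    """Return the ith (1-based) character of pname, or None (NIL) if out of bounds."""
    if 1 <= i <= len(pname):
        return pname[i - 1]
    return None


print(getchar("FOO", 1))              # F
print(repr(getchar("FOO BAR", 4)))    # ' ' (a single space)
print(getchar("0123456789", 1 + 3))   # 3
print(getchar("AbCd", 3))             # C   (printname of A/bC/d)
print(getchar("|+/X*", 4))            # X   (printname of |/|+//X*|)
print(getchar("FOO", 4))              # None, standing in for NIL
```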
GETCHARN | Function | (GETCHARN sym i) |
Returns a numeric value (fixnum) representing the ith character in the printname of sym, 1-based; i.e., (GETCHARN sym 1) returns the first character. Character positions less than 1 are undefined; character positions greater than the length of sym's printname (see FLATC) return 0. sym must be a symbol (or "fake string"); i must be a fixnum.
Note: The character position is that in the actual internal representation (the one shown by PRINC), not the position shown in READable displays (using PRIN1).
Examples:
(GETCHARN "FOO BAR" 4) => 32. ; #\SPACE
(GETCHARN "FOO" 1) => 70. ; #/F
(GETCHARN "0123456789" (1+ 3)) => 51. ; #/3
(GETCHARN 'A/bC/d 3) => 67. ; #/C
(GETCHARN '|/|+//.*| 4) => 46. ; #/.
Web-Only Note:
In the hardcopy manual, the above examples were shown as returning characters, but since characters are just integers in MACLISP, these examples have been amended to show them as base-10 integers with the interpretation of those integers as a comment.
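A matching Python model of GETCHARN returns the character's code, and 0 past the end of the printname. Maclisp leaves positions below 1 undefined, so this hypothetical getcharn raises an error for them rather than guessing at a value. The hypothetical helper count_char shows the fixnum-oriented style the manual recommends. It scans a printname by codes until GETCHARN returns 0 and compares against a character constant (ord("O"), standing in for #/O) rather than a raw number. That scan assumes the printname contains no NUL (code 0) characters.

```python
# Hypothetical model of (GETCHARN sym i) over a printname string.


def getcharn(pname: str, i: int) -> int:
    """Return the code of the ith (1-based) character, or 0 past the end."""
    if i < 1:
        # Undefined in Maclisp; this model refuses rather than inventing a result.
        raise ValueError("GETCHARN position must be at least 1")
    if i > len(pname):
        return 0
    return ord(pname[i - 1])


def count_char(pname: str, code: int) -> int:
    """Count occurrences of a character code, scanning until GETCHARN gives 0."""
    count, i = 0, 1
    while (c := getcharn(pname, i)) != 0:
        if c == code:
            count += 1
        i += 1
    return count


print(getcharn("FOO BAR", 4))            # 32
print(getcharn("|+/.*", 4))              # 46
print(getcharn("FOO", 9))                # 0
print(count_char("FOO BAR", ord("O")))   # 2
```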
The EXPLODE Family |
EXPLODEN | Function | (EXPLODEN object) |
Returns a list of characters, represented as fixnums, which are the characters that would have been typed out if (PRINC object) were done; i.e., slashes for special characters are not included in the list of characters. EXPLODEN is sensitive to most of the printer control switches, such as the variables BASE and *NOPOINT. It is not sensitive to the settings of the variables PRINLEVEL, PRINLENGTH, or TERPRI. It is as if PRINLEVEL and PRINLENGTH were bound to NIL, and atoms never have spurious carriage returns inserted into them as they might in an actual PRINC.
Example:
(EXPLODEN '(+ 1X 3)) => (50 53 40 61 130 40 63 51) ;base 8
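The example can be reproduced with a small Python model of PRINC for lists of symbols and fixnums. Symbols are represented as printname strings and fixnums as ints, and BASE and *NOPOINT are passed as parameters. The functions princ_string, fixnum_digits, and exploden are hypothetical. The model ignores PRINLEVEL, PRINLENGTH, and line breaks, as EXPLODEN does. It assumes that a decimal fixnum prints with a trailing point unless *NOPOINT is set.

```python
# Hypothetical model of EXPLODEN for lists of symbols (str) and fixnums (int).


def fixnum_digits(n: int, base: int) -> str:
    """Digits of n in the given base (2..10), with a leading '-' if negative."""
    if n == 0:
        return "0"
    sign, n = ("-", -n) if n < 0 else ("", n)
    digits = []
    while n:
        n, r = divmod(n, base)
        digits.append(str(r))
    return sign + "".join(reversed(digits))


def princ_string(obj, base: int = 8, nopoint: bool = False) -> str:
    """Text PRINC would type for obj, ignoring PRINLEVEL/PRINLENGTH and line breaks."""
    if isinstance(obj, list):
        if not obj:
            return "NIL"
        return "(" + " ".join(princ_string(x, base, nopoint) for x in obj) + ")"
    if isinstance(obj, int):
        text = fixnum_digits(obj, base)
        # Assumption: in decimal, a trailing point is printed unless *NOPOINT is set.
        return text + "." if base == 10 and not nopoint else text
    return obj  # a symbol: its printname, with no slashes (PRINC style)


def exploden(obj, base: int = 8, nopoint: bool = False) -> list:
    """Model of EXPLODEN: the character codes of PRINC's output."""
    return [ord(c) for c in princ_string(obj, base, nopoint)]


codes = exploden(["+", "1X", 3])
print(" ".join(oct(c)[2:] for c in codes))     # 50 53 40 61 130 40 63 51
print(princ_string(["+", "1X", 3], base=10))   # (+ 1X 3.)
```

Printing the codes in octal matches the ";base 8" convention of the example above.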
EXPLODEC | Function | (EXPLODEC object) |
Returns a list of characters, represented as symbols, which are the characters that would have been typed out if (PRINC object) were done; i.e., slashes for special characters are not included in the list of characters. EXPLODEC is sensitive to most of the printer control switches, such as the variables BASE and *NOPOINT. It is not sensitive to the settings of the variables PRINLEVEL, PRINLENGTH, or TERPRI. It is as if PRINLEVEL and PRINLENGTH were bound to NIL, and atoms never have spurious carriage returns inserted into them as they might in an actual PRINC. Because it must intern the resulting symbols, it is also sensitive to the setting of the variable OBARRAY.
Example:
(EXPLODEC '(+ 1X 3)) => ( /( + | | /1 X | | /3 /) )
EXPLODEC | Style Note | EXPLODEN vs EXPLODEC |
This primitive is more expensive than EXPLODEN because it must intern the resulting characters. Also, designers of sophisticated systems using multiple obarrays should note that EXPLODEC'd symbols are on the active obarray, which isn't always the obarray of the code making the call to EXPLODEC. For these reasons, EXPLODEN is usually preferred over EXPLODEC.
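The obarray hazard can be seen in a small Python model in which each obarray is a dictionary and interning means find-or-create. Here explodec is a hypothetical function that takes the already-printed text plus the obarray to intern on, standing in for Maclisp's use of the current value of OBARRAY. The same characters exploded under two different obarrays produce symbols that are not EQ. The fixnum codes from the hypothetical exploden compare equal no matter which obarray is active.

```python
# Hypothetical model of EXPLODEC and the multiple-obarray pitfall.


class Symbol:
    """Minimal stand-in for a Maclisp symbol."""
    def __init__(self, pname: str):
        self.pname = pname

    def __repr__(self):
        return f"Symbol({self.pname!r})"


def intern(pname: str, obarray: dict) -> Symbol:
    """Find or create the symbol named pname on the given obarray."""
    if pname not in obarray:
        obarray[pname] = Symbol(pname)
    return obarray[pname]


def explodec(printed: str, obarray: dict) -> list:
    """Model of EXPLODEC: intern each character of PRINC's output on obarray.

    'printed' is the text PRINC would type; computing it is out of scope here.
    """
    return [intern(c, obarray) for c in printed]


def exploden(printed: str) -> list:
    """EXPLODEN needs no obarray at all: it just returns character codes."""
    return [ord(c) for c in printed]


my_obarray, other_obarray = {}, {}
mine = explodec("(+ 1X 3)", my_obarray)
theirs = explodec("(+ 1X 3)", other_obarray)  # a different "active" obarray

print(mine[1], theirs[1])                   # Symbol('+') Symbol('+')
print(mine[1] is theirs[1])                 # False: same name, different symbols
print(mine[1] is intern("+", my_obarray))   # True
print(exploden("(+ 1X 3)")[1] == ord("+"))  # True, independent of any obarray
```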
EXPLODE | Function | (EXPLODE object) |
Returns a list of characters, represented as symbols, which are the characters that would have been typed out if (PRIN1 object) had been done, including slashes for special characters.
EXPLODE is sensitive to most of the printer control switches such as the variables BASE and *NOPOINT. It is not sensitive to the settings of the variables PRINLEVEL, PRINLENGTH, or TERPRI. It is as if PRINLEVEL and PRINLENGTH are bound to NIL, and atoms never have spurious carriage returns inserted into them as they might in an actual PRIN1. Because it must intern the resulting symbols, it is also sensitive to the setting of the variable OBARRAY.
There is no analogous function that produces fixnums instead of symbols as a result. When using multiple obarrays, exercise caution; see the style note on EXPLODEN vs EXPLODEC above for more information.
Examples:
(EXPLODE 'ABC) => (A B C)
(EXPLODE 'abc) => (A B C)
(EXPLODE "abc") => (/" /a /b /c /")
(EXPLODE '|abc|) => (/| /a /b /c /|)
(EXPLODE '(+ 1X 3)) => ( /( + | | // /1 X | | /3 /) )
IMPLODE | Function | (IMPLODE charlist) |
charlist is a list of characters to be used in the creation of a new symbol. The characters may be either symbols (if a symbol has more than one character, only the first character is used) or fixnums in the range 0 to 127 representing the ASCII code of a character. The return value is an interned symbol whose printname is the sequence of characters given as an argument.
If you want an uninterned symbol, use the function MAKNAM.
Note: IMPLODE is not the inverse of EXPLODE. The approximate inverse of EXPLODE is READLIST. What IMPLODE does is most closely related to an inverse of EXPLODEN or EXPLODEC, but it differs in too many ways to be usefully thought of as such.
Examples:
(EQ 'ABC (IMPLODE '(A B C))) => T
(IMPLODE '(A #\SPACE #/B)) => |A B|
(IMPLODE (EXPLODE 3)) => /3
(NUMBERP (IMPLODE (EXPLODE 3))) => NIL
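IMPLODE's argument handling and its contrast with MAKNAM can be sketched in Python, again treating an obarray as a dictionary. The functions implode, maknam, and char_of are hypothetical models, and symbols in the input list are represented by their printnames as strings. As in the (NUMBERP (IMPLODE (EXPLODE 3))) example, the result is always a symbol, even when its printname looks like a number.

```python
# Hypothetical model of IMPLODE (interned result) versus MAKNAM (uninterned result).
# Input characters are either printname strings (symbols) or int codes (fixnums).


class Symbol:
    """Minimal stand-in for a Maclisp symbol."""
    def __init__(self, pname: str):
        self.pname = pname

    def __repr__(self):
        return f"Symbol({self.pname!r})"


def intern(pname: str, obarray: dict) -> Symbol:
    """Find or create the symbol named pname on obarray."""
    if pname not in obarray:
        obarray[pname] = Symbol(pname)
    return obarray[pname]


def char_of(item) -> str:
    """One character: a fixnum code in 0..127, or the first char of a symbol's name."""
    if isinstance(item, int):
        if not 0 <= item <= 127:
            raise ValueError("character code must be in 0..127")
        return chr(item)
    return item[0]  # only the first character of a longer symbol is used


def implode(chars: list, obarray: dict) -> Symbol:
    """Model of IMPLODE: build a printname and intern it."""
    return intern("".join(char_of(c) for c in chars), obarray)


def maknam(chars: list) -> Symbol:
    """Model of MAKNAM: same printname, but a fresh, uninterned symbol."""
    return Symbol("".join(char_of(c) for c in chars))


obarray = {}
abc = implode(["A", "B", "C"], obarray)
print(abc is intern("ABC", obarray))             # True, like (EQ 'ABC (IMPLODE '(A B C)))
print(implode(["A", 32, 66], obarray))           # Symbol('A B')
print(implode(["FOO", "BAR"], obarray))          # Symbol('FB')
print(maknam(["A", "B", "C"]) is abc)            # False: not on any obarray
print(isinstance(implode(["3"], obarray), int))  # False: a symbol named "3"
```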
The Revised Maclisp Manual (Sunday Morning Edition) Published Sunday, December 16, 2007 06:17am EST, and updated Sunday, July 6, 2008.