The Pitmanual: Declarations and the Compiler

Compiler


Compilation	Concept	`Efficiency`

LISP programs can be compiled into machine code. This representation of a program is more compact than the interpreted list-structure representation, and it can be executed much more quickly. However, a price must be paid for these benefits. It is not as easy to intervene in the execution of compiled programs as it is with interpreted programs. Thus most LISP programs should not be compiled until after they have been debugged.

In addition, not all LISP programs can be compiled. There are certain things which can be done with the interpreter that cannot be effectively compiled. These include indiscriminate use of the functions EVAL and APPLY, especially with pdl-pointer arguments; “nonlocal” use of the GO and RETURN functions; and functions which modify their own code. Also, there are a number of functions which detect illegal arguments when they are called interpretively but not when a call to them is compiled; erroneous compiled programs can therefore damage the LISP environment and can cause strange errors to occur -- be forewarned. However, most “normal” programs are compilable.

Some operations are compiled in such a way that they will behave somewhat differently from the way they did when they were interpreted. It is sometimes necessary to make a “declaration” in order to obtain the desired behavior (see DECLARE).


Variables	Concept	`Compiled Variable References`

In the interpreter “variables” are implemented as atomic symbols which possess “shallow-bound” value cells. The continual manipulation of value cells would decrease the efficiency of compiled code, so the compiler defines two types of variables: “special variables” and “local variables.” Special variables are identical to variables in the interpreter.

Local variables are more like the variables in commonly-used algebraic programming languages such as Algol or PL/I. A local variable no longer has an associated atomic symbol; thus it can only be referred to from the text of the same function that bound it. The compiler creates local variables for PROG variables, DO variables, and LAMBDA variables, unless directed otherwise. The compiled code stores local variables in machine registers or in locations within a stack.

The principal difference between local variables and special variables is in the way a binding of a variable is compiled. (A binding has to be compiled when a PROG, DO, or LAMBDA expression is compiled, and for the entry to a function which has LAMBDA variables to be bound to its arguments.) If the variable to be bound has been declared to be special, the binding is compiled as code to imitate the way the interpreter binds variables: the value of the atomic symbol is saved and a new value is stored into its value cell. If the variable to be bound has not been declared special, the binding is compiled as the declaration of a new local variable. Code is generated to store the value to which the variable is to be bound into the register or stack-location assigned to the new local variable. This runs considerably faster than a special binding.

Although a special variable is associated with an atomic symbol which has the name of the variable, the name of a local variable appears only in the input file---in compiled code there is no connection between local variables and atomic symbols. Because this is so, a local variable in one function may not be used as a “free variable” in another function since there is no way for the location of the variable to be communicated between the two functions.

When the usage of a variable in a program to be compiled does not conform to these rules, i.e. it is somewhere used as a “free variable,” the variable must be declared special. There are two common cases in which this occurs. One is where a “global” variable is being used, i.e. a variable which is SETQ'ed by many functions but is never bound. The other is where two functions cooperate, one binding a variable and then calling the other one which uses that variable as a free variable.

Lisp's control variables (e.g., IBASE, BASE, ^R, ^W, ERRSET, ...) are automatically assumed by the compiler to be special unless declared otherwise. This exception to the general rule is predicated on the assumption that when the user sets, for example, the value of IBASE, he intends to affect the operation of the READ function.


Inline-Coding	Concept	`Open-Compiled Function Calls`

Another difference between the compiler and the interpreter is “in-line coding,” also called “open coding.” When a form such as (AND (FOO X) (BAR)) is evaluated by the interpreter, the built-in function AND is called and it performs the desired operation. But to compile this form as a call to the function AND with list-structured arguments derived from (FOO X) and (BAR) would negate much of the advantage of compiling. Instead the compiler recognizes AND as part of the LISP language and compiles machine code to carry out the intent of (AND (FOO X) (BAR)) without actually calling the AND function. This code might look like:

Pick up value of variable X
Call function FOO
Is the result NIL?
If yes, the value of the AND is NIL
If no, call the function BAR
The result of the AND is what BAR returned.

This “in-line coding” is done for all special forms (COND, PROG, AND, ERRSET, SETQ, etc.); thus compiled code will usually not call any of the built-in fsubrs.

Another difference between the compiler and the interpreter has to do with arithmetic operations. Most computers on which MACLISP is implemented have special instructions for performing all the common arithmetic operations. The MACLISP compiler contains a “number compiler” feature which allows the LISP arithmetic functions to be “in-line coded” using these instructions.

A problem arises here because of the generality of the MACLISP arithmetic functions, such as PLUS, which are equally at home with fixnums, flonums, and bignums. Most present-day computers are not this versatile in their arithmetic instructions, which would preclude open-coding of PLUS. There are several ways out of this problem. One is to use the special purpose functions which only work with one kind of number. For example, if you are using PLUS but actually you are only working with fixnums, use + instead. The compiler can compile (+ A B C) to use the machine's fixnum-addition instruction. A second solution is to write (PLUS A B (FOO C)) but tell the compiler that the values of the variables A and B, and the result of the function FOO can never be anything but fixnums. This is done by means of the “number declarations” (see DECLARE). A third way is to use the FIXSW and FLOSW declarations. Note that when interpreted (PLUS A B (FOO C)) can legitimately produce a bignum even though all three numbers added are fixnums, but the open-compiled code will not check for overflow and will simply lose high-order bits in such cases. This is true no matter how you cause the expression to be open-coded.

Another problem that can arise in connection with the in-line coding of arithmetic operations is that the LISP representation of numbers and the machine representation of numbers may not be the same. Of course, this depends on the particular implementation. If these two representations are different, the compiler would store variables which were local and declared to be numeric-only in the machine form rather than the LISP form. This could result in compilation of poor code which frequently converts number representations and in various other problems. Compilers which have this problem provide a (CLOSED T) declaration which inhibits open coding of arithmetic operations.


Function Calling	Concept	`Closed-Compiled Function Calls`

Another property of compiled code that should be understood is the way functions are called. In the interpreter function calling consists of searching the property list of the called function for a functional property (if it is an atomic symbol) and then recursively evaluating the body of the function if it is an expr, or transferring control to the function if it is a subr. In compiled code function calling is designed according to the belief that most of the functions called by compiled code will be machine executable, i.e. “subrs”: other compiled functions, or built-in functions, and only infrequently will compiled code call an interpreted function. Therefore a calling mechanism is used which provides for efficient transfer between machine-executable functions without constant searching of property lists. This mechanism is called the “uuo link” mechanism for historical reasons.

When a compiled function is first loaded into the environment, it has a uuo link for each function it will call. This uuo link contains information indicating that it is “unsnapped” and giving the name of the function to be called, which is an atomic symbol. The first time a call is made through such a uuo link, the fact that it is “unsnapped” is recognized and a special linking routine is entered. This routine searches the property list of the function to be called, looking for a functional property (or an AUTOLOAD property) in just the same way as the interpreter would. If the function turns out to be an expr, or is undefined, the interpreter is used to apply the function and the result is given back to the compiled code. The link is left “unsnapped” so that every time this function is called the interpreter will be invoked to interpret its definition. If, however, the function being called is machine executable (a subr), the link is “snapped.” Exactly what this means is implementation dependent but the effect is that from then on whenever a call is made through this uuo link, control will be transferred directly to the called function using the subroutine-calling instruction of the machine, and neither the linking routine nor the interpreter will be called.

There is a flag which can be set so that links will not be snapped even if they go to a function which is machine executable. This flag is the value of the atomic symbol NOUUO (see UUOLINKS) There is also a function, (SSTATUS UUOLINKS), which unsnaps all the links in the environment. These facilities are used in circumstances such as when a compiled function is redefined, or compiled code is being traced or otherwise debugged.

In the PDP-10 implementation a uuo link is implemented as an instruction which is executed when a call is to be made through the link. An “unsnapped” link consists of a special instruction, “UUO,” which causes the LISP linking routine in the interpreter to be called. The address field of the uuo points to the atomic symbol which names the function to be called. The operation code and accumulator fields indicate the type of call and number of arguments. When the link is snapped the UUO instruction is replaced with a “PUSHJ” instruction, which is the machine instruction for calling subroutines.

In the Multics implementation, a uuo link is implemented as a pointer. To call through this link a “tspbp” instruction indirect through the pointer is used. An unsnapped link points at the linking subroutine and various fields in the pointer, left unused by the machine, indicate the type of call, number of arguments, and the atomic symbol which names the function. When the link is snapped the pointer is changed to point at the first instruction of the called function.

Before a function can be used it must be made known in the LISP environment. Interpreted functions are made known simply by putting a functional property on the property list of the atomic symbol which names the function. This is usually done using the built-in function DEFUN. Compiled functions must be made known by a more complex mechanism known as “loading,” because of the complexity of the support mechanisms needed to make compiled functions execute efficiently. In some dialects of LISP the compiler automatically makes the compiled functions known, but in MACLISP the compiler creates a file in the file system of the host operating system, and this file has to be loaded before the compiled function can be called. In the PDP-10 implementation this file is called a “fasl file.” In the Multics implementation it is called an “object segment.”

See also: LOAD, FASLOAD


Compiler Input	Concept	`Code to be Compiled`

The input to the compiler consists of a text file containing a number of S-expressions. The format of this file is such that it could be read into a LISP environment using a function such as LOAD or UREAD, and then the functions defined in this file would be executed interpretively.

When a file is compiled, the compiler reads successive S-expressions from the file and processes them. Each is classified as a function definition, a “declare-form,” a “macro-form,” or a “random form” according to what type of object it is and according to its car if it is a list.

A function definition is a form whose car is one of the atoms DEFUN, DEFPROP. If the definition is for a MACRO, the macro definition becomes defined for use at compile time. It may also be output from the compiler; for more information, see the description of DEFUN. If it defines a function other than a macro, the compiler translates the definition from LISP to machine code and outputs it into the “fasl file” or “object segment” which is the output from the compiler. If it is a DEFPROP but defines a property other than EXPR, FEXPR, or MACRO, it is treated as a random form.

A macro form is any form whose car has previously been defined to be a macro. When a macro form is read from the input file, the compiler will apply the macro and then process the result as if it had been read from the input file. Thus if FOO is a macro which expands (FOO A B C) into (DEFUN A ...), the resulting function definition will be compiled.

A declare-form is a form whose car is the atom DECLARE. It is ignored by the interpreter because there is an fsubr called DECLARE in MACLISP which does nothing.

A PROGN form is any form which has the form (PROGN 'COMPILE...). The compiler processes each of the remaining elements of the progn form as if it had been encountered at top level in the file. PROGN forms are useful for macros (and read macro characters) which are to expand into a function definition and a declare form. For example, one might define a macro DEFLOAT such that (DEFLOAT F (A B) (PLUS A B)) expanded into

(PROGN 'COMPILE
       (DECLARE (FLONUM (F FLONUM FLONUM)))
       (DEFUN F (A B) (PLUS A B)))

Note that the (PROGN 'COMPILE ...) will be processed correctly by the interpreter also.

Forms which are calls to the functions INCLUDE or INCLUDEF are simply evaluated when they are encountered. This causes the contents of the specified file to be included in the compilation at that point, just as when interpreting such a form would cause the contents of the file to be loaded at that point.

A random form is anything read from the input file that is not one of the special types of forms described above. It is simply copied into the output file of the compiler in such a way that when that file is loaded it will be evaluated.


Compiler Output	Concept	`Compiled Code`

The output of the compiler normally consists of error and warning messages on the terminal, and a file of machine code which can be loaded into a lisp environment with LOAD or FASLOAD. In the PDP-10 implementation it is also possible to get a “lap file.” This is a file which contains machine code in symbolic form.

In the Multics implementation the compiler produces a standard object segment with a translator name of “lisp” and a symbol section which contains the information used by LOAD to define functions, set up constants needed by the compiled code, set up “uuo links,” etc.

On Multics, when the object segment is “loaded,” it is not copied into the lisp environment. Instead a “linkage block” is set up in the environment and initialized according to directives in the segment's symbol section. This block includes the reference name of the object segment and a pointer to it. Thus compiled code is automatically shared between multiple users in the Multics implementation. However, list structure constants used by the compiled code can never be shared.

In the PDP-10 implementation the output of the compiler is a “FASL file.” This file begins with a header identifying it as a fasl file and indicating what version of lisp it was produced for. (This is used to detect incompatibilities.) The rest of the file consists of a series of directives to load and relocate code words, set up list-structure constants, reference value cells of symbols, evaluate random S-expressions, etc.

FASLOAD operates by reading through the file, storing code in lisp's binary program space, and generating the necessary LISP objects for constants used by the compiled code. Normally none of this is shared between users, but see documentation on PURE, PURIFY, etc. for information on how to make things pure and shared.


`COMPILER-STATE`	Value	`NIL`

This variable is primarily for macros so that they can tell what environment they are being called from. In a vanilla interpreter, its value is NIL. In the compiler, it may take on one of the values TOPLEVEL, MAKLAP, or COMPILE.

TOPLEVEL means that the compiler is at toplevel. In other words, it is not compiling a file. The user is probably conversing interactively with the Lisp's toplevel Read-Eval-Print loop.

MAKLAP means that the compiler is scanning an input file looking for the function definitions to compile. In this state, the compiler will expand any top-level form (fn ...) for which fn has a macro property to see if a function definition is produced (which would then be compiled).

COMPILE means the compiler is actually compiling a form which is a function definition, producing lap code. In this state, any subform (fn ...) which is to be compiled is expanded if fn has a macro form.

Note: On Multics, this variable is bound in the compiler but not in the interpreter, so a BOUNDP test may be necessary in some situations before checking for TOPLEVEL, MAKLAP, or COMPILE.


`DECLARE`	Special Form	`(DECLARE form₁ form₂ ...)`

In the interpreter, DECLARE is a no-op. In the compiler, DECLARE is recognized specially and treated differently in different places. As a toplevel form in a file being compiled, all forms in a DECLARE will be evaluated. Functions such as FIXNUM, FLONUM, SPECIAL, *EXPR, *LEXPR, *FEXPR, and ARRAY* are available as fsubrs in the compiler environment and may be placed in DECLAREs. In a function, DECLARE acts as a local declaration. Only SPECIAL, UNSPECIAL, FIXNUM and FLONUM declarations are allowed at this point. These declarations are not executed, but rather pushed by the compiler into special local declaration databases which are in effect only for the lexical scope of the current context. Random forms in local DECLAREs are an error.


`FIXNUM-IDENTITY`	Function	`(FIXNUM-IDENTITY object)`

[PDP-10 Only] An identity operator over the class of fixnums. In the interpreter, errs if the argument is not a fixnum. In the compiler, declares that the object may be expected to be a fixnum so that useful optimizations may be done.

Example:

(FIXNUM-IDENTITY (CAR '(3 4 5))) => 3
(FIXNUM-IDENTITY 3)		 => 3 ;not necessary, but ok
(FIXNUM-IDENTITY 3.0)		 => error! ;does not coerce!


`FLONUM-IDENTITY`	Function	`(FLONUM-IDENTITY flonum)`

[PDP-10 Only] Like FIXNUM-IDENTITY only for flonums.


`EVAL-WHEN`	Special Form	`(EVAL-WHEN timelist form₁ form₂ ...)`

The first form, timelist, is an un-evaluated list which contains, in any order, any of the elements: EVAL, LOAD, or COMPILE. The other forms will be executed iff (1) this is a compiler read and COMPILE appears in timelist, (2) this is a compiled file loading and LOAD appeared in timelist of the source, or (3) this is a normal evaluation (e.g., expr-style LOAD) and EVAL appeared in timelist.

;; Load Macro Support
(EVAL-WHEN (EVAL COMPILE)
           (LOAD '((MYDIR) MACROS FASL)))

;; Load things needed in compiler and at runtime
(EVAL-WHEN (EVAL LOAD COMPILE)
           (LOAD '((MYDIR) SUPPRT FASL)))

;; Do things needed in compiler only
(EVAL-WHEN (COMPILE)
           (SETQ DEFMACRO-FOR-COMPILING NIL))

;; Only when expr code is loaded...
(EVAL-WHEN (EVAL)
           (SSTATUS FEATURE DEBUG))


`HERALD`	Special Form	`(HERALD module [version])`

[PDP-10 Only] Calling HERALD will type out a message saying that module is loading. The optional version argument should be a symbol which is the version number of the file iff the file might be transported to a site which has no version number capability on files. On sites with version numbers, HERALD can figure out the relevant information from the value of INFILE as it is expanding. The version number corresponds to the source, not the compiled file.

HERALD also sets up a VERSION property which is the version. You can tell if a HERALDed file has loaded by doing (GET 'module 'VERSION).

For example, a file containing (HERALD FOO) will print out something like:

;Loading FOO 259

when it loads. Other files that depend on FOO for compilation support probably want to do:

(EVAL-WHEN (EVAL COMPILE)
  (IF (NOT (GET 'FOO 'VERSION))
      (LOAD '((SOMEDIR) FOO))))

The “;Loading ...” message is directed to MSGFILES. That typeout is runtime conditionalized upon (STATUS FEATURE NOLDMSG). If that feature is enabled, no typeout is generated by the HERALD expression.

Compiler Internals

These hooks are not for casual users. They are dirty implementation details for aiding in system building when no other option presents itself. They should be used sparingly. They are available in the compiler only; for interpreted code, you must fend for yourself. If you're not sure how code can know when it's in the compiler, see documentation on EVAL-WHEN and the variable COMPILER-STATE.


`SQUID`	Value	`unspecified`

[PDP-10 Only] In the compiler only, the value of the symbol SQUID is a magic object which is used to tag objects which are Self-QUoting Internal Data. This is a very lowlevel primitive useful only really to system implementors. It is what the readmacro sequence “#,” is built upon.

After macroexpansion, any expression in code being processed by the compiler, whether quoted or not, whose car is the symbol held by the variable SQUID will have its cdr evaluated at FASLOAD time and that result will be used in place of the SQUID form in the loaded environment.

Web-Only Note:

The following information, from my personal notes, may also be helpful:

;;; [Source: GJC (07/08/80)]
;;; If you compile with the (-k) option a file which using
;;; +internal-/"-macro ala "foo bar", then you loose. 
;;; LAP doesn't have squids I suppose? Who cares right?
;;;
;;; [Source: JONL (07/09/80)]
;;; Right, there is no provision now for LAP to hack these
;;; "user-atoms", but there could be, with a little effort.
;;; On the other hand, you could also use the more primitive
;;; definition of the " macro - just let it turn strings 
;;; into symbols.


`COUTPUT`	Function	`(COUTPUT sexp)`

There is a function defined in the compiler, COUTPUT, which can be used to put a random S-expression into the output file. When the file is loaded, this S-expression will be evaluated.

This can be used to print the version number of the program, initialize its data base, etc. It cannot be used to fool around with obarrays because of the way the loader handles atomic symbols. For efficiency, it creates a table of all the atoms needed by the file being loaded, and creates and interns them all just once. This makes loading much faster, but means that everything in a file has to go on the same obarray.

The COUTPUT function usually does not have to be used, since the compiler COUTPUTs any “random form” it reads from the input file. It is provided for the benefit of certain classes of hairy macros.