The first part of the project requires you to write the input and output routines
(and the SExp class as well). Here are a few suggestions on how the input routine
might be written:

If you didn't have to worry about the list notation on input i.e., all input would
use the dotted notation, things like (A B) would be typed in as (A . (B . NIL)),
the input routine would be simple. You could do it as follows: Let me assume that
you have a basic input routine called ckNextToken; this routine tells you whether
the next token is "(", ")", ".", or an identifier [these being the only possible
tokens; actually, there is another possible token, integer, but I will let you
take care of that one].

ckNextToken doesn't actually skip to the next token; if you call it repeatedly, it
will keep returning the same token. So you actually need another routine,
skipToken(), that will do that but I will leave that routine, and inserting calls
to that routine at the right places in the code below to you. Let us say
ckNextToken returns 1, 2, 3, or 4 corresponding respectively to the four token
types, "(", ")", ".", and identifier. [I am also leaving to you the question of
handling the $ and $$ tokens.]

Let me also assume that there is another routine, getId, which you call only if
you know the next token is an identifier and this routine returns the actual
identifier that is the next token [this routine also does not skip to the next
token].  You can then write the input routine as follows:

      input[] = [ eq[ckNextToken[],4] --> getId[];
                | eq[ckNextToken[],1] --> cons[input[], input[]];
                | T --> error!
                ]

If the next token is an identifier, we just call getId[] to read that identifier
and return it [but don't forget to insert the call to skipToken[], else you will
keep reading this same identifier!].

By the way, note that I am assuming that getId[] will return a pointer to an SExp
object, the one corresponding to the particular identifier, not the name of the
identifier (in the form of a string).  One way to do this is as follows: in the
SExp class, maintain a static variable, idPointers[], that contains pointers to
all existing identifers:

  static SExp*  idPointers[20];

Whenever a new identifier is created, you add, to the idPointers[] array, a
pointer to that SExp object. Then if you have read in a token, say, "ABC", you can
check whether an identifier with that name already exists by going through the
idPointers[] array and comparing "ABC" with the name of each of the existing
identifiers. If you find a match, just use the pointer to that particular
identifer; if there is no match, you have to construct a *new* SExp object of
identifier type, with "ABC" as its name and use that. But also remember to add, to
the idPointers[], a pointer to this new SExp object.  Integers are handled
similarly but you should not keep pointers to existing integer objects etc.;
instead, create a new SExp object corresponding to each integer that you run into.
I have more details about this at the end of this note.

Let us go back to the definition of input[] above.  If the next token is "(",
i.e., token type is 1, we [should first skip this "(" and then] call input twice
to read the car and the cdr of the s-expression being input, cons the two together
and return the result [again there are several calls to skipToken() that you will
need, including one to take care of the "." between the two calls to input[] in
the second line of the definition].

If the next token is neither an identifier nor "(", we have an error since an
s-expression can't start with anything else.

By the way, notice something interesting about the above input[] routine?  It is
written in Lisp and is (or looks) pure functional! Well, of course, 
ckNextToken[] and getNextToken[] etc. are not in
Lisp but those are really low level functions; the main part of input[] *is*
written in Lisp. Slick, isn't it? Actually, there is a subtlety here. I am
assuming in this definition that the first argument to cons[] will be evaluated
before its second argument. If this were not true, and the second argument were to
be evaluated first, then if you type in something like (3 . 4), the second
argument will be evaluated first and will evaluate to 3, then the first argument,
which will evaluate to 4, and cons will come back with the s-expression (4 . 3)! I
am not saying that the interpreter exchanged the two arguments; instead, it just
evaluated them in a different order than we expect. This would not be a problem if
we didn't have any side-effects but input[] obviously has the side- effect of
removing an s-expression from the input stream. That is the source of the problem.

Anyway, back to the input function.  What if we want to deal with lists as well?
Note the following: If the first token is an identifier, there is no difference
with what we have above. But if the first token is "(" then we have some
complications; following this we may immediately have ")" which corresponds to
NIL; if not, we will have an s-expression, let us call it S1. So our input starts
out:
   (S1 
We can call input[] to read S1 recursively but what comes after that is going to
be a bit tricky; you may have a "." in which case, we have the standard case, that
is something like (S1.S2); or you may have ")" in which case the overall
s-expression is (S1.NIL); or you may have something that looks like S2 ...)  [By
the way, in this note, I am assuming that there are no extra white spaces; in
other words, I am assuming that (3.4) will not be written as (3 .  4); you will
have to take care of this in your implementation.]

To handle these other [non-dot] cases, the easiest approach is to define another
input routine, let us call it input2[] that reads in either ")" and returns NIL,
or reads say S2) and returns (S2.NIL) or reads S2 S3) and returns (S2.(S3.NIL)), etc.

   input2[] = [ eq[ckNextToken[],2] --> NIL;
              | T --> cons[input[], input2[]]; ]

Again appropriate calls to skipNextToken() have to be inserted but the above
function is really simple: it either reads ")" and returns NIL, or reads, by
calling input[], an S-expression [S2 in the discussion above] and then calls
itself to read the rest of the list! Note the distinction between input[] and
input2[]; the former reads a complete s-expression, the latter reads something of
the form ")" or "S2)" or "S2 S3)" etc. *This* is why you should not try to combine
these two functions into one because they really do different things.

As I mentioned a while ago in class, some implementations of Lisp allow a "mixed" notation;
here, you are allowed to write, say (2 3 . 5); what this means is (2 . (3 . 5)) Do
not implement this in your project. The reason that this notation is used in
Common Lisp, I think, is the following. Consider how you would do output. 
You can always, of
course, do output in the dot notation. But suppose you want to also use the list
notation so that you don't have a lot of parentheses. Then the output routine will
have to first check (how?)  whether the given s-expression can be output in the
list notation; if not, use the dot notation; if yes, output in list notation.
What they did in Common Lisp was, instead, to simply assume that the given s-exp
can be output in list notation and proceed. So given something like (2 . (3 . 5)),
this output routine [expecting to output something like (2 3 5)], starts by
outputting "(", then the car [i.e., 2], then space, then cadr [i.e., 3], then
discovers that cddr is an atom [5] and is not NIL! So the assumption that this
could be output as a list was incorrect.  At that point, they just output a "
. 5)".  If it had been the atom NIL, they would just have output ")", and the list
would be complete.  Instead, since it is not NIL but is an atom, they output "
. 5)" and expect the user to interpret (2 3 . 5) as (2 . (3 . 5))

That is really ugly in my opinion! Don't do that in your implementation.  If you
want to output using the list notation *when* possible, first check that the given
s-expression is a list and *if* it is, proceed to output as a list; otherwise,
output using the dot notation.

One other comment: If you generally think in a "recursive descent" fashion, you
might want to see if you can write a BNF grammar for s-expressions [including
list, dot, and mixed notation]. A natural grammar for this will have two
non-terminals; the routines input[] and input2[] will correspond to these.

--Neelam.

----------------------------------------
Back to the problem of dealing with identifiers. As I said above, in the SExp
class, you could maintain a static variable, idPointers[], that contains pointers 
to all existing identifers:

  static SExp*  idPointers[20];

Whenever a new identifier is created, you add, to the idPointers[] array, a
pointer to that SExp object. Then if you have read in a token, say, "ABC", you can
check whether an identifier with that name already exists by going through the
idPointers[] array and comparing "ABC" with the name of each of the existing
identifiers. If you find a match, just use the pointer to that particular
identifer; if there is no match, you have to construct a *new* SExp object of
identifier type, with "ABC" as its name and use that. But also remember to add, to
the idPointers[], a pointer to this new SExp object.  Integers are handled
similarly but you should not keep pointers to existing integer objects etc.;
instead, create a new SExp object corresponding to each integer that you run into.

I think it might be useful to include, among the methods of the SExp class, a
static method, call it, symbolicAtom():

  static SExp* symbolicAtom(String s);

Essentially, this method should receive, as argument, a string (such as "CAR" or
"XYZ"), and return the pointer to the SExp object corresponding to the symbolic
atom whose name matches that string. If the given string is something like "CAR",
the returned object will be an existing symbolic atom; if it is something like
"XYZ" that we have *not* encountered before, it will create a new SExp object of
type symbolic atom, set its name to the given string, add it to the icPointers[],
and return a pointer to this new object. There could be another method,
integerAtom() that takes a number as argument and returns a pointer to a (newly
constructed) SExp object corresponding to that number.

Given these methods, the input[] routine doesn't have to know anything about the
internal details of the SExp class. Moreover, the initialization can be done easily. 
You can write things such as:

  SExp* resCAR = SExp.symbolicAtom("CAR");

I am not sure if the above (C++-style) syntax is correct but effectively the call
to symbolicAtom() should make an atomic object whose name is "CAR" and I am
setting the value of resCAR to be the pointer to that object. So later on, when
the input[] reads the token "CAR" from the input stream, the value it returns will
match this pointer.