BugsWorld Project Description


In the next few weeks you will implement a compiler for a simple programming language used to control the behavior of agents in a distributed simulation (BugsWorld game). Although this language may be a simplified version of a real programming language, the translation process involves all the main phases usually found in compilers for full-fledged languages such as C++ and Java. This project will give you a chance to apply the ideas and use the components that you have studied so far while learning the fundamental concepts and techniques used in language processing applications and implementing several components in this domain.

The Game

BugsWorld is a game derived from the Darwin game invented by Nick Parlante at Stanford University. It consists of a two-dimensional rectangular world divided into square cells and populated by bugs. Each bug belongs to a particular species and that determines how the bug behaves. Here is an example of the world.

Each bug resides in one of the world cells and faces one of the four compass directions (north, east, south, or west). The basic steps that a bug can take are: move, turnleft, turnright, infect, and skip. Move results in the bug moving to the next cell in the direction it is facing (unless it is facing a wall or another bug, in which case it does nothing). Turnleft and turnright, as you might imagine, result in the bug turning by 90° left (counter-clockwise) or 90° right (clockwise) respectively. If the bug is facing a cell containing a bug of a different species, infect results in the other bug being converted to the species of the infecting bug (otherwise it does nothing). Finally, skip allows a bug to do nothing for one time step (one turn).

Essentially the BugsWorld game is a simulation of this world and of the critters in it. You can think of it as a competition among species where each species represented in the world (by one or more bugs) tries to survive by avoiding and/or by infecting the bugs of the other species.

The Language

The behavior of (bugs of) a certain species is expressed by a program written in a simple programming language called BL (Bugs Language). In addition to the five primitive instructions (move, turnleft, turnright, infect, and skip), the language provides a couple of selection control structures (IF-THEN and IF-THEN-ELSE), an iterative control structure (WHILE-DO), and the ability to define new instructions (INSTRUCTION-IS). Here is a sample BL program (try to figure out the behavior that creatures of this species would display).

    PROGRAM TryToGuess IS

        INSTRUCTION FindObstacle IS
            WHILE next-is-empty DO
                move
            END WHILE
        END FindObstacle

    BEGIN
        WHILE true DO
            FindObstacle
            IF next-is-enemy THEN
                infect
            ELSE
                IF next-is-wall THEN
                    turnleft
                ELSE
                    skip
                END IF
            END IF
        END WHILE
    END TryToGuess

Since you are already familiar with the syntax of a much more complicated programming language, we won't spend much time describing the details of the BL language. Here are some of the salient features of the language:

The Simulator

The simulator for the Bugs world is made up of three distinct applications: server, client, and display (see figure below).

For each simulation there is exactly one server, but there can be multiple clients (usually at least two) and displays (whenever users want to view the simulation from different terminals). The server is responsible for keeping track of the state of the world, for making sure that each bug gets a fair chance to make progress, and for resolving conflicts when two or more bugs make conflicting requests (e.g., two bugs that try to move into the same cell). Each client handles all the bugs of one particular species (so there will be one client for each species involved in the simulation). Every time the server tells a client that a certain bug should get a turn in the simulation, the client executes the corresponding species' program to determine what that bug's next step should be, and then sends a request to the server to perform the chosen step on behalf of the bug. Since neither server nor clients show what's happening in the world, the display application can be used to view the world during the simulation.

Note that the server, each of the clients, and each of the displays can be running on different machines. More details on running these applications are provided in a separate handout.

The Compiler

Since each client handles exactly one species, the client application is responsible for executing the BL program that determines the behavior of the corresponding species. However, the client does not execute the species program in its source form. We first translate the source program into a more convenient form, and then have the client execute the program in this "compiled" form. This has several advantages:

So how does the compiler work and what is this "more convenient" form into which we want to convert the source code? Here is the compiled version of the sample program TryToGuess: 20 15 20 6 7 0 5 2 12 12 3 5 18 8 17 1 5 18 4 5 0. It is simply a sequence of integers that encodes the original program in a form that makes execution very simple and fast (though it makes it very difficult to read!).

The translation process is not a one step process. There are actually three distinct transformations that we need to perform (see picture below).

The source code can be viewed as a string of characters. The first phase (tokenizing) breaks up this string of characters into "chunks" called tokens. The resulting string of tokens is fed to the parser in the second phase (parsing). This produces an abstract representation of the original program. Finally, in the third phase (code generation), the abstract program is traversed and the string of integers that represents the object code is generated. The following table shows the results of each transformation. White space tokens were omitted from the list of tokens since they are not needed by the following phases. The abstract program probably will not make much sense at this time. However, it is important to note that it still contains enough information about the original program for the code generator to produce the object code and for the program to be executed.

    PROGRAM TryToGuess IS

        INSTRUCTION FindObstacle IS
            WHILE next-is-empty DO
                move
            END WHILE
        END FindObstacle

    BEGIN
        WHILE true DO
            FindObstacle
            IF next-is-enemy THEN
                infect
            ELSE
                IF next-is-wall THEN
                    turnleft
                ELSE
                    skip
                END IF
            END IF
        END WHILE
    END TryToGuess
PROGRAM, TryToGuess, IS, INSTRUCTION, FindObstacle, IS, WHILE, next-is-empty, DO, move, END, WHILE, END, FindObstacle, BEGIN, WHILE, true, DO, FindObstacle, IF, next-is-enemy, THEN, infect, ELSE, IF, next-is-wall, THEN, turnleft, ELSE, skip, END, IF, END, IF, END, WHILE, END, TryToGuess
1. Source Code
2. Tokens
TryToGuess
FindObstacle
WHILE
next-is-empty
move
WHILE
true
FindObstacle
IF-ELSE
next-is-enemy
infect
IF-ELSE
next-is-wall
turnleft
skip
21 16 20 7 7 0 6 2 13 12 3 6 18 9 17 1 6 18 4 6 0 5
3. Abstract Program
4. Object Code