Code generation
Code generation
A high level language is first LEXICALLY analysed. Then its SYNTAX is analysed. During syntax analysis, the SEMANTICS of the code is checked to see if it makes sense. If any errors are found in these stages the REPORT GENERATOR program springs into action and displays helpful (or not so helpful) error messages. The programmer uses these to aid debugging the program. This is known as ‘translator diagnostics’. When the program can be lexically and syntactically analysed without errors, the object code can be generated.
Consider the line: LENGTH := 2 * (FIRST - SECOND) + 4 * (THIRD - FOURTH)
During lexical analysis, a look-up table will have been created of all the variables found in the source code, their data type and where in memory the variable value can be found. For example;
|
Name |
Type |
Address |
|
LENGTH |
INT |
FFB2 |
|
FIRST |
INT |
1445 |
|
SECOND |
INT |
A4A3 |
|
THIRD |
INT |
FF21 |
|
FOURTH |
INT |
C411 |
Also, library routines may need to be called up and referenced. In Pascal, for example, you would call the library known as CRT to enable you to use the ‘Clear the screen’ routine.
What happens during code generation?
The computer needs to generate machine code from the high-level language. The way the line
LENGTH := 2 * (FIRST - SECOND) + 4 * (THIRD - FOURTH)
is written follows maths rules that humans can follow. Computers don't work like that! The line needs to be rearranged so that the computer can figure out what to add, subtract and multiply and in what order. The most complicated instruction in programming can be broken down into a series of 'baby steps'. One way to represent how the computer will work on each instruction is to create a tree structure to represent each phrase of code. For example, using the example instruction above, the instruction could be represented as a diagram:

-
- The computer will start at level 1. Any variables used can be substituted with the variable address kept in the look-up table and any library routines can be called by using the routine's address. How routines and procedures all work together is the job of the ‘linker’. Where the program is put into memory is the job of the ‘loader’.
- At level two, you can see that 'First' minus 'Second' will generate some code. Also, 'Third' minus 'Fourth' will generate code. These will be passed up to the next level.
- The code generated by 'First' minus 'Second' will be multiplied by 2. The code generated by 'Third' minus 'Fourth' will be multiplied by 4. The code will then be passed up to the next level.
- The two lots of code generated so far will be added together and passed up to the final level, level 5. The code generated so far will be assigned to 'Length'.
The original instruction was broken down into a series of 'baby steps' that were then carried out by the computer. The code for each step was recombined with the code generated by other steps, until the code for the whole instruction had been written.
Code optimisation
The generation of code proceeds along the lines described above for each instruction until the whole program has been turned into object code. It can then be run on the computer, or distributed to users to use on their computers. Code generated using this method may not be the most efficient code that could be generated. For example, when a tree is produced, you may get:
VALUE:=RESULT*1 or you may get another example: x:=y+0
Clearly, VALUE:=RESULT*1 is the same as VALUE:=RESULT, and x:=y+0 is the same as x:=y
These examples, then, show that the code generation program could produce inefficient code! The code must be optimised! When the compiler has generated the object code, it runs routines that do just this. It tries to make the code as fast and as efficient as possible by finding and removing unnecessary code.