Compilation stages.

Source code is human-readable text written in a specific programming language. The compiler translates source code into low-level machine code. The compiled code is stored in an object file, which a linker typically combines with other object files and libraries to produce an executable file. When the executable runs, its machine code is processed by the CPU.

Lexical analysis is the process of converting a stream of individual characters (normally arranged as lines) into a sequence of lexical tokens (words and symbols) that the parser can work with.

  • Done by the scanner (lexer);
  • Identifies lexemes within the source code;
  • White space and comments are discarded;
  • Generates a stream of tokens;
  • Each token is passed to the parser on request;
  • Creates a symbol table of 'names' used in the source code;
  • May also create a strings table;
  • Very limited error checking at this stage.
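The points above can be illustrated with a minimal scanner sketch in Python. The token kinds, the tiny keyword set, and the names `TOKEN_SPEC` and `tokenize` are assumptions made for this example, not part of any particular compiler.

```python
import re

# Hypothetical token kinds for a toy language. Order matters:
# keywords must be tried before general identifiers.
TOKEN_SPEC = [
    ("COMMENT", r"#[^\n]*"),
    ("NUMBER", r"\d+"),
    ("KEYWORD", r"\b(?:if|else|while)\b"),
    ("IDENT", r"[A-Za-z_]\w*"),
    ("OP", r"[+\-*/=()<>]"),
    ("SKIP", r"\s+"),
    ("ERROR", r"."),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(source):
    """Return (tokens, symbol_table) for a source string."""
    tokens = []
    symbols = {}  # identifier name -> index in order of first appearance
    for match in MASTER.finditer(source):
        kind, lexeme = match.lastgroup, match.group()
        if kind in ("SKIP", "COMMENT"):
            continue  # white space and comments are discarded
        if kind == "ERROR":
            # The only error the scanner can detect: an unknown character.
            raise SyntaxError(f"unexpected character {lexeme!r} at offset {match.start()}")
        if kind == "IDENT":
            symbols.setdefault(lexeme, len(symbols))
        tokens.append((kind, lexeme))
    return tokens, symbols


tokens, symbols = tokenize("x = y + 42  # note")
print(tokens)   # [('IDENT', 'x'), ('OP', '='), ('IDENT', 'y'), ('OP', '+'), ('NUMBER', '42')]
print(symbols)  # {'x': 0, 'y': 1}
```

Note that the scanner accepts any sequence of well-formed lexemes, so an input such as `x = = 42` passes this stage; rejecting it is the parser's job.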

Syntactic analysis, also known as parsing, checks that the statements conform to the grammar rules of the language in question; in other words, that the program is a valid sequence of tokens. Tokens are the symbols, keywords, identifiers, etc. of the language. The syntax analyzer receives its input, in the form of tokens, from the lexical analyzer, which is responsible for the validity of each token it supplies.
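As a rough illustration of checking a token sequence against a grammar, here is a small recursive-descent parser sketch in Python. It assumes tokens arrive as `(kind, lexeme)` pairs and handles only a toy arithmetic grammar; the class name `Parser`, the token kinds, and the nested-tuple tree are all hypothetical choices for this example.

```python
# Toy grammar (an assumption for this sketch):
#   expr   := term (('+' | '-') term)*
#   term   := factor (('*' | '/') factor)*
#   factor := NUMBER | IDENT | '(' expr ')'

class Parser:
    def __init__(self, tokens):
        self.tokens = list(tokens)
        self.pos = 0

    def peek(self):
        # Return the current token, or an end-of-input marker.
        return self.tokens[self.pos] if self.pos < len(self.tokens) else ("EOF", "")

    def advance(self):
        token = self.peek()
        self.pos += 1
        return token

    def parse(self):
        tree = self.expr()
        if self.peek()[0] != "EOF":
            raise SyntaxError(f"unexpected token {self.peek()[1]!r}")
        return tree

    def expr(self):
        node = self.term()
        while self.peek() in (("OP", "+"), ("OP", "-")):
            op = self.advance()[1]
            node = (op, node, self.term())
        return node

    def term(self):
        node = self.factor()
        while self.peek() in (("OP", "*"), ("OP", "/")):
            op = self.advance()[1]
            node = (op, node, self.factor())
        return node

    def factor(self):
        kind, lexeme = self.advance()
        if kind in ("NUMBER", "IDENT"):
            return (kind, lexeme)
        if (kind, lexeme) == ("OP", "("):
            node = self.expr()
            if self.advance() != ("OP", ")"):
                raise SyntaxError("expected ')'")
            return node
        raise SyntaxError(f"unexpected token {lexeme!r}")


# Tokens for the expression: a + 2 * b
tokens = [("IDENT", "a"), ("OP", "+"), ("NUMBER", "2"), ("OP", "*"), ("IDENT", "b")]
print(Parser(tokens).parse())
# ('+', ('IDENT', 'a'), ('*', ('NUMBER', '2'), ('IDENT', 'b')))
```

A sequence such as `a +` with nothing after the operator is rejected with a `SyntaxError`, because no grammar rule matches it.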

Limitations of a syntax analyzer (these checks are left to the lexical analyzer and to semantic analysis):

  • it cannot determine whether a token is valid,
  • it cannot determine whether a token is declared before it is used,
  • it cannot determine whether a token is initialized before it is used,
  • it cannot determine whether an operation performed on a token type is valid.
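These limits can be observed with Python's own parser, used here only as a convenient stand-in for a syntax analyzer. The helper name `is_syntactically_valid` is an assumption for this sketch; `ast.parse` is the standard-library call that runs the parser without executing anything.

```python
import ast


def is_syntactically_valid(source):
    """Return True if the source parses, without running it."""
    try:
        ast.parse(source)
        return True
    except SyntaxError:
        return False


# Uses an undeclared name, yet the grammar is satisfied.
print(is_syntactically_valid("total = count + 1"))  # True

# Subtracting a number from a string is a type error, not a syntax error.
print(is_syntactically_valid('"abc" - 1'))          # True

# An invalid token sequence is the kind of error the parser does catch.
print(is_syntactically_valid("total = = 1"))        # False
```

The first two snippets fail only later: running them raises `NameError` and `TypeError` respectively, which is the kind of problem a semantic analysis stage (or, in Python, the runtime) has to detect.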

Code generation produces object code in some lower-level programming language, for example, assembly language.

Minimum properties of low-level object code:

  • It should carry the exact meaning of the source code.
  • It should be efficient in terms of CPU usage and memory management.

Code optimization is a program transformation technique that tries to improve the code by making it consume fewer resources (i.e. CPU, memory) and run faster. It should not lengthen compile time more than necessary. Optimization is carried out by the compiler, although the programmer remains responsible for choices such as the algorithm used.

Optimizations provided by a compiler include:

  • Inlining small functions.
  • Code hoisting: moving loop-invariant code out of loops.
  • Reduction in strength: replacing expensive operations with cheaper machine instructions.
  • Dead store/code elimination: removing stores and code whose results are never used.
  • Eliminating common subexpressions.
  • Loop unrolling: repeating the loop body so that the loop test is not performed at every iteration.
  • Loop optimizations: code motion, induction variable elimination, and reduction in strength.
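The effect of several of these transformations can be shown by applying them by hand. In the Python sketch below, each "optimized" function is what an optimizing compiler might produce from the "original" one; the function names and data are made up for illustration, and the standard Python interpreter does not perform these rewrites itself.

```python
def scale_original(values, factor, offset):
    result = []
    for i in range(len(values)):
        limit = factor * offset      # loop-invariant: same value every iteration
        unused = values[i] * 100     # dead store: the value is never read
        result.append(values[i] * 2 + limit)
    return result


def scale_optimized(values, factor, offset):
    limit = factor * offset          # code hoisting: computed once, before the loop
    result = []
    for v in values:                 # dead store removed entirely
        result.append(v + v + limit) # strength reduction: v * 2 becomes v + v
    return result


def total_original(values):
    total = 0
    for i in range(len(values)):     # one loop test per element
        total += values[i]
    return total


def total_unrolled(values):
    total = 0
    n = len(values)
    i = 0
    while i + 4 <= n:                # loop unrolling: one test per four elements
        total += values[i] + values[i + 1] + values[i + 2] + values[i + 3]
        i += 4
    while i < n:                     # leftover elements when n is not a multiple of 4
        total += values[i]
        i += 1
    return total


data = [1, 2, 3]
print(scale_original(data, 2, 5))    # [12, 14, 16]
print(scale_optimized(data, 2, 5))   # [12, 14, 16]
print(total_original(list(range(10))), total_unrolled(list(range(10))))  # 45 45
```

In every pair the two versions return the same results; an optimization is only valid if it preserves the meaning of the program.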


Last modified: Wednesday, 24 April 2024, 14:22