Tokens
Token Types
Each time it is called, the lexical analyzer returns
a type-value pair representing the next token found in the source.
The token types in our language are listed below. For each token it
isolates, the lexical analyzer determines the token's type and sets the
type field of the pair accordingly.
- PROGRAM
- BEGIN
- END
- VAR
- FUNCTION
- PROCEDURE
- RESULT
- INTEGER
- REAL
- ARRAY
- OF
- IF
- THEN
- ELSE
- WHILE
- DO
- NOT
- IDENTIFIER
- INTCONSTANT or REALCONSTANT
- RELOP
- MULOP
- ADDOP
- ASSIGNOP
- COMMA
- SEMICOLON
- COLON
- RIGHTPAREN
- LEFTPAREN
- RIGHTBRACKET
- LEFTBRACKET
- UNARYMINUS
- UNARYPLUS
- DOUBLEDOT
- ENDMARKER (single dot)
- ENDOFFILE
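One way to represent the token types listed above is as an enumeration. The following is a minimal sketch in Python, not a required design: the names TokenType, KEYWORD_NAMES, and keyword_type are hypothetical, and the case-insensitive keyword match is an assumption that this specification does not state.

```python
from enum import Enum, auto


class TokenType(Enum):
    """Token types, in the order they are listed above."""
    # Keywords
    PROGRAM = auto()
    BEGIN = auto()
    END = auto()
    VAR = auto()
    FUNCTION = auto()
    PROCEDURE = auto()
    RESULT = auto()
    INTEGER = auto()
    REAL = auto()
    ARRAY = auto()
    OF = auto()
    IF = auto()
    THEN = auto()
    ELSE = auto()
    WHILE = auto()
    DO = auto()
    NOT = auto()
    # Identifiers and constants
    IDENTIFIER = auto()
    INTCONSTANT = auto()
    REALCONSTANT = auto()
    # Operators
    RELOP = auto()
    MULOP = auto()
    ADDOP = auto()
    ASSIGNOP = auto()
    # Punctuation
    COMMA = auto()
    SEMICOLON = auto()
    COLON = auto()
    RIGHTPAREN = auto()
    LEFTPAREN = auto()
    RIGHTBRACKET = auto()
    LEFTBRACKET = auto()
    # Unary signs
    UNARYMINUS = auto()
    UNARYPLUS = auto()
    # Dots and end of input
    DOUBLEDOT = auto()
    ENDMARKER = auto()  # single dot
    ENDOFFILE = auto()


# Hypothetical helper: the member names that are reserved words.
KEYWORD_NAMES = frozenset({
    "PROGRAM", "BEGIN", "END", "VAR", "FUNCTION", "PROCEDURE", "RESULT",
    "INTEGER", "REAL", "ARRAY", "OF", "IF", "THEN", "ELSE", "WHILE",
    "DO", "NOT",
})


def keyword_type(word):
    """Return the keyword's TokenType, or None if word is not a keyword.

    Assumption: keywords are matched case-insensitively.
    """
    name = word.upper()
    return TokenType[name] if name in KEYWORD_NAMES else None
```

With this sketch, keyword_type("begin") yields TokenType.BEGIN, while any other word yields None and is left for the caller to classify.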
Value
The value portion of the type-value pair
will vary depending on the type of token isolated.
- For keywords, UNARYMINUS, UNARYPLUS, and punctuation, value
  will be empty or contain the lexeme representing the token.
- For identifiers and constants, value is a
  pointer to the location in the symbol table at which the string representing
  the identifier or constant has been inserted. Until your symbol table routines
  are written, you can simply store the lexeme itself in the value field
  for identifier and constant tokens.
- For RELOP, ADDOP, and MULOP, value indicates
  the specific operator type, as follows:
RELOP
- =
- <>
- <
- >
- <=
- >=
ADDOP
- +
- -
- OR
MULOP
- *
- /
- DIV
- MOD
- AND
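The value rules above can be sketched as a small routine that builds the type-value pair. This is only an illustration under stated assumptions: Token, make_token, OPERATOR_TYPES, and SYMBOL_TABLE_TYPES are hypothetical names, token types are plain strings to keep the example self-contained, the operator value is encoded as the operator's upper-cased lexeme (the specification does not fix an encoding), and the value for identifiers and constants is the lexeme itself, standing in for the symbol table pointer.

```python
from dataclasses import dataclass
from typing import Optional

# Operator lexeme -> token type, taken from the RELOP, ADDOP, and MULOP lists.
OPERATOR_TYPES = {
    "=": "RELOP", "<>": "RELOP", "<": "RELOP",
    ">": "RELOP", "<=": "RELOP", ">=": "RELOP",
    "+": "ADDOP", "-": "ADDOP", "OR": "ADDOP",
    "*": "MULOP", "/": "MULOP", "DIV": "MULOP",
    "MOD": "MULOP", "AND": "MULOP",
}

# Token types whose value will eventually be a symbol table pointer.
SYMBOL_TABLE_TYPES = {"IDENTIFIER", "INTCONSTANT", "REALCONSTANT"}


@dataclass(frozen=True)
class Token:
    """A type-value pair as returned by the lexical analyzer."""
    type: str
    value: Optional[str] = None


def make_token(token_type, lexeme):
    """Build a Token whose value follows the rules listed above."""
    if token_type in SYMBOL_TABLE_TYPES:
        # Placeholder until the symbol table exists: store the lexeme itself.
        return Token(token_type, lexeme)
    if token_type in ("RELOP", "ADDOP", "MULOP"):
        # Assumption: operator words (OR, DIV, ...) are case-insensitive.
        operator = lexeme.upper()
        if OPERATOR_TYPES.get(operator) != token_type:
            raise ValueError(f"{lexeme!r} is not a {token_type} operator")
        return Token(token_type, operator)
    # Keywords, UNARYMINUS, UNARYPLUS, and punctuation: value left empty.
    return Token(token_type, None)
```

For example, make_token("MULOP", "div") carries the value "DIV", make_token("IDENTIFIER", "count") carries the lexeme "count", and make_token("SEMICOLON", ";") carries no value. Once your symbol table routines exist, only the first branch needs to change.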