Overview
Rifl (Rahul's Interpreter for Learning) is a tree-walking interpreter for a custom programming language, implemented entirely in Go. The project demonstrates the complete pipeline of language implementation: lexical analysis, recursive descent parsing, abstract syntax tree construction, and tree-walking interpretation with lexical scoping.
Motivation and Design Philosophy
The language specification emphasizes several key design goals that influenced architectural decisions throughout the implementation:
- Type safety: Strong typing with support for primitive types (integers, floats, booleans, strings) and planned static type checking
- Expression-oriented syntax: Following the tradition of functional languages where nearly everything returns a value
- Unicode support: Full UTF-8 support for source code, enabling international identifiers and future language experimentation
- Clear error reporting: Line-accurate error messages with helpful diagnostic hints
Architecture: The Classical Pipeline
The interpreter follows the traditional three-stage architecture found in many language implementations:
Source Code → Scanner → Tokens → Parser → AST → Interpreter → Execution
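Conceptually, a small driver ties these stages together. The sketch below shows one plausible wiring; the constructor and entry-point names (NewScanner, NewParser, Parse, Interpret) are illustrative assumptions rather than the repository's confirmed API:

// Hypothetical driver wiring the three stages together; names are
// illustrative, not necessarily the repository's actual functions.
func run(source string) {
    sc := scanner.NewScanner(source) // Stage 1: characters -> tokens
    tokens := sc.ScanTokens()

    p := parser.NewParser(tokens) // Stage 2: tokens -> AST
    statements := p.Parse()

    interpreter.Interpret(statements) // Stage 3: walk the AST and execute
}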
Stage 1: Lexical Analysis (Scanner)
The scanner (modules/scanner/root.go) performs character-by-character tokenization with careful attention to Unicode handling. A key design decision was using Go's rune type throughout rather than byte-level processing, enabling proper multi-byte character support:
func (s *Scanner) ScanTokens() []types.Token {
    for {
        char := rune(s.source[s.start])
        switch char {
        case '(':
            s.addToken(types.LEFT_PAREN)
        // ... single-character tokens
        case '!':
            if rune(s.source[s.current+1]) == '=' {
                s.advance()
                s.addTokenWithLiteral(types.BANG_EQUAL, s.setLiteral())
            } else {
                s.addToken(types.BANG)
            }
        // ... multi-character tokens
        }
    }
}
Technical highlight: The scanner implements lookahead for two-character operators (!=, ==, <=, >=) by checking s.current+1, distinguishing between single and compound operators in a single pass.
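One common way to package this kind of lookahead is a small helper that consumes the next character only when it matches. The sketch below is illustrative; the repository performs the s.current+1 check inline rather than through such a helper:

// Illustrative helper: consume the next character only if it matches expected.
// The actual scanner does this check inline with s.current+1.
func (s *Scanner) matchNext(expected rune) bool {
    if s.current+1 >= len(s.source) {
        return false // end of input: nothing to look ahead at
    }
    if rune(s.source[s.current+1]) != expected {
        return false
    }
    s.advance() // consume the second character of the compound operator
    return true
}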
Number literal handling: Floating-point detection required careful state management to prevent malformed literals like 5.5.5. The scanDigitSequence() method uses a boolean flag to track whether a decimal point has already been encountered:
func (s *Scanner) scanDigitSequence() (isFloat bool, isInteger bool) {
    foundFractionalComponent := false
    for {
        if unicode.IsDigit(rune(s.source[s.current+1])) {
            s.advance()
        } else if rune(s.source[s.current+1]) == '.' {
            if !foundFractionalComponent {
                if unicode.IsDigit(rune(s.source[s.current+2])) {
                    foundFractionalComponent = true
                    s.advance()
                } else {
                    errors.Exit(errors.SourceError(s.line,
                        "Syntax error: floating point must contain digit after '.'"))
                }
            } else {
                errors.Exit(errors.SourceError(s.line,
                    "Syntax error: only one decimal point allowed"))
            }
        }
        // ...
    }
    return foundFractionalComponent, !foundFractionalComponent
}
This approach provides clear error messages for common mistakes while maintaining single-pass efficiency.
Stage 2: Parsing with Recursive Descent
The parser (modules/parser/root.go) implements a recursive descent parser whose structure follows the language's operator-precedence grammar. Each precedence level has its own parsing method, creating a natural correspondence between grammar rules and code structure:
Expression → Assignment
Assignment → Equality
Equality → Comparison
Comparison → Term
Term → Factor
Factor → Unary
Unary → Primary
Precedence implementation: Lower-precedence operations call higher-precedence parsers, building the correct tree structure:
func (s *Parser) Equality() types.Expression {
    var expr types.Expression = s.Comparison()
    for s.Match(types.BANG_EQUAL, types.EQUAL_EQUAL) {
        var operator types.Token = s.Previous()
        var right types.Expression = s.Comparison()
        expr = types.BinaryExpression{
            Expr:     types.CommonExpression{},
            Left:     expr,
            Operator: operator,
            Right:    right,
        }
    }
    return expr
}
This elegantly handles left-associativity: a == b == c becomes (a == b) == c because the loop consumes operators left-to-right, wrapping the previous result as the left operand.
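Concretely, parsing a == b == c yields a tree whose left child is itself a BinaryExpression. The shape looks roughly like the following, where aExpr, bExpr, cExpr, and the token values are abbreviated placeholders for the actual parsed nodes:

// Approximate shape of the AST for "a == b == c"; field values abbreviated.
expr := types.BinaryExpression{
    Left: types.BinaryExpression{ // result of the first loop iteration
        Left:     aExpr,
        Operator: firstEqEq, // the first '==' token
        Right:    bExpr,
    },
    Operator: secondEqEq, // the second '==' token
    Right:    cExpr,
}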
Control flow parsing: The parser supports modern control structures with proper scoping. The for loop implementation shows how the initializer, condition, and increment clauses are parsed into a dedicated AST node:
func (s *Parser) ForStatement() types.Statement {
    s.Consume(types.LEFT_PAREN, "Syntax Error: Expected '(' after 'for'")
    var initializer types.Statement
    if s.Match(types.VAR) {
        initializer = s.VariableDeclaration()
    } else {
        initializer = s.ExpressionStatement()
    }
    var condition = s.Expression()
    s.Consume(types.SEMICOLON, "Syntax Error: Expected ';' after condition")
    var increment = s.Expression()
    s.Consume(types.RIGHT_PAREN, "Syntax Error: Expected ')' after for")
    var body = s.Statement()
    return types.ForStatement{
        Initial:   initializer,
        Condition: condition,
        Increment: increment,
        Body:      body,
    }
}
The parser preserves the C-style for loop structure, leaving the interpreter to handle its execution semantics.
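On the execution side, a plausible way to evaluate the preserved ForStatement node (a sketch, not necessarily the repository's exact code) is to run the initializer once and then loop on the condition, executing the body and the increment on each pass. The helpers interpretStatement and isTruthy are assumed names:

// Sketch of how a tree-walking interpreter could execute ForStatement;
// interpretStatement and isTruthy are assumed helpers, not confirmed API.
func interpretForStatement(stmt types.ForStatement) {
    interpretStatement(stmt.Initial) // run the initializer exactly once
    for isTruthy(Evaluate(stmt.Condition)) {
        interpretStatement(stmt.Body)
        Evaluate(stmt.Increment) // evaluated for its side effect (e.g. i = i + 1)
    }
}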
Stage 3: Tree-Walking Interpretation
The interpreter (modules/interpreter/) uses the Visitor pattern implicitly through Go's type switches. Rather than implementing explicit visitor interfaces, the code dispatches on the concrete type of each AST node:
func Evaluate(expression types.Expression) types.Value {
    switch v := expression.(type) {
    case types.LiteralExpression:
        return evaluateLiteralExpression(v)
    case types.BinaryExpression:
        return evaluateBinaryExpression(v)
    case types.VariableExpression:
        return evaluateVariableExpression(v, global)
    case types.AssignExpression:
        return evaluateAssignExpression(v, global)
    // ...
    }
}
Binary expression evaluation: Operand evaluation and numeric type coercion are handled explicitly for each operator:
func evaluateBinaryExpression(expr types.BinaryExpression) types.Value {
    left := Evaluate(expr.Left)
    right := Evaluate(expr.Right)
    switch expr.Operator.TokenType {
    case types.PLUS:
        return types.Value{
            Type:  types.NUMBER_TYPE,
            Value: getNumber[float64](left.Value) + getNumber[float64](right.Value),
        }
    case types.GREATER:
        return types.Value{
            Type:  types.BOOLEAN_TYPE,
            Value: getNumber[float64](left.Value) > getNumber[float64](right.Value),
        }
    // ...
    }
}
Note the deliberate design decision: the PLUS operator only supports numeric addition, not string concatenation. String concatenation requires an explicit function call, promoting type clarity.
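Under that rule, a defensive version of addition might verify operand types before coercing. The sketch below reuses the Value and errors helpers shown elsewhere in this post and takes the source line as a parameter; it is an illustration of the idea, not the repository's exact error handling:

// Sketch: explicit numeric check before addition; not the repository's exact code.
func addValues(left, right types.Value, line int) types.Value {
    if left.Type != types.NUMBER_TYPE || right.Type != types.NUMBER_TYPE {
        errors.Exit(errors.SourceError(line,
            "Type error: '+' expects numeric operands"))
    }
    return types.Value{
        Type:  types.NUMBER_TYPE,
        Value: getNumber[float64](left.Value) + getNumber[float64](right.Value),
    }
}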
Lexical Scoping with Environments
Variable scoping uses a parent-pointer tree structure (modules/types/environment.go), a classic approach for implementing lexical scoping:
type Environment struct {
    referenceScope *Environment
    variables      map[string]Value
}

func (s *Environment) Get(name string) Value {
    if value, ok := s.variables[name]; ok {
        return value
    }
    if s.referenceScope != nil {
        return s.referenceScope.Get(name)
    }
    return Value{} // Undefined variable
}
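Definition and assignment complete the picture: a plausible Define writes into the innermost scope, while Assign walks the parent chain so that an assignment updates the scope where the variable was declared. This is a sketch consistent with the Get method above, not necessarily the repository's exact code:

// Sketch: Define binds in the current scope; Assign updates an existing
// binding, walking outward through referenceScope.
func (s *Environment) Define(name string, value Value) {
    s.variables[name] = value
}

func (s *Environment) Assign(name string, value Value) bool {
    if _, ok := s.variables[name]; ok {
        s.variables[name] = value
        return true
    }
    if s.referenceScope != nil {
        return s.referenceScope.Assign(name, value)
    }
    return false // undefined variable: caller reports the error
}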
Scope management: Block statements create new environments that reference their enclosing scope:
func interpretBlockStatement(stmt types.BlockStatement, env types.Environment) {
    previousEnvironment := env
    environment := types.NewEnvironment() // fresh environment for the block
    SetGlobalEnvironment(environment)     // execute the block against the new scope
    Interpret(stmt.Statements)
    SetGlobalEnvironment(previousEnvironment) // restore the enclosing scope on exit
}
This enables proper variable shadowing: inner scopes can declare variables with the same name as outer scopes without conflict, and variable lookup traverses the chain from innermost to outermost scope.
CLI Interface and REPL
The command-line interface (built with Cobra) supports both file execution and a read-eval-print loop:
Run: func(cmd *cobra.Command, args []string) {
    if len(args) > 1 {
        fmt.Println("Usage: interpreter or interpreter [file path]")
        os.Exit(64)
    } else if len(args) == 1 {
        runner.RunFile(args[0])
    } else {
        runner.RunPrompt()
    }
}
The REPL enables rapid prototyping and testing of language features, essential for iterative language design.
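A minimal REPL along these lines reads a line, pushes it through the same scanner/parser/interpreter pipeline, and loops. The sketch below assumes a hypothetical run helper and is not the repository's exact RunPrompt:

// Illustrative REPL loop; "run" stands in for the full pipeline.
func RunPrompt() {
    reader := bufio.NewScanner(os.Stdin)
    for {
        fmt.Print("> ")
        if !reader.Scan() { // EOF (Ctrl-D) exits the loop
            break
        }
        run(reader.Text())
    }
}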
Design Challenges and Solutions
Challenge 1: UTF-8 string handling
Go's strings are UTF-8 encoded byte sequences, not rune arrays. Using utf8.RuneCountInString() instead of len() for bounds checking prevented off-by-one errors when scanning multi-byte characters.
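The difference matters as soon as a source file contains multi-byte characters; for example:

package main

import (
    "fmt"
    "unicode/utf8"
)

func main() {
    s := "héllo" // 'é' occupies two bytes in UTF-8
    fmt.Println(len(s))                    // 6: byte length
    fmt.Println(utf8.RuneCountInString(s)) // 5: rune count
}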
Challenge 2: Error recovery in parser
The current implementation uses errors.Exit() for syntax errors, halting execution immediately. This provides clear error messages but doesn't support error recovery for multiple errors per file.
Challenge 3: Type system representation
Values carry runtime type information through a ValueType enum:
type Value struct {
    Type  ValueType
    Value interface{}
}
This enables runtime type checking while maintaining Go's static typing for the interpreter itself. The tradeoff: type assertions are required when extracting typed values.
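In practice this means every consumer of a Value narrows interface{} back to a concrete Go type before using it. A sketch of such an extraction helper, assuming numbers are stored as float64 (or occasionally int), might look like this:

// Sketch: extracting a concrete Go value from the dynamically typed Value.
func asFloat(v Value) (float64, bool) {
    switch n := v.Value.(type) {
    case float64:
        return n, true
    case int:
        return float64(n), true // tolerate integer literals stored as int
    default:
        return 0, false // caller decides how to report the type error
    }
}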
Current Capabilities
The interpreter successfully executes:
- Arithmetic expressions with proper precedence
- Boolean logic and comparison operators
- Variable declaration and assignment with block scoping
- Control flow: if/else, while, and for loops
- Nested block statements with proper shadowing
Future Directions
The codebase includes commented placeholders for planned features:
- Function declarations and calls: Infrastructure partially implemented
- Return statements: Parser support exists, interpreter pending
- Static type checking: Currently dynamic; specification calls for optional static typing
- Module system: Directory structure suggests future modular programming support
Technical Takeaways
This project demonstrates several key principles of interpreter design:
- Separation of concerns: Each compilation stage is isolated in its own module with clear interfaces
- Grammar-driven structure: Parser code directly mirrors the language grammar, making it maintainable and extensible
- Pragmatic error handling: Detailed error messages with line numbers and contextual hints improve developer experience
- Unicode-first design: Supporting international character sets from the beginning avoids costly retrofitting
The complete source with examples is available at github.com/RahulHi/rifl.
Built with Go 1.x, leveraging the Cobra CLI framework for command-line interface and Go's native Unicode support for internationalization.