verticallines

Loading...

Rifl: A Tree-Walking Interpreter for a Custom Language

Rahul Hathwar - 2025-11-29 - Building a type-safe interpreter with Go from first principles

Overview

Rifl (Rahul's Interpreter for Learning) is a tree-walking interpreter for a custom programming language, implemented entirely in Go. The project demonstrates the complete pipeline of language implementation: lexical analysis, recursive descent parsing, abstract syntax tree construction, and tree-walking interpretation with lexical scoping.

Motivation and Design Philosophy

The language specification emphasizes several key design goals that influenced architectural decisions throughout the implementation:

  • Type safety: Strong typing with support for primitive types (integers, floats, booleans, strings) and planned static type checking
  • Expression-oriented syntax: Following the tradition of functional languages where nearly everything returns a value
  • Unicode support: Full UTF-8 support for source code, enabling international identifiers and future language experimentation
  • Clear error reporting: Line-accurate error messages with helpful diagnostic hints

Architecture: The Classical Pipeline

The interpreter follows the traditional three-stage architecture found in many language implementations:

Source Code → Scanner → Tokens → Parser → AST → Interpreter → Execution

Stage 1: Lexical Analysis (Scanner)

The scanner (modules/scanner/root.go) performs character-by-character tokenization with careful attention to Unicode handling. A key design decision was using Go's rune type throughout rather than byte-level processing, enabling proper multi-byte character support:

func (s *Scanner) ScanTokens() []types.Token {
    for {
        char := rune(s.source[s.start])
        switch char {
        case '(':
            s.addToken(types.LEFT_PAREN)
        // ... single-character tokens
        case '! ':
            if rune(s.source[s. current+1]) == '=' {
                s.advance()
                s.addTokenWithLiteral(types.BANG_EQUAL, s.setLiteral())
            } else {
                s.addToken(types.BANG)
            }
        // ... multi-character tokens
        }
    }
}

Technical highlight: The scanner implements lookahead for two-character operators (!=, ==, <=, >=) by checking s.current+1, distinguishing between single and compound operators in a single pass.

Number literal handling: Floating-point detection required careful state management to prevent malformed literals like 5.5. 5. The scanDigitSequence() method uses a boolean flag to track whether a decimal point has been encountered:

func (s *Scanner) scanDigitSequence() (isFloat bool, isInteger bool) {
    foundFractionalComponent := false
    for {
        if unicode.IsDigit(rune(s.source[s.current+1])) {
            s.advance()
        } else if rune(s.source[s. current+1]) == '.' {
            if ! foundFractionalComponent {
                if unicode.IsDigit(rune(s.source[s.current+2])) {
                    foundFractionalComponent = true
                    s.advance()
                } else {
                    errors.Exit(errors. SourceError(s.line, 
                        "Syntax error: floating point must contain digit after '.'"))
                }
            } else {
                errors.Exit(errors.SourceError(s.line, 
                    "Syntax error: only one decimal point allowed"))
            }
        }
        // ... 
    }
    return foundFractionalComponent, ! foundFractionalComponent
}

This approach provides clear error messages for common mistakes while maintaining single-pass efficiency.

Stage 2: Parsing with Recursive Descent

The parser (modules/parser/root.go) implements a recursive descent algorithm following operator precedence grammar. Each precedence level has its own parsing method, creating a natural correspondence between grammar rules and code structure:

Expression → Assignment
Assignment → Equality
Equality → Comparison
Comparison → Term
Term → Factor
Factor → Unary
Unary → Primary

Precedence implementation: Lower-precedence operations call higher-precedence parsers, building the correct tree structure:

func (s *Parser) Equality() types.Expression {
    var expr types.Expression = s. Comparison()
    
    for s.Match(types.BANG_EQUAL, types.EQUAL_EQUAL) {
        var operator types.Token = s.Previous()
        var right types.Expression = s. Comparison()
        expr = types. BinaryExpression{
            Expr:     types.CommonExpression{},
            Left:     expr,
            Operator: operator,
            Right:    right,
        }
    }
    return expr
}

This elegantly handles left-associativity: a == b == c becomes (a == b) == c because the loop consumes operators left-to-right, wrapping the previous result as the left operand.

Control flow parsing: The parser supports modern control structures with proper scoping. The for loop implementation demonstrates desugaring:

func (s *Parser) ForStatement() types.Statement {
    s. Consume(types.LEFT_PAREN, "Syntax Error: Expected '(' after 'for'")
    
    var initializer types.Statement
    if s.Match(types.VAR) {
        initializer = s. VariableDeclaration()
    } else {
        initializer = s.ExpressionStatement()
    }
    var condition = s.Expression()
    s.Consume(types.SEMICOLON, "Syntax Error: Expected ';' after condition")
    var increment = s.Expression()
    s.Consume(types.RIGHT_PAREN, "Syntax Error: Expected ')' after for")
    
    var body = s.Statement()
    return types.ForStatement{
        Initial:   initializer,
        Condition: condition,
        Increment: increment,
        Body:      body,
    }
}

The parser preserves the C-style for loop structure, leaving the interpreter to handle its execution semantics.

Stage 3: Tree-Walking Interpretation

The interpreter (modules/interpreter/) uses the Visitor pattern implicitly through Go's type switches. Rather than implementing explicit visitor interfaces, the code leverages Go's structural typing:

func Evaluate(expression types.Expression) types.Value {
    switch v := expression.(type) {
    case types.LiteralExpression:
        return evaluateLiteralExpression(v)
    case types.BinaryExpression:
        return evaluateBinaryExpression(v)
    case types.VariableExpression:
        return evaluateVariableExpression(v, global)
    case types.AssignExpression:
        return evaluateAssignExpression(v, global)
    // ... 
    }
}

Binary expression evaluation: Type coercion and operator overloading are handled explicitly:

func evaluateBinaryExpression(expr types.BinaryExpression) types.Value {
    left := Evaluate(expr.Left)
    right := Evaluate(expr. Right)
    
    switch expr.Operator. TokenType {
    case types. PLUS:
        return types.Value{
            Type:  types.NUMBER_TYPE,
            Value: getNumber[float64](left. Value) + getNumber[float64](right.Value),
        }
    case types.GREATER:
        return types.Value{
            Type:  types. BOOLEAN_TYPE,
            Value: getNumber[float64](left.Value) > getNumber[float64](right.Value),
        }
    // ... 
    }
}

Note the deliberate design decision: the PLUS operator only supports numeric addition, not string concatenation. String concatenation requires an explicit function call, promoting type clarity.

Lexical Scoping with Environments

Variable scoping uses a parent-pointer tree structure (modules/types/environment.go), a classic approach for implementing lexical scoping:

type Environment struct {
    referenceScope *Environment
    variables      map[string]Value
}

func (s *Environment) Get(name string) Value {
    if value, ok := s.variables[name]; ok {
        return value
    }
    
    if s.referenceScope != nil {
        return s.referenceScope.Get(name)
    }
    
    return Value{}  // Undefined variable
}

Scope management: Block statements create new environments that reference their enclosing scope:

func interpretBlockStatement(stmt types.BlockStatement, env types.Environment) {
    previousEnvironment := env
    environment := types.NewEnvironment()
    
    SetGlobalEnvironment(environment)
    Interpret(stmt.Statements)
    SetGlobalEnvironment(previousEnvironment)
}

This enables proper variable shadowing: inner scopes can declare variables with the same name as outer scopes without conflict, and variable lookup traverses the chain from innermost to outermost scope.

CLI Interface and REPL

The command-line interface (built with Cobra) supports both file execution and a read-eval-print loop:

Run: func(cmd *cobra.Command, args []string) {
    if len(args) > 1 {
        fmt.Println("Usage: interpreter or interpreter [file path]")
        os.Exit(64)
    } else if len(args) == 1 {
        runner. RunFile(args[0])
    } else {
        runner.RunPrompt()
    }
}

The REPL enables rapid prototyping and testing of language features, essential for iterative language design.

Design Challenges and Solutions

Challenge 1: UTF-8 string handling
Go's strings are UTF-8 encoded byte sequences, not rune arrays. Using utf8.RuneCountInString() instead of len() for bounds checking prevented off-by-one errors when scanning multi-byte characters.

Challenge 2: Error recovery in parser
The current implementation uses errors.Exit() for syntax errors, halting execution immediately. This provides clear error messages but doesn't support error recovery for multiple errors per file.

Challenge 3: Type system representation
Values carry runtime type information through a ValueType enum:

type Value struct {
    Type  ValueType
    Value interface{}
}

This enables runtime type checking while maintaining Go's static typing for the interpreter itself. The tradeoff: type assertions are required when extracting typed values.

Current Capabilities

The interpreter successfully executes:

  • Arithmetic expressions with proper precedence
  • Boolean logic and comparison operators
  • Variable declaration and assignment with block scoping
  • Control flow: if/else, while, for loops
  • Nested block statements with proper shadowing

Future Directions

The codebase includes commented placeholders for planned features:

  • Function declarations and calls: Infrastructure partially implemented
  • Return statements: Parser support exists, interpreter pending
  • Static type checking: Currently dynamic; specification calls for optional static typing
  • Module system: Directory structure suggests future modular programming support

Technical Takeaways

This project demonstrates several key principles of interpreter design:

  1. Separation of concerns: Each compilation stage is isolated in its own module with clear interfaces
  2. Grammar-driven structure: Parser code directly mirrors the language grammar, making it maintainable and extensible
  3. Pragmatic error handling: Detailed error messages with line numbers and contextual hints improve developer experience
  4. Unicode-first design: Supporting international character sets from the beginning avoids costly retrofitting

The complete source with examples is available at github.com/RahulHi/rifl.


Built with Go 1.x, leveraging the Cobra CLI framework for command-line interface and Go's native Unicode support for internationalization.

Copyright © Rahul Hathwar. All Rights Reserved.