Building a minimal Scheme Interpreter in Rust
One of my main technical interests is programming languages and compilers. I have read a decent amount of material on the subject. Recently, I decided to put my knowledge into practice by building a simple language interpreter.
By the end of this guide we’ll have a minimal, working interpreter for a small subset of the Scheme programming language.
Note: This guide assumes that you’re relatively comfortable with Rust.
All source code for this guide is available on GitHub.
What is Scheme?
Scheme is a dialect of the Lisp programming language. In Scheme (just like in other Lisp dialects), programs consist purely of expressions. These expressions are technically known as S-Expressions.
An S-Expression (Symbolic Expression) is basically a list wrapped in parentheses with spaces between elements:
> (+ 1 2 3)
This is why everything in Lisp is usually considered to be a list. In fact, the name Lisp is an acronym for LISt Processing.
Other Lisp dialects are Common Lisp, Racket, and Clojure.
Syntax and Semantics of A Scheme Program
Scheme syntax is much simpler compared to most other programming languages:
- As we’ve already mentioned, Scheme programs consist solely of expressions. There is no distinction between a statement and an expression.
- Numbers (e.g. 10) and symbols (e.g. X) are called atomic expressions; they cannot be broken into smaller pieces. In Scheme, operators such as +, /, and > are symbols too.
- Everything else is a list expression, i.e. an “(“, followed by zero or more expressions, followed by a “)”. The first element of the list determines its syntactic meaning:
- A list starting with a keyword, e.g. (if …), is a special form; the meaning depends on the keyword.
- A list starting with a non-keyword, e.g. (fn …), is a function call.
You can read more about the syntax rules of Scheme programs from Peter Norvig’s excellent article.
The Structure of A Language Interpreter
At a high level, a language interpreter has the following parts:
- The lexer (sometimes called a scanner), takes in the program source code and breaks it up into a series of tokens.
- The parser takes in tokens from the lexer and creates an abstract syntax tree (AST).
- The evaluator evaluates the AST and produces the final result of the program.
With the preliminaries out of the way, let’s get started.
This guide is divided into seven steps:
- Step 1 - Initialize a new project
- Step 2 - Lexical Analysis
- Step 3 - Parsing
- Step 4 - Evaluation
- Step 5 - Create a REPL
- Step 6 - Run the program
- Step 7 - Add tests
Step 1: Initialize a new project
Create a new Rust project by running the following in your shell:
$ cargo new rustyscm
$ cd rustyscm
The rustyscm
directory will have the following file structure:
.
├── Cargo.lock
├── Cargo.toml
└── src
└── main.rs
Step 2: Lexical Analysis
Lexical analysis breaks up a program’s source code into a sequence of tokens. In the small subset of Scheme that we are building, we have the following tokens:
- Parentheses: “(“ and “)”
- Numbers: 64-bit floating-point numbers
- Symbols: Any group of characters other than a number or parenthesis
In Rust, we’ll represent the tokens with an enum:
File: src/lexer.rs
#[derive(Debug, Clone, PartialEq)]
pub enum Token {
OpenParen,
CloseParen,
Number(f64),
Symbol(String),
}
We also implement the Display
trait for Token
:
File: src/lexer.rs
impl std::fmt::Display for Token {
fn fmt(&self, f: &mut std::fmt::Formatter) -> std::fmt::Result {
match self {
Token::OpenParen => write!(f, "("),
Token::CloseParen => write!(f, ")"),
Token::Number(n) => write!(f, "{}", n),
Token::Symbol(s) => write!(f, "{}", s),
}
}
}
The main process of tokenization is done by the tokenize function:
File: src/lexer.rs
pub fn tokenize(expr: &str) -> Result<Vec<Token>, String> {
let expr = expr.replace("(", " ( ").replace(")", " ) "); // add spaces around each paren
let words = expr.split_whitespace(); // delimit characters by whitespace
let mut tokens: Vec<Token> = Vec::new();
// convert words into tokens using pattern matching and add to the vector
for word in words {
match word {
"(" => tokens.push(Token::OpenParen),
")" => tokens.push(Token::CloseParen),
_ => {
let i = word.parse::<f64>();
if i.is_ok() {
tokens.push(Token::Number(i.unwrap()));
} else {
tokens.push(Token::Symbol(word.to_string()));
}
}
}
}
Ok(tokens)
}
The tokenize funtion takes as input an expression (a string of characters) and returns a vector of tokens.
To see how the lexer works, add the following code to the main
file:
File: src/main.rs
mod lexer;
use std::io;
use crate::lexer::tokenize;
fn main() {
let mut input = String::new();
io::stdin()
.read_line(&mut input)
.expect("Failed to read line");
let tokens = tokenize(input.as_str()).unwrap();
println!("Tokens: {:?}", tokens);
}
Now if you run the program using:
$ cargo run
You’ll get a blinking cursor waiting for you to type a Scheme expression. After typing an expression, you’ll get an output similar to the one shown below:
> (+ 1 2 3)
> Tokens: [OpenParen, Symbol("+"), Number(1.0), Number(2.0), Number(3.0), CloseParen]
As you can see the input string is split into a vector of tokens.
Step 3: Parsing
Parsing (or syntactic analysis) converts the tokens from the lexer into an abstract syntax tree. In our particular case, the parser will convert the tokens into a list of Scheme expressions that can be interpreted by the evaluator.
Here’s the enum representing valid Scheme expressions:
File: src/parser.rs
#[derive(Debug, Clone, PartialEq)]
pub enum Expression {
Bool(bool),
Number(f64),
Symbol(String),
List(Vec<Expression>),
Func(fn(&[Expression]) -> Expression),
Function(Procedure),
}
We also need to implement the Display
trait for Expression
:
File: src/parser.rs
impl std::fmt::Display for Expression {
fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
match self {
Expression::Bool(a) => write!(f, "{}", a),
Expression::Number(n) => write!(f, "{}", n),
Expression::Symbol(s) => write!(f, "{}", s),
Expression::List(list) => {
let formatted_list: Vec<String> =
list.iter().map(|exp| format!("{}", exp)).collect();
write!(f, "({})", formatted_list.join(" "))
}
Expression::Func(_) => write!(f, "<function>"),
Expression::Function(_) => write!(f, "<function>"),
}
}
}
And here’s the Procedure
struct. This represents a Scheme function/procedure:
File: src/parser.rs
#[derive(Debug, Clone, PartialEq)]
pub struct Procedure {
pub params: Vec<Expression>,
pub body: Vec<Expression>,
pub env: Environment,
}
From the struct definition above, we see that a procedure has parameters, its body and its execution environment.
What is an Environment?
In Scheme, an environment is a mapping from variable names to their values. In Rust, we can represent an environment using a HashMap:
File: src/env.rs
use std::collections::HashMap;
use crate::parser::Expression;
#[derive(Debug, Clone, PartialEq)]
pub struct Environment {
contents: HashMap<String, Expression>,
}
As you can see, the Environment
type is just a wrapper around a Rust HashMap.
We also need to define a few functionalities for the Environment:
File: src/env.rs
impl Environment {
fn new() -> Self {
Self {
contents: HashMap::new(),
}
}
pub fn insert(&mut self, k: String, v: Expression) {
self.contents.insert(k, v);
}
pub fn get(&self, k: &str) -> Option<&Expression> {
self.contents.get(k)
}
}
Back to parsing, this is how the parse
function looks like:
File: src/parser.rs
pub fn parse(input: &str) -> Result<Expression, String> {
let token_result = match tokenize(input) {
Ok(val) => val,
Err(err) => return Err(format!("{}", err)),
};
let mut tokens = token_result.into_iter().rev().collect(); // need to reverse the tokens from the lexer
parse_token_list(&mut tokens)
}
The real work happens in the recursive parse_token_list
function:
File: src/parser.rs
fn parse_token_list(tokens: &mut Vec<Token>) -> Result<Expression, String> {
let token = tokens.pop();
if token != Some(Token::OpenParen) {
return Err(format!("Error: Expected OpenParen, found {:?}", token));
}
let mut list: Vec<Expression> = Vec::new();
while !tokens.is_empty() {
let token = tokens.pop();
if token == None {
return Err("Error: Did not find enough tokens".to_string());
}
let tok = token.unwrap();
match tok {
Token::Number(n) => list.push(Expression::Number(n)),
Token::Symbol(s) => list.push(Expression::Symbol(s)),
Token::OpenParen => {
tokens.push(Token::OpenParen);
let sub_list = parse_token_list(tokens)?; // recursive call
list.push(sub_list);
}
Token::CloseParen => {
return Ok(Expression::List(list));
}
}
}
Ok(Expression::List(list))
}
The parse_token_list
function takes a vector of tokens (in reverse order) generated by the lexer and generates a single List Expression representing the entire Scheme program. The function expects a vector of tokens, with the first token of the vector being an open parenthesis indicating the start of the list. The function then proceeds to process the elements of the list one at a time. If it encounters atomic tokens such as a number or symbol, it creates corresponding atomic expressions and adds them to the list. If it encounters another open parenthesis it recurses with the remaining tokens. Finally, the function returns with the list expression when it encounters the close parenthesis.
Step 4: Evaluation
Evaluation is the final step that will produce the result for the Scheme program. At a high level, the evaluator walks the List-based structure created by the parser and evaluates each atomic expression and list (recursively), combines these intermediate values, and produces the final result.
By default, the evaluator will use a global environment that includes the names of a small set of standard operations (e.g. +, -, and pow). This environment can be augmented with user-defined variables, using the expression: (define symbol value).
This is how we can define an environment with some Scheme standard operations:
File: src/env.rs
use std::f64::consts::PI;
use crate::operator_utils::*;
pub fn standard_env() -> Environment {
let mut environment = Environment::new();
// Basic arithmetic operators
environment.insert(
"+".to_string(),
Expression::Func(|args: &[Expression]| match add(args) {
Ok(expr) => expr,
Err(e) => panic!("{}", e),
}),
);
environment.insert(
"-".to_string(),
Expression::Func(|args: &[Expression]| match subtract(args) {
Ok(expr) => expr,
Err(e) => panic!("{}", e),
}),
);
environment.insert(
"*".to_string(),
Expression::Func(|args: &[Expression]| match multiply(args) {
Ok(expr) => expr,
Err(e) => panic!("{}", e),
}),
);
environment.insert(
"/".to_string(),
Expression::Func(|args: &[Expression]| match divide(args) {
Ok(expr) => expr,
Err(e) => panic!("{}", e),
}),
);
// Exponent
environment.insert(
"pow".to_string(),
Expression::Func(|args: &[Expression]| match power(args) {
Ok(expr) => expr,
Err(e) => panic!("{}", e),
}),
);
// PI constant
environment.insert("pi".to_string(), Expression::Number(PI));
environment
}
The standard environment uses a bunch of operator utility functions defined in the
operator_utils
module.
Here’s one example of such a utility function for calculating exponent:
File: src/operator_utils.rs
use crate::parser::Expression;
use anyhow::{anyhow, Result};
pub fn power(args: &[Expression]) -> Result<Expression> {
if args.len() != 2 {
return Err(anyhow!(
"Exponent operator requires two arguments: base and exponent factor"
));
}
let base = match args.get(0) {
Some(Expression::Number(num)) => Ok(num),
_ => Err(anyhow!("Expected a number")),
}?;
let n = match args.get(1) {
Some(Expression::Number(num)) => Ok(num),
_ => Err(anyhow!("Expected a number")),
}?;
let result = base.powf(*n);
Ok(Expression::Number(result))
}
Note: Please check the project’s GitHub repo for the rest of the utility functions.
Here’s the eval
function, which represents the entry point to the evaluation process:
File: src/eval.rs
pub fn eval(program: &str, env: &mut Environment) -> Result<Expression, String> {
let parsed_expr = match parse(program) {
Ok(expr) => expr,
Err(error) => {
eprintln!("Error during parsing: {}", error);
std::process::exit(1);
}
};
let evaluated_expression = eval_expr(parsed_expr, env)?;
Ok(evaluated_expression)
}
The eval
function calls the eval_expr
function shown below:
File: src/eval.rs
fn eval_expr(expr: Expression, env: &mut Environment) -> Result<Expression, String> {
match expr {
Expression::Bool(_) => Ok(expr),
Expression::Symbol(s) => env
.get(&s)
.cloned()
.ok_or_else(|| format!("Undefined symbol: {}", s)),
Expression::Number(_) => Ok(expr),
Expression::List(list) => eval_list(&list, env),
Expression::Func(_) => Ok(expr),
Expression::Function(_) => Err("Unexpected function definition".into()),
}
}
The bulk of the evaluation process happens in the eval_list
function. The function takes the List expression representing the program and the environment variable as inputs. The function then starts processing the List expression representing the program by iterating over each element of the list:
File: src/eval.rs
use crate::env::Environment;
use crate::parser::Expression;
fn eval_list(list: &[Expression], env: &mut Environment) -> Result<Expression, String> {
let first = &list[0];
if let Expression::Symbol(s) = first {
match s.as_str() {
"define" => eval_define(&list, env),
"if" => eval_if(&list, env),
_ => {
if let Some(exp) = env.get(s) {
match exp {
Expression::Func(f) => {
let function = f.clone();
let args: Result<Vec<Expression>, String> = list[1..]
.iter()
.map(|x| eval_expr(x.clone(), env))
.collect();
Ok(function(&args?))
}
Expression::Function(proc) => {
let env_clone = &mut env.clone();
let args: Result<Vec<Expression>, String> = list[1..]
.iter()
.map(|x| eval_expr(x.clone(), env_clone))
.collect();
// Create a new execution environment for the function
let mut local_env = proc.env.clone();
// Insert the function name into the new environment
local_env.insert(s.clone(), exp.clone());
for (param, arg) in proc.params.iter().zip(args?) {
if let Expression::Symbol(param_name) = param {
local_env.insert(param_name.clone(), arg);
} else {
return Err("Invalid parameter name".into());
}
}
let mut result = Expression::Bool(false);
for exp in proc.body.clone() {
result = eval_expr(exp.clone(), &mut local_env)?;
}
Ok(result)
}
_ => Err(format!("Undefined function: {}", s)),
}
} else {
Err(format!("Undefined function: {}", s))
}
}
}
} else {
Err("Expected a symbol".into())
}
}
Here’s the eval_define
function:
File: src/eval.rs
fn eval_define(list: &[Expression], env: &mut Environment) -> Result<Expression, String> {
if list.len() < 3 {
return Err("'define' requires at least two arguments".into());
}
// Define a new function or variable
if let Expression::List(func) = list.get(1).unwrap() {
if let Some(Expression::Symbol(func_name)) = func.get(0) {
let params = func[1..].to_vec();
let body = list.get(2..).ok_or("Invalid define syntax")?.to_vec();
let proc = Procedure {
params,
body,
env: env.clone(),
};
let function = Expression::Function(proc);
env.insert(func_name.clone(), function);
Ok(Expression::Symbol(func_name.clone()))
} else {
Err("Invalid define syntax".into())
}
} else if let Expression::Symbol(var_name) = list.get(1).unwrap() {
let value = eval_expr(list[2].clone(), env)?;
env.insert(var_name.clone(), value.clone());
return Ok(Expression::Symbol(var_name.clone()));
} else {
return Err("Invalid define syntax".into());
}
}
And here’s the eval_if
function:
File: src/eval.rs
fn eval_if(list: &[Expression], env: &mut Environment) -> Result<Expression, String> {
if list.len() < 4 {
return Err("'if' requires at least three arguments".into());
}
let condition = eval_expr(list[1].clone(), env)?;
match condition {
Expression::Bool(true) => eval_expr(list[2].clone(), env),
Expression::Bool(false) => eval_expr(list[3].clone(), env),
_ => Err("Invalid condition in if expression".into()),
}
}
At this point, we are done with all the important parts of the interpreter.
Next up, we are going to create a REPL to interact with our Scheme programs from the command line.
Step 5: Create a REPL
In Scheme, a REPL (Read-Eval-Print-Loop) gives us a way of interacting with a program by entering an expression and seeing it immediately read, evaluated, and printed, without having to go through a lengthy build/compile/run cycle.
The REPL is implemented as a simple loop that takes one line as an input from the user, evaluates it, and prints out the evaluated result.
The repl
function is defined in the lib.rs
file:
File: src/lib.rs
use std::io;
pub mod env;
pub mod eval;
use crate::env::standard_env;
use crate::eval::eval;
pub fn repl() {
let mut global_env = standard_env();
loop {
println!("schemer>");
let expr = read_input().unwrap();
match eval(expr.as_ref(), &mut global_env) {
Ok(val) => println!(" ==> {}", val),
Err(error) => eprintln!("==> Error: {}", error),
};
}
}
The repl
function uses the read_input
helper function:
File: src/lib.rs
use anyhow::{Context, Result};
fn read_input() -> Result<String> {
let mut raw_input = String::new();
io::stdin()
.read_line(&mut raw_input)
.context("Failed to read line")?;
Ok(raw_input)
}
The main
function just calls the repl
function to start the program:
File: src/main.rs
fn main() {
rustyscm::repl();
}
At this point the file structure should look like this:
.
├── Cargo.lock
├── Cargo.toml
└── src
├── env.rs
├── eval.rs
├── lexer.rs
├── lib.rs
├── main.rs
├── operator_utils.rs
└── parser.rs
Step 6: Run the program
We can run the program by using:
$ cargo run
You’ll get a prompt like this with a blinking cursor below it:
schemer>
_
You can then type your Scheme expressions as follows:
schemer>
(+ 2 2)
==> 4
Our little Scheme interpreter is capable of evaluating basic arithmetic operations:
schemer>
(+ 2 2 3)
==> 7
schemer>
(- 4 2)
==> 2
schemer>
(+ 10 5 (- 10 3 3))
==> 19
schemer>
(* 3 2)
==> 6
schemer>
(/ 8 4)
==> 2
schemer>
(pow 2 16)
==> 65536
We can define variables:
schemer>
(define x 5)
==> x
schemer>
(* x 2)
==> 10
schemer>
(define r 10)
==> r
schemer>
(* pi (* r r))
==> 314.1592653589793
We can do comparison:
schemer>
(< 4 2)
==> false
schemer>
(> 4 2)
==> true
schemer>
(>= 4 2)
==> true
We can evaluate if
expressions:
schemer>
(if (> 2 4 ) 1 2)
==> 2
schemer>
(if (< 2 4 ) 1 2)
==> 1
schemer>
(if (> (* 11 11) 120) (* 7 6) oops)
==> 42
We can define functions:
schemer>
(define (sum a b) (+ a b))
==> sum
schemer>
(sum 10 20)
==> 30
schemer>
(define (square x) (* x x))
==> square
schemer>
(square 5)
==> 25
schemer>
(define (circle-area r) (* pi (* r r)))
==> circle-area
schemer>
(circle-area (+ 5 5))
==> 314.1592653589793
schemer>
(circle-area 3)
==> 28.274333882308138
We can define functions to calculate factorials and fibonacci numbers:
schemer>
(define (fact n) (if (<= n 1) 1 (* n (fact (- n 1)))))
==> fact
schemer>
(fact 10)
==> 3628800
schemer>
(define (fib n) (if (< n 2) 1 (+ (fib (- n 1)) (fib (- n 2)))))
==> fib
schemer>
(fib 10)
==> 89
We can even work with nested functions:
schemer>
(define (cube x) (define (square x) (* x x)) (* x (square x)))
==> cube
schemer>
(cube 3)
==> 27
We now have a complete, working interpreter for a subset of the Scheme programming language.
Step 7: Add tests
Our final step is to add both unit and integration tests for the project.
Here’s an example of a unit test for the lexer’s tokenize
function:
File: src/lexer.rs
// ...rest of file contents omitted...
#[cfg(test)]
mod tests {
use super::*;
#[test]
fn test_tokenize() {
let input = "(define r 10)";
let actual_tokens = tokenize(input).unwrap_or(vec![]);
let expected_tokens = vec![
Token::OpenParen,
Token::Symbol("define".to_string()),
Token::Symbol("r".to_string()),
Token::Number(10.0),
Token::CloseParen,
];
assert_eq!(actual_tokens, expected_tokens);
}
}
Integration tests in Rust are placed inside a separate tests
directory.
Here’s an example of an integration test that verifies the functionality for defining a procedure:
File: tests/integration_test.rs
use rustyscm::env::standard_env;
use rustyscm::eval::eval;
use rustyscm::parser::Expression;
#[test]
fn test_function_definition() {
let mut env = standard_env();
let input1 = "(define (square x) (* x x))";
let result1 = eval(input1, &mut env).unwrap();
assert_eq!(result1, Expression::Symbol("square".to_string()));
let input2 = "(square 5)";
let result2 = eval(input2, &mut env).unwrap();
assert_eq!(result2, Expression::Number(25.0));
}
The final project structure should now look like this:
.
├── Cargo.lock
├── Cargo.toml
├── src
│ ├── env.rs
│ ├── eval.rs
│ ├── lexer.rs
│ ├── lib.rs
│ ├── main.rs
│ ├── operator_utils.rs
│ └── parser.rs
└── tests
└── integration_test.rs
To run the tests use:
$ cargo test
Summary
In this guide, we’ve been able to implement a working interpreter for a small subset of the Scheme programming language. Of course, our interpreter lacks a lot of other features that a “real” language should have. But it’s good enough for our learning purposes.
Resources
This guide was inspired by the following excellent resources: