Building Your Own Programming Language From Scratch: Part VII

In this part of creating your own programming language, we will implement classes as an extension over the previously defined structures. Please check out the previous parts:

The full source code is available over on GitHub.

1. Lexical Analysis

In the first section, we will cover lexical analysis. In short, it’s a process to divide the source code into language lexemes, such as keyword, variable, operator, etc.

You may remember from the previous parts that I was using the regex expressions in the TokenType enum to define all the lexeme types.

Let’s look at the following class prototype and add the missing regex parts to our TokenType lexemes:

First, we need to put the class word in the Keyword lexeme's expression to let the lexical analyzer know where our class declaration begins:

package org.example.toylanguage.token;

...
public enum TokenType {
    ...
    Keyword("(if|elif|else|end|print|input|fun|return|loop|in|by|break|next|class)(?=\\s|$)"),
    ...

    private final String regex;
}

Second, we need the new This lexeme as a marker for a reference to the current object:

public enum TokenType {
    ...
    This("(this)(?=,|\\s|$)");

    private final String regex;
}

2. Syntax Analysis

In the second section, we will transform the lexemes received from the lexical analyzer into the final statements following our language rules.

2.1 Definition Scope

When we declare a class or a function, this declaration should be available within the defined isolated bounds. For example, if we declare a function named turn_on [] in the following listing, it will be available for execution after declaration:

But if we declare the same function inside a class scope, this function will no longer be accessed directly from the main block:

To implement these definition bounds, we’ll create the DefinitionScope class and store all the declared definitions inside two sets for classes and functions:

package org.example.toylanguage.context.definition;

public class DefinitionScope {
    private final Set<ClassDefinition> classes;
    private final Set<FunctionDefinition> functions;

    public DefinitionScope() {
        this.classes = new HashSet<>();
        this.functions = new HashSet<>();
    }
}

In addition, we may want to access the parent’s definition scope. For example, if we declare two separate classes and create an instance of the first class inside the second one:

To provide this ability, we will add the DefinitionScope parent instance as a reference to the upper layer, which we will use to climb to the top definition layer.

public class DefinitionScope {
    private final Set<ClassDefinition> classes;
    private final Set<FunctionDefinition> functions;
    private final DefinitionScope parent;

    public DefinitionScope(DefinitionScope parent) {
        this.classes = new HashSet<>();
        this.functions = new HashSet<>();
        this.parent = parent;
    }
}

Now, let’s finish the implementation by providing the interfaces to add a definition and retrieve it by name using parent scope:

public class DefinitionScope {
    …

    public ClassDefinition getClass(String name) {
        Optional<ClassDefinition> classDefinition = classes.stream()
                .filter(t -> t.getName().equals(name))
                .findAny();
        if (classDefinition.isPresent())
            return classDefinition.get();
        else if (parent != null)
            return parent.getClass(name);
        else
            throw new ExecutionException(String.format("Class is not defined: %s", name));
    }

    public void addClass(ClassDefinition classDefinition) {
        classes.add(classDefinition);
    }

    public FunctionDefinition getFunction(String name) {
        Optional<FunctionDefinition> functionDefinition = functions.stream()
                .filter(t -> t.getName().equals(name))
                .findAny();
        if (functionDefinition.isPresent())
            return functionDefinition.get();
        else if (parent != null)
            return parent.getFunction(name);
        else
            throw new ExecutionException(String.format("Function is not defined: %s", name));
    }

    public void addFunction(FunctionDefinition functionDefinition) {
        functions.add(functionDefinition);
    }
}

Finally, to manage declared definition scopes and switch between them, we create the context class using java.util.Stack collection (LIFO):

package org.example.toylanguage.context.definition;

public class DefinitionContext {
    private final static Stack<DefinitionScope> scopes = new Stack<>();

    public static DefinitionScope getScope() {
        return scopes.peek();
    }

    public static DefinitionScope newScope() {
        return new DefinitionScope(scopes.isEmpty() ? null : scopes.peek());
    }

    public static void pushScope(DefinitionScope scope) {
        scopes.push(scope);
    }

    public static void endScope() {
        scopes.pop();
    }
}

2.2 Memory scope

In this section, we will cover the MemoryScope to manage class and function variables.

Each declared variable, similarly to the class or function definition, should be accessible only within an isolated block of code. For example, if we define a variable in the following listing, you can access it right after the declaration:

But if we declare a variable within a function or a class, the variable will no longer be available from the main (upper) block of code:

To implement this logic and store variables defined in a specific scope, we create the MemoryScope class that will contain a map with the variable name as a key and the variable Value as a value:

public class MemoryScope {
    private final Map<String, Value<?>> variables;

    public MemoryScope() {
        this.variables = new HashMap<>();
    }
}

Next, similarly to the DefinitionScope, we provide access to the parent’s scope variables:

public class MemoryScope {
    private final Map<String, Value<?>> variables;
    private final MemoryScope parent;

    public MemoryScope(MemoryScope parent) {
        this.variables = new HashMap<>();
        this.parent = parent;
    }
}

Next, we add methods to get and set variables. When we set a variable, we always re-assign the previously set value if there is already a defined variable in the upper MemoryScope layer:

public class MemoryScope {
    ...

    public Value<?> get(String name) {
        Value<?> value = variables.get(name);
        if (value != null)
            return value;
        else if (parent != null)
            return parent.get(name);
        else
            return NullValue.NULL_INSTANCE;
    }

    public void set(String name, Value<?> value) {
        MemoryScope variableScope = findScope(name);
        if (variableScope == null) {
            variables.put(name, value);
        } else {
            variableScope.variables.put(name, value);
        }
    }

    private MemoryScope findScope(String name) {
        if (variables.containsKey(name))
            return this;
        return parent == null ? null : parent.findScope(name);
    }
}

In addition to the set and get methods, we add two more implementations to interact with the current (local) layer of MemoryScope:

public class MemoryScope {
    ...

    public Value<?> getLocal(String name) {
        return variables.get(name);
    }

    public void setLocal(String name, Value<?> value) {
        variables.put(name, value);
    }
}

These methods will be used later to initialize function arguments or class’s instance arguments. For example, if we create an instance of the Lamp class and pass the predefined global type variable, this variable shouldn’t be changed when we try to update the lamp_instance :: type property:

Finally, to manage variables and switch between memory scopes, we create the MemoryContext implementation using java.util.Stack collection:

package org.example.toylanguage.context;

public class MemoryContext {
    private static final Stack<MemoryScope> scopes = new Stack<>();

    public static MemoryScope getScope() {
        return scopes.peek();
    }

    public static MemoryScope newScope() {
        return new MemoryScope(scopes.isEmpty() ? null : scopes.peek());
    }

    public static void pushScope(MemoryScope scope) {
        scopes.push(scope);
    }

    public static void endScope() {
        scopes.pop();
    }
}

2.3 Class Definition

In this section, we will read and store class definitions.

First, we create the Statement implementation. This statement will be executed every time we create a class’s instance:

package org.example.toylanguage.statement;

public class ClassStatement {
}

Each class can contain nested statements for the initialization and other operations, like a constructor. To store these statements, we extend the CompositeStatement which contains the list of nested statements to execute:

public class ClassStatement extends CompositeStatement {
}

Next, we declare the ClassDefinition to store the class name, its arguments, constructor statements, and the definition scope with the class’s functions:

package org.example.toylanguage.context.definition;

import java.util.List;

@RequiredArgsConstructor
@Getter
@EqualsAndHashCode(onlyExplicitlyIncluded = true)
public class ClassDefinition implements Definition {
    @EqualsAndHashCode.Include
    private final String name;
    private final List<String> arguments;
    private final ClassStatement statement;
    private final DefinitionScope definitionScope;
}

Now, we’re ready to read the class declaration using StatementParser. When we meet the Keyword lexeme with class value, first we need to read the class name and its arguments inside the square brackets:

package org.example.toylanguage;

public class StatementParser {
    ...

    private void parseClassDefinition() {
        Token type = tokens.next(TokenType.Variable);

        List<String> arguments = new ArrayList<>();

        if (tokens.peek(TokenType.GroupDivider, "[")) {

            tokens.next(TokenType.GroupDivider, "["); //skip opening square bracket

            while (!tokens.peek(TokenType.GroupDivider, "]")) {
                Token argumentToken = tokens.next(TokenType.Variable);
                arguments.add(argumentToken.getValue());

                if (tokens.peek(TokenType.GroupDivider, ","))
                    tokens.next();
            }

            tokens.next(TokenType.GroupDivider, "]"); //skip closing square bracket
        }
    }

}

After the arguments, we expect to read the nested constructor statements:

To store these statements, we create an instance of the previously defined ClassStatement:

private void parseClassDefinition() {
    ...

    ClassStatement classStatement = new ClassStatement();
}

In addition to arguments and nested statements, our classes can also contain functions. To make these functions accessible only within the class definition, we initialize a new layer ofDefinitionScope:

private void parseClassDefinition() {
    ...

    ClassStatement classStatement = new ClassStatement();
    DefinitionScope classScope = DefinitionContext.newScope();
}

Next, we initialize an instance of ClassDefinition and put it to the DefinitionContext:

private void parseClassDefinition() {
    ...

    ClassStatement classStatement = new ClassStatement();
    DefinitionScope classScope = DefinitionContext.newScope();
    ClassDefinition classDefinition = new ClassDefinition(type.getValue(), arguments, classStatement, classScope);
    DefinitionContext.getScope().addClass(classDefinition);
}

Finally, to read the class constructor statements and functions, we call the static StatementParser#parse() method that will collect statements inside classStatement instance until we meet the finalizing end lexeme that must be skipped at the end:

private void parseClassDefinition() {
    ...

    //parse class statements
    StatementParser.parse(this, classStatement, classScope);
    tokens.next(TokenType.Keyword, "end");
}

2.4 Class Instance

At this point, we can already read class definitions with constructor statements and functions. Now, let’s parse the class’s instance:

First, we define ClassValue, which will contain the state of each class instance. Classes, unlike functions, should have a permanent MemoryScope and this scope should be available with all the class's instance arguments and state variables every time we interact with the class instance:

public class ClassValue extends IterableValue<ClassDefinition> {
    private final MemoryScope memoryScope;

    public ClassValue(ClassDefinition definition, MemoryScope memoryScope) {
        super(definition);
        this.memoryScope = memoryScope;
    }
}

Next, we provide methods to work with the class’s instance properties using MemoryContext:

public class ClassValue extends IterableValue<ClassDefinition> {
    private final MemoryScope memoryScope;

    public ClassValue(ClassDefinition definition, MemoryScope memoryScope) {
        super(definition);
        this.memoryScope = memoryScope;
    }

    @Override
    public String toString() {
            return getValue().getArguments().stream()
                 .map(t -> t + " = " + getValue(t))
                 .collect(Collectors.joining(", ", getValue().getName() + " [ ", " ]"));
    }

    public Value<?> getValue(String name) {
         Value<?> result = MemoryContext.getScope().getLocal(name);
         return result != null ? result : NULL_INSTANCE;
    }

    public void setValue(String name, Value<?> value) {
         MemoryContext.getScope().setLocal(name, value);
    }
}

Please notice that by calling theMemoryScope#getLocal() and MemoryScope#setLocal() methods, we work with the current layer of MemoryScope variables. But before accessing the class’s instance state, we need to put its MemoryScope to the MemoryContext and release it when we finish:

public class ClassValue extends IterableValue<ClassDefinition> {
    ...

    public Value<?> getValue(String name) {
        MemoryContext.pushScope(memoryScope);
        try {
            Value<?> result = MemoryContext.getScope().getLocal(name);
            return result != null ? result : NULL_INSTANCE;
        } finally {
            MemoryContext.endScope();
        }
    }

    public void setValue(String name, Value<?> value) {
        MemoryContext.pushScope(memoryScope);
        try {
            MemoryContext.getScope().setLocal(name, value);
        } finally {
            MemoryContext.endScope();
        }
    }
}

Next, we can implement the remaining ClassExpression that will be used to construct defined class instances during syntax analysis. To declare class’s instance definition, we provide the ClassDefinition and list of Expression arguments that will be transformed into the final Value instances during statement execution:

package org.example.toylanguage.expression;

@RequiredArgsConstructor
@Getter
public class ClassExpression implements Expression {
    private final ClassDefinition definition;
    private final List<Expression> arguments;

    @Override
    public Value<?> evaluate() {
        ...
    }
}

Let's implement the Expression#evaluate() method that will be used during execution to create an instance of the previously defined ClassValue. First, we evaluate the Expression arguments into the Value arguments:

@Override
public Value<?> evaluate() {
    //initialize class arguments
    List<Value<?>> values = arguments.stream()
                                     .map(Expression::evaluate)
                                     .collect(Collectors.toList());
}

Next, we create an empty memory scope that should be isolated from the other variables and can contain only the class's instance state variables:

@Override
public Value<?> evaluate() {
    //initialize class arguments
    List<Value<?>> values = arguments.stream().map(Expression::evaluate).collect(Collectors.toList());

    //get class's definition and statement
    ClassStatement classStatement = definition.getStatement();

    //set separate scope
    MemoryScope classScope = new MemoryScope(null);
    MemoryContext.pushScope(classScope);

    try {
	    ...
    } finally {
        MemoryContext.endScope();
    }
}

Next, we create an instance of ClassValue and write the class’s Value arguments to the isolated memory scope:

try {
    //initialize constructor arguments
    ClassValue classValue = new ClassValue(definition, classScope);
    IntStream.range(0, definition.getArguments().size()).boxed()
            .forEach(i -> MemoryContext.getScope()
                    .setLocal(definition.getArguments().get(i), values.size() > i ? values.get(i) : NullValue.NULL_INSTANCE));
    
    ...

} finally {
    MemoryContext.endScope();
}

Please notice that we transformed the Expression arguments into the Value arguments before setting up an empty MemoryScope. Otherwise, we will not be able to access the class’s instance arguments, for example:

And finally, we can execute the ClassStatement. But before that, we should set the class’s DefinitionScope to be able to access the class’s functions within constructor statements:

try {
    //initialize constructor arguments
    ClassValue classValue = new ClassValue(definition, classScope);
    ClassInstanceContext.pushValue(classValue);
    IntStream.range(0, definition.getArguments().size()).boxed()
            .forEach(i -> MemoryContext.getScope()
                    .setLocal(definition.getArguments().get(i), values.size() > i ? values.get(i) : NullValue.NULL_INSTANCE));

    //execute function body
    DefinitionContext.pushScope(definition.getDefinitionScope());
    try {
        classStatement.execute();
    } finally {
        DefinitionContext.endScope();
    }

    return classValue;
} finally {
    MemoryContext.endScope();
    ClassInstanceContext.popValue();
}

9*. One last thing, we can make the classes more flexible and allow a user to create class instances before declaring the class definition:

This can be done by delegating the ClassDefinition initialization to the DefinitionContext and access it only when we evaluate an expression:

public class ClassExpression implements Expression {
    private final String name;
    private final List<Expression> arguments;

    @Override
    public Value<?> evaluate() {
        //get class's definition and statement
        ClassDefinition definition = DefinitionContext.getScope().getClass(name);
        ...
    }
}

You can do the same delegation for the FunctionExpression to invoke functions before definition.

Finally, we can finish reading the class instances with the ExpressionReader. There is no difference between the previously defined structure instances. We just need to read the Expression arguments and construct the ClassExpression:

public class ExpressionReader
    ...

    // read class instance: new Class[arguments]
    private ClassExpression readClassInstance(Token token) {
        List<Expression> arguments = new ArrayList<>();
        if (tokens.peekSameLine(TokenType.GroupDivider, "[")) {

            tokens.next(TokenType.GroupDivider, "["); //skip opening square bracket

            while (!tokens.peekSameLine(TokenType.GroupDivider, "]")) {
                Expression value = ExpressionReader.readExpression(this);
                arguments.add(value);

                if (tokens.peekSameLine(TokenType.GroupDivider, ","))
                    tokens.next();
            }

            tokens.next(TokenType.GroupDivider, "]"); //skip closing square bracket
        }
        return new ClassExpression(token.getValue(), arguments);
    }
}

2.5 Class Function

At this moment, we can create a class and execute the class's constructor statements. But we are still unable to execute the class's functions.

Let’s overload the FunctionExpression#evaluate method that will accept the ClassValue as a reference to the class instance we want to invoke a function from:

package org.example.toylanguage.expression;

public class FunctionExpression implements Expression {
    ...

    public Value<?> evaluate(ClassValue classValue) {
    }
}

The next step is to transform the function Expression arguments into the Value arguments using current MemoryScope:

public Value<?> evaluate(ClassValue classValue) {
    //initialize function arguments
    List<Value<?>> values = argumentExpressions.stream()
                                               .map(Expression::evaluate)
                                               .collect(Collectors.toList());
}

Next, we need to pass the class's MemoryScope and DefinitionScope to the context:

...

//get definition and memory scopes from class definition
ClassDefinition classDefinition = classValue.getValue();
DefinitionScope classDefinitionScope = classDefinition.getDefinitionScope();

//set class's definition and memory scopes
DefinitionContext.pushScope(classDefinitionScope);
MemoryContext.pushScope(memoryScope);

Lastly for this implementation, we invoke default FunctionExpression#evaluate(List<Value<?>> values) method and pass evaluated Value arguments:

public Value<?> evaluate(ClassValue classValue) {
    //initialize function arguments
    List<Value<?>> values = argumentExpressions.stream().map(Expression::evaluate).collect(Collectors.toList());

    //get definition and memory scopes from class definition
    ClassDefinition classDefinition = classValue.getValue();
    DefinitionScope classDefinitionScope = classDefinition.getDefinitionScope();
    MemoryScope memoryScope = classValue.getMemoryScope();

    //set class's definition and memory scopes
    DefinitionContext.pushScope(classDefinitionScope);
    MemoryContext.pushScope(memoryScope);

    try {
        //proceed function
        return evaluate(values);
    } finally {
        DefinitionContext.endScope();
        MemoryContext.endScope();
    }
}

To invoke the class’s function, we will use the double colon :: operator. Currently, this operator is managed by the ClassPropertyOperator (StructureValueOperator) implementation, which is responsible for accessing the class’s properties:

Let’s improve it to support function invocations with the same double colon :: character:

Class’s function can be managed by this operator only when the left expression is the ClassExpression and the second one is the FunctionExpression:

package org.example.toylanguage.expression.operator;

public class ClassPropertyOperator extends BinaryOperatorExpression implements AssignExpression {
    public ClassPropertyOperator(Expression left, Expression right) {
        super(left, right);
    }

    @Override
    public Value<?> evaluate() {
        Value<?> left = getLeft().evaluate();

        if (left instanceof ClassValue) {
            if (getRight() instanceof VariableExpression) {
                // access class's property
                // new ClassInstance[] :: class_argument
                return ((ClassValue) left).getValue(((VariableExpression) getRight()).getName());
            } else if (getRight() instanceof FunctionExpression) {
                // execute class's function
                // new ClassInstance[] :: class_function []
                return ((FunctionExpression) getRight()).evaluate((ClassValue) left);
            }
        }

        throw new ExecutionException(String.format("Unable to access class's property `%s``", getRight()));
    }

    @Override
    public void assign(Value<?> value) {
        Value<?> left = getLeft().evaluate();

        if (left instanceof ClassValue && getRight() instanceof VariableExpression) {
            String propertyName = ((VariableExpression) getRight()).getName();
            ((ClassValue) left).setValue(propertyName, value);
        }
    }
}

3. Wrap Up

In this part, we implemented classes. We can now create more complex things, such as stack implementation:

main []

fun main []
    stack = new Stack []
    loop num in 0..5
        # push 0,1,2,3,4
        stack :: push [ num ]
    end

    size = stack :: size []
    loop i in 0..size
        # should print 4,3,2,1,0
        print stack :: pop []
    end
end

class Stack []
    arr = {}
    n = 0

    fun push [ item ]
        this :: arr << item
        n = n + 1
    end

    fun pop []
        n = n - 1
        item = arr { n }
        arr { n } = null
        return item
    end

    fun size []
        return this :: n
    end
end

Building Your Own Programming Language From Scratch: Part VII - Classes