Building Your Own Programming Language From Scratch: Part VIII - Nested Classes

Written by alexandermakeev | Published 2023/01/16
Tech Story Tags: hackernoon-top-story | java | ruby | python | syntax-analysis | lexical-analysis | programming | lexer | hackernoon-es | hackernoon-hi | hackernoon-zh | hackernoon-vi | hackernoon-fr | hackernoon-pt | hackernoon-ja

TLDRIn this part of creating your own programming language we will implement nested classesvia the TL;DR App

In this part of creating your own programming language, we’ll continue improving our language by implementing nested classes and slightly upgrading the classes introduced in the previous part. Please check out the previous parts:

  1. Building Your Own Programming Language From Scratch
  2. Building Your Own Programming Language From Scratch: Part II - Dijkstra's Two-Stack Algorithm
  3. Build Your Own Programming Language Part III: Improving Lexical Analysis with Regex Lookaheads
  4. Building Your Own Programming Language From Scratch Part IV: Implementing Functions
  5. Building Your Own Programming Language From Scratch: Part V - Arrays
  6. Building Your Own Programming Language From Scratch: Part VI - Loops
  7. Building Your Own Programming Language From Scratch: Part VII - Classes

The full source code is available over on GitHub.

1 Lexical Analysis

In the first section, we will cover lexical analysis. In short, it’s a process to divide the source code into language lexemes, such as keyword, variable, operator, etc.

You may remember from the previous parts that I was using the regex expressions listed in the TokenType enum to define all the lexeme types:

package org.example.toylanguage.token;

public enum TokenType {
    Comment("\\#.*"),
    LineBreak("[\\n\\r]"),
    Whitespace("[\\s\\t]"),
    Keyword("(if|elif|else|end|print|input|class|fun|return|loop|in|by|break|next)(?=\\s|$)"),
    GroupDivider("(\\[|\\]|\\,|\\{|}|[.]{2})"),
    Logical("(true|false)(?=\\s|$)"),
    Numeric("([-]?(?=[.]?[0-9])[0-9]*(?![.]{2})[.]?[0-9]*)"),
    Null("(null)(?=,|\\s|$)"),
    This("(this)(?=,|\\s|$)"),
    Text("\"([^\"]*)\""),
    Operator("(\\+|-|\\*|/{1,2}|%|>=|>|<=|<{1,2}|={1,2}|!=|!|:{2}|\\(|\\)|(new|and|or)(?=\\s|$))"),
    Variable("[a-zA-Z_]+[a-zA-Z0-9_]*");

    private final String regex;
}

Let’s look into this nested classes prototype, and add the missing regex expressions into our TokenType lexemes:

When we create an instance of a nested class, we use the following construction with two expressions and one operator between them:

The left expression class_instance is an instance of the class we are referring to get a nested class from. The right expression NestedClass [args] is a nested class with the properties to create an instance.

Lastly, as an operator to create a nested class, I’ll be using the following expression: :: new, meaning that we refer to the class instance property with the two colon :: operator, and then we create an instance with the new operator.

With the current set of lexemes, we only need to add a regular expression for the :: new operator. This operator can be validated by the following regex expression:

:{2}\\s+new

Let’s add this expression in the Operator lexeme as OR expression before the :{2} part standing for accessing class property:

public enum TokenType {
    ...
    Operator("(\\+|-|\\*|/{1,2}|%|>=|>|<=|<{1,2}|={1,2}|!=|!|:{2}\\s+new|:{2}|\\(|\\)|(new|and|or)(?=\\s|$))"),
    ...
}

2 Syntax Analysis

In the second section, we will convert lexemes received from the lexical analyzer into the final statements following our language rules.

2.1 OperatorExpression

To evaluate math expressions, we’re using Dijkstra's Two-Stack algorithm. Each operation in this algorithm can be presented by a unary operator with one operand or by a binary operator with respectively two operands:

Nested class’s instantiation is a binary operation where the left operand is a class instance we use to refer to the class where the nested class is defined, and the second operand is a nested class we create an instance of:

Let’s create the NestedClassInstanceOperator implementation by extending

BinaryOperatorExpression:

package org.example.toylanguage.expression.operator;

public class NestedClassInstanceOperator extends BinaryOperatorExpression {
    public NestedClassInstanceOperator(Expression left, Expression right) {
        super(left, right);
    }

    @Override
    public Value<?> evaluate() {
        ...
    }
}

Next, we should complete the evaluate() method that will perform nested class instantiation:

First, we evaluate the left operand’s expression into the Value expression:

@Override
public Value<?> evaluate() {
    // ClassExpression -> ClassValue
    Value<?> left = getLeft().evaluate();
}

Next, we need to evaluate() the right operand. In this case, we can't directly invoke Expression#evaluate() because the nested classes' definitions are declared in the parent class's DefinitionScope (in the left operand).

To access the definitions of nested classes, we should create an auxiliary ClassExpression#evaluate(ClassValue) method that will take the left operand and use its DefinitionScope to access the nested class definition and create an instance of:

@Override
public Value<?> evaluate() {
    Value<?> left = getLeft().evaluate();

    if (left instanceof ClassValue && getRight() instanceof ClassExpression) {
        // instantiate nested class
        // new Class [] :: new NestedClass []
        return ((ClassExpression) getRight()).evaluate((ClassValue) left);
    } else {
        throw new ExecutionException(String.format("Unable to access class's nested class `%s``", getRight()));
    }
}

Lastly, let’s implement the missing ClassExpression#evaluate(ClassValue) method.

This implementation will be similar to the ClassExpression#evaluate() method with the only difference being that we should set the ClassDefinition#getDefinitionScope() in order to retrieve nested class definitions:

package org.example.toylanguage.expression;

…
public class ClassExpression implements Expression {
    private final String name;
    private final List<Expression> argumentExpressions;

    @Override
    public Value<?> evaluate() {
        //initialize class arguments
        List<Value<?>> values = argumentExpressions.stream().map(Expression::evaluate).collect(Collectors.toList());
        return evaluate(values);
    }

    /**
     * Evaluate nested class
     *
     * @param classValue instance of the parent class
     */
    public Value<?> evaluate(ClassValue classValue) {
        //initialize class arguments
        List<Value<?>> values = argumentExpressions.stream().map(Expression::evaluate).collect(Collectors.toList());

        //set parent class's definition
        ClassDefinition classDefinition = classValue.getValue();
        DefinitionContext.pushScope(classDefinition.getDefinitionScope());

        try {
            return evaluate(values);
        } finally {
            DefinitionContext.endScope();
        }
    }

    private Value<?> evaluate(List<Value<?>> values) {
        //get class's definition and statement
        ClassDefinition definition = DefinitionContext.getScope().getClass(name);
        ClassStatement classStatement = definition.getStatement();

        //set separate scope
        MemoryScope classScope = new MemoryScope(null);
        MemoryContext.pushScope(classScope);

        try {
            //initialize constructor arguments
            ClassValue classValue = new ClassValue(definition, classScope);
            ClassInstanceContext.pushValue(classValue);
            IntStream.range(0, definition.getArguments().size()).boxed()
                    .forEach(i -> MemoryContext.getScope()
                            .setLocal(definition.getArguments().get(i), values.size() > i ? values.get(i) : NullValue.NULL_INSTANCE));

            //execute function body
            DefinitionContext.pushScope(definition.getDefinitionScope());
            try {
                classStatement.execute();
            } finally {
                DefinitionContext.endScope();
            }

            return classValue;
        } finally {
            MemoryContext.endScope();
            ClassInstanceContext.popValue();
        }
    }
}

2.2 Operator

All the operators we’re using to evaluate math expressions are stored in the Operator enum with the corresponding precedence, character, and OperatorExpression type that we refer to calculate the result of each operation:

...
public enum Operator {
    Not("!", NotOperator.class, 7),
    ClassInstance("new", ClassInstanceOperator.class, 7),
    ...

    private final String character;
    private final Class<? extends OperatorExpression> type;
    private final Integer precedence;

    Operator(String character, Class<? extends OperatorExpression> type, Integer precedence) {
        this.character = character;
        this.type = type;
        this.precedence = precedence;
    }

    public static Operator getType(String character) {
        return Arrays.stream(values())
                .filter(t -> Objects.equals(t.getCharacter(), character))
                .findAny().orElse(null);
    }
    ...
}

We already have the ClassInstance value for a regular class’s initialization. Let’s add a new value to manage nested class instances.

The new NestedClassInstance value will have the same character expression as we defined in the TokenType earlier and the same precedence as the regular class's instance.

For the OperatorExpression type, we will use the previously defined NestedClassInstanceOperator:

...
public enum Operator {
    Not("!", NotOperator.class, 7),
    ClassInstance("new", ClassInstanceOperator.class, 7),
    NestedClassInstance(":{2}\\s+new", NestedClassInstanceOperator.class, 7),
    ...
}

You may notice that we don’t have regex expressions in the character property except for this new operator. To read the NestedClassInstance operator using a regex expression, we should update the Operator#getType() method to match an operator with a regular expression:

public enum Operator {
    ...
    public static Operator getType(String character) {
        return Arrays.stream(values())
                .filter(t -> character.matches(t.getCharacter()))
                .findAny().orElse(null);
    }
    ...
}

Lastly, we should add two backslashes \\ before a character for operations containing the following symbols: +, *, (, ) to make sure these characters are not treated as regex search symbols:

Multiplication("\\*", MultiplicationOperator.class, 6),
Addition("\\+", AdditionOperator.class, 5),
LeftParen("\\(", 3),
RightParen("\\)", 3),

After we introduced the NestedClassInstance operator, we should inject it into the ExpressionReader class that actually parses math expressions into operands and operators. We just need to find the line where we read the class instance:

if (!operators.isEmpty() && operators.peek() == Operator.ClassInstance) {
    operand = readClassInstance(token);
}

To support reading the NestedClassInstance operator, we add the corresponding condition for the current operator in the operator’s stack:

if (!operators.isEmpty() && (operators.peek() == Operator.ClassInstance || operators.peek() == Operator.NestedClassInstance)) {
    operand = readClassInstance(token);
}

The readClassInstance() method will read a nested class's declaration with properties the same way it reads a regular class declaration. This method returns ClassExpression instance as a whole operand expression.

3. Wrap Up

Everything is ready now. In this part, we implemented nested classes as one more step toward making a complete programming language.


Written by alexandermakeev | Senior SWE at Layermark
Published by HackerNoon on 2023/01/16