Building Your Own Programming Language From Scratch: Part VI - Loops

Written by alexmakeev1995 | Published 2022/07/20
Tech Story Tags: java | lexical-analysis | syntax-analysis | regex | loop | bubble-sort | programming | learning-to-code

TLDRIn this part of creating your programming language, we will implement loops. We will be able to perform For, While and For-each loops. At the end, we'll test loops by implementing the bubble sort algorithm.via the TL;DR App

In this part of creating your programming language, we will implement loops.

Please see the previous parts:

  1. Building Your Own Programming Language From Scratch
  2. Building Your Own Programming Language From Scratch: Part II - Dijkstra's Two-Stack Algorithm
  3. Build Your Own Programming Language Part III: Improving Lexical Analysis with Regex Lookaheads
  4. Building Your Own Programming Language From Scratch Part IV: Implementing Functions
  5. Building Your Own Programming Language From Scratch: Part V - Arrays

The full source code is available over on GitHub.

Our language will provide the following types of loops:

  • For loop - to iterate through a range:

  • While loop - to iterate as long as a specified condition is true:

  • For-each (Iterable) loop - to iterate through elements of arrays and structures:

1 Lexical analysis

In the first section, we will cover lexical analysis. It’s a process to divide the source code into language lexemes, such as keyword, identifier/variable, operator, etc. To make our lexical analysis work with loop constructions, we add the following Keyword lexemes: loop, in, by to the TokenType enum:

package org.example.toylanguage.token;

...
public enum TokenType {
    ...
        Keyword("(if|elif|else|then|end|print|input|struct|fun|return|loop|in|by)(?=\\s|$)"),
    ...
}

Then, in order to separate the lower and upper bounds of the For loop range, we add the two-dot GroupDivider lexeme: [.]{2}

...
public enum TokenType {
    ...
    GroupDivider("(\\[|\\]|\\,|\\{|}|[.]{2})"),
    ...
}

And finally, if we don’t want the Numeric lexeme to catch the two-dot GroupDivider lexeme declared after the loop’s lower bound, we add a negative look ahead construction before the dot expression: (?![.]{2})

...
public enum TokenType {
    ...
    Numeric("([-]?(?=[.]?[0-9])[0-9]*(?![.]{2})[.]?[0-9]*)"),
    ...
}

2 Syntax analysis

Let’s move to the next section - the syntax analysis, which will convert the lexemes received from the previous step into the final statements following our language rules. As was said at the beginning, our language will support three types of loops: For, While and Iterable. Therefore, first we will declare the AbstractLoopStatement class, which will contain common execution process for all the loop types:

package org.example.toylanguage.statement.loop;

public abstract class AbstractLoopStatement extends CompositeStatement {
    protected abstract void init();

    protected abstract boolean hasNext();

    protected abstract void preIncrement();

    protected abstract void postIncrement();
}

Inside this class, we declare the following abstract methods, which can have different implementation for each loop type:

  1. init() - it will be invoked to perform initialization operations, like initializing a counter variable when a loop starts the execution.
  2. hasNext() - It will be called before each iteration to define that we still have remaining items to iterate.
  3. preIncrement() - It’s a prefix operation that will be called before each iteration.
  4. postIncrement() - It’s a postfix operation that will be called at the end of each iteration.

Before creating the loop implementations, we complete the Statement#execute() method by utilizing declared abstract methods. To start the execution, we invoke the init() method to initialize the loop state, but before that, we should open the new MemoryScope for the variables declared within the loop, as they shouldn’t be accessed after the loop ends. In the end of the execution, we release the earlier opened MemoryScope:

MemoryContext.newScope();

try {
	init();
} finally {
	MemoryContext.endScope();
}

Next, to iterate a loop, we utilize the abstract hasNext() method. Inside each iteration, we invoke the loop’s inner statements, defined within the inherited CompositeStatement class. When the iteration starts, we invoke the preIncrement() method to perform the prefix increment operations. At the end of each iteration, we call the postIncrement() method to perform the postfix increment operations accordingly:

while (hasNext()) {
	preIncrement();

	try {

		// execute inner statements
		for (Statement statement : getStatements2Execute()) {
			statement.execute();
		}
	} finally {
		postIncrement();
	}
}

Next, to make sure that variables declared inside the loop statements are isolated from each iteration, we open the inner MemoryScope during the statements' execution. And finally, if our loop is defined inside the function and one of the loop’s inner statements contains a return statement, we should stop the iteration immediately because it’s a sign to exit a function. Therefore, we validate that the function's return statement hasn’t been called after each statement execution. The complete AbstractLoopStatement#execute() method:

public abstract class AbstractLoopStatement extends CompositeStatement {
    ...

    @Override
    public void execute() {
        MemoryContext.newScope();
        try {

            init();

            while (hasNext()) {
                preIncrement();

                MemoryContext.newScope();

                try {

                    // execute inner statements
                    for (Statement statement : getStatements2Execute()) {
                        statement.execute();

                        if (ReturnContext.getScope().isInvoked())
                            return;

                    }
                } finally {
                    MemoryContext.endScope(); // release each iteration memory

                    
                    postIncrement();
                }

            }
        } finally {
            MemoryContext.endScope(); // release loop memory
        }
    }
}

2.1 For loop

Let’s begin with the first For loop implementation. By our language design, it will contain the variable expression, the lower bound, the upper bound, and the step expression:

package org.example.toylanguage.statement.loop;

@RequiredArgsConstructor
public class ForLoopStatement extends AbstractLoopStatement {
    private final VariableExpression variable;
    private final Expression lowerBound;
    private final Expression uppedBound;
    private final Expression step;
}

Next, we implement the abstract methods. Within the first init() method, we assign the lower bound to the variable by evaluating an instance of AssignmentOperator, which accepts the variable expression as a first operand and the lowerBound expression as a second operand:

...
public class ForLoopStatement extends AbstractLoopStatement {
	...

	@Override
	protected void init() {
		new AssignmentOperator(variable, lowerBound)
				.evaluate();
	}
}

Next, we override the hasNext(). It should return a boolean value if we remaining have items. To make a compare operation, we create an instance of the LessThenOperator by passing the variable expression as a first operand, and the upperBound as a second operand. As a result, we expect to get the LogicalValue with boolean value inside it:

...
public class ForLoopStatement extends AbstractLoopStatement {
    ...

	@Override
	protected boolean hasNext() {
		LessThanOperator hasNext = new LessThanOperator(variable, uppedBound);
		Value<?> value = hasNext.evaluate();
		return value instanceof LogicalValue && ((LogicalValue) value).getValue();
	}
}

Lastly, to perform the increment operations for each iteration, we need to implement the preIncrement() and postIncrement() methods. The For loop should use the step increment only at the end of each iteration. Therefore, we perform the increment only within the postIncrement() method. To implement the increment operation, we create an instance of the AdditionOperator, which will sum up the variable value with the step value, and assign the result to the variable expression again using an instance of the AssignmentOperator. If the step expression hasn’t been provided, we accumulate the NumericValue instance with the default value equal to 1.0 :

...
public class ForLoopStatement extends AbstractLoopStatement {
	...

	@Override
	protected void preIncrement() {
	}

	@Override
	protected void postIncrement() {
		AdditionOperator stepOperator = new AdditionOperator(variable, Objects.requireNonNullElseGet(step, () -> new NumericValue(1.0)));
		new AssignmentOperator(variable, stepOperator)
				.evaluate();
	}
}

2.2 While loop

The next loop implementation is the While loop. It’s more straightforward than the For loop, as we don’t need to perform addition and assignment operations at all. This operator will contain only the hasNext expression, which will only serve to calculate the hasNext() result. The other overridden methods will be empty:

package org.example.toylanguage.statement.loop;

@RequiredArgsConstructor
public class WhileLoopStatement extends AbstractLoopStatement {
    private final Expression hasNext;

    @Override
    protected void init() {
    }

    @Override
    protected boolean hasNext() {
        Value<?> value = hasNext.evaluate();
        return value instanceof LogicalValue && ((LogicalValue) value).getValue();
    }

    @Override
    protected void preIncrement() {
    }

    @Override
    protected void postIncrement() {
    }
}

2.3 Iterable loop

The last loop implementation is the Iterable loop. It will allow us to perform the for-each iterations with arrays and structures.

First, we define the new Value implementation, which will override the java.util.Iterable interface and, accordingly, will provide us with the java.util.Iterator implementation:

package org.example.toylanguage.expression.value;

public abstract class IterableValue<T> extends Value<T> implements Iterable<Value<?>> {
    public IterableValue(T value) {
        super(value);
    }
}

Next, we extend the ArrayValue by implementing IterableValue and passing array values to the Iterator instance:

package org.example.toylanguage.expression.value;

public class ArrayValue extends IterableValue<List<Value<?>>> {
    ...

    @Override
    public Iterator<Value<?>> iterator() {
        return getValue().iterator();
    }
}

And we do the same extension for StructureValue as well. For now, we will iterate only the structure’s values. Later, we can introduce the dictionary data type to operate with key-value types:

package org.example.toylanguage.expression.value;

public class StructureValue extends IterableValue<Map<String, Value<?>>> {
    ...

    @Override
    public Iterator<Value<?>> iterator() {
        return getValue().values().iterator();
    }
}

Now we have the two IterableValue implementations which can provide us with the Iterator instance, we can complete the Iterable loop with the corresponding implementation. It will contain the variable expression and the iterable expression. Also, we declare an auxiliary non-final Iterator<Value<?> field that will be used to iterate the loop:

package org.example.toylanguage.expression.value;

@RequiredArgsConstructor
public class IterableLoopStatement extends AbstractLoopStatement {
    private final VariableExpression variable;
    private final Expression iterableExpression;

    private Iterator<Value<?>> iterator;
}

Next, we override and resolve the abstract methods. Inside the init() method, we initialize the iterator field using the IterableValue’s iterator() method:

...
public class IterableLoopStatement extends AbstractLoopStatement {
    ...

	@Override
	protected void init() {
		Value<?> value = iterableExpression.evaluate();
		if (!(value instanceof IterableValue))
			throw new ExecutionException(String.format("Unable to loop non IterableValue `%s`", value));
		this.iterator = ((IterableValue<?>) value).iterator();
	}
}

The hasNext() method will utilize the Iterator#hasNext() implementation. Lastly, we should perform the prefix increment operation to know the value that is currently being iterated. Within preIncrement(), we create an instance of the AssignmentOperator, which will accept the variable as a first operand, and the value received from the iterator as a second operand:

...
public class IterableLoopStatement extends AbstractLoopStatement {
    ...

	@Override
	protected boolean hasNext() {
		return iterator.hasNext();
	}

	@Override
	protected void preIncrement() {
		Value<?> next = iterator.next();
		new AssignmentOperator(variable, next)
				.evaluate();
	}

	@Override
	protected void postIncrement() {
	}

}

2.4 StatementParser

To make the loop statements work, we need to map the loop lexemes to the defined statements using the StatementParser class. All the loops defined by the language grammar start with the loop keyword. Therefore, inside the parseExpression() method, we add a new loop case for the corresponding Keyword lexeme:

package org.example.toylanguage;

public class StatementParser {
	...

	private Statement parseExpression() {
		Token token = tokens.next(TokenType.Keyword, TokenType.Variable, TokenType.Operator);
		switch (token.getType()) {
			case Variable:
			case Operator:
                ...
			case Keyword:
				switch (token.getValue()) {
					case "print":
                        ...
					case "input":
                        ...
					case "if":
                        ...
					case "struct":
                        ...
					case "fun":
                        ...
					case "return":
                        ...
					case "loop":
                        ...
				}
			default:
                ...
		}
	}
}

After the loop Keyword, we read the next expression using the Dijkstra Two-Stack algorithm defined within the ExpressionReader class. If we read the For or the Iterable loop, we expect this expression to be only a variable. If we read the While loop, this expression will act as a loop condition, which can be an instance of any Expression implementation, including operator or function invocation. Therefore, we split the execution. In case the expression is the VariableExpression instance and the next lexeme is the in Keyword, we continue to read the For and Iterable loop. In the negative case, we build the WhileLoopStatement using the condition expression:

case "loop": {
    Expression loopExpression = new ExpressionReader().readExpression();

    AbstractLoopStatement loopStatement;
    if (loopExpression instanceof VariableExpression && tokens.peek(TokenType.Keyword, "in")) {
        // loop <variable> in <bounds>
    } else {
        // loop <condition>
        loopStatement = new WhileLoopStatement(loopExpression);
    }

Inside the For and Iterable loop clause, we skip the next in Keyword lexeme, and then read the following bounds expression. This expression can be an instance of any Expression implementation which can be evaluated to the lower bound value or IterableValue only when the code will be executed. To define which loop type it is, we can rely on the next lexeme. In the case of the For loop, after the lower bound expression, we expect to read the two-dot GroupDivider as a spliterator for lower and upper bounds. In the negative condition, we build the IterableLoopStatement:

if (loopExpression instanceof VariableExpression && tokens.peek(TokenType.Keyword, "in")) {

    // loop <variable> in <bounds>
    VariableExpression variable = (VariableExpression) loopExpression;
    tokens.next(TokenType.Keyword, "in");
    Expression bounds = new ExpressionReader().readExpression();

    if (tokens.peek(TokenType.GroupDivider, "..")) {
        // loop <variable> in <lower_bound>..<upper_bound>
    } else {
        // loop <variable> in <iterable>
        loopStatement = new IterableLoopStatement(variable, bounds);
    }
}

To build the last For loop statement, we skip the two-dot GroupDivider and read the upper bound expression. In our language grammar, for the For loop, we can define the step expression with the by keyword. Therefore, if the next expression is the by lexeme, we read the following step expression. At the end, we build an instance of the ForLoopStatement:

if (tokens.peek(TokenType.GroupDivider, "..")) {
    // loop <variable> in <lower_bound>..<upper_bound>
    tokens.next(TokenType.GroupDivider, "..");
    Expression upperBound = new ExpressionReader().readExpression();

    Expression step = null;
    if (tokens.peek(TokenType.Keyword, "by")) {
        // loop <variable> in <lower_bound>..<upper_bound> by <step>
        tokens.next(TokenType.Keyword, "by");
        step = new ExpressionReader().readExpression();
    }

    loopStatement = new ForLoopStatement(variable, bounds, upperBound, step);
}

We have read every loop type declaration. Finally, we can read the body of the loop. To gather all internal statements, we invoke the recursive StatementParser#parseExpression() method, which will return an instance of the Statement interface. We should stop the process when we reach the finalizing end lexeme:

while (!tokens.peek(TokenType.Keyword, "end")) {
    Statement statement = parseExpression();
    loopStatement.addStatement(statement);
}

At the end, we skip this end keyword and return the loopStatement. The complete loop clause:

case "loop": {
    Expression loopExpression = new ExpressionReader().readExpression();
    AbstractLoopStatement loopStatement;

    if (loopExpression instanceof VariableExpression && tokens.peek(TokenType.Keyword, "in")) {
        // loop <variable> in <bounds>
        VariableExpression variable = (VariableExpression) loopExpression;
        tokens.next(TokenType.Keyword, "in");
        Expression bounds = new ExpressionReader().readExpression();

        if (tokens.peek(TokenType.GroupDivider, "..")) {
            // loop <variable> in <lower_bound>..<upper_bound>
            tokens.next(TokenType.GroupDivider, "..");
            Expression upperBound = new ExpressionReader().readExpression();

            Expression step = null;
            if (tokens.peek(TokenType.Keyword, "by")) {
                // loop <variable> in <lower_bound>..<upper_bound> by <step>
                tokens.next(TokenType.Keyword, "by");
                step = new ExpressionReader().readExpression();
            }

            loopStatement = new ForLoopStatement(variable, bounds, upperBound, step);

        } else {
            // loop <variable> in <iterable>
            loopStatement = new IterableLoopStatement(variable, bounds);
        }

    } else {
        // loop <condition>
        loopStatement = new WhileLoopStatement(loopExpression);
    }

    while (!tokens.peek(TokenType.Keyword, "end")) {
        Statement statement = parseExpression();
        loopStatement.addStatement(statement);
    }

    tokens.next(TokenType.Keyword, "end");

    return loopStatement;
}

3 Tests

We have finished embedding loops in both lexical and syntax analyzers. Now we should be able to perform For, While and Iterable loops. Let’s create and test a more real task - the bubble sort algorithm:

fun bubble_sort [ arr, n ]
    loop i in 0..n - 1
        loop j in 0..n - i - 1
            if arr{j+1} < arr{j} then
                temp = arr{j}
                arr{j} = arr{j + 1}
                arr{j + 1} = temp
            end
       end
    end
end

fun is_sorted [ arr, n ]
    loop i in 0..n - 1
        if arr{i} > arr{i+1} then
            return false
        end
    end
    return true
end

arr = {}
arr_len = 20
loop i in 0..arr_len by 1
    arr << i * 117 % 17 - 1
end

print arr # [-1, 14, 12, 10, 8, 6, 4, 2, 0, 15, 13, 11, 9, 7, 5, 3, 1, -1, 14, 12]
print is_sorted [ arr, arr_len ] # false

bubble_sort [ arr, arr_len ]
print arr # [-1, -1, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 12, 13, 14, 14, 15]
print is_sorted [ arr, arr_len ] # true


Written by alexmakeev1995 | Senior SWE at Layermark
Published by HackerNoon on 2022/07/20