

Building Your Own Programming Language From Scratch: Part VI - Loops
source link: https://hackernoon.com/building-your-own-programming-language-from-scratch-part-vi-loops
Go to the source link to view the article. You can view the picture content, updated content and better typesetting reading experience. If the link is broken, please click the button below to view the snapshot at that time.

In this part of creating your programming language, we will implement loops.
Please see the previous parts:
Our language will provide the following types of loops:
- For loop - to iterate through a range:
- While loop - to iterate as long as a specified condition is true:
- For-each (Iterable) loop - to iterate through elements of arrays and structures:
1 Lexical analysis
In the first section, we will cover lexical analysis. It’s a process to divide the source code into language lexemes, such as keyword, identifier/variable, operator, etc. To make our lexical analysis work with loop constructions, we add the following Keyword
lexemes: loop
, in
, by
to the TokenType enum:
package org.example.toylanguage.token;
...
public enum TokenType {
...
Keyword("(if|elif|else|then|end|print|input|struct|fun|return|loop|in|by)(?=\\s|$)"),
...
}
Then, in order to separate the lower and upper bounds of the For loop range, we add the two-dot GroupDivider
lexeme: [.]{2}
...
public enum TokenType {
...
GroupDivider("(\\[|\\]|\\,|\\{|}|[.]{2})"),
...
}
And finally, if we don’t want the Numeric
lexeme to catch the two-dot GroupDivider
lexeme declared after the loop’s lower bound, we add a negative look ahead construction before the dot expression: (?![.]{2})
...
public enum TokenType {
...
Numeric("([-]?(?=[.]?[0-9])[0-9]*(?![.]{2})[.]?[0-9]*)"),
...
}
2 Syntax analysis
Let’s move to the next section - the syntax analysis, which will convert the lexemes received from the previous step into the final statements following our language rules. As was said at the beginning, our language will support three types of loops: For, While and Iterable. Therefore, first we will declare the AbstractLoopStatement
class, which will contain common execution process for all the loop types:
package org.example.toylanguage.statement.loop;
public abstract class AbstractLoopStatement extends CompositeStatement {
protected abstract void init();
protected abstract boolean hasNext();
protected abstract void preIncrement();
protected abstract void postIncrement();
}
Inside this class, we declare the following abstract methods, which can have different implementation for each loop type:
init()
- it will be invoked to perform initialization operations, like initializing a counter variable when a loop starts the execution.hasNext()
- It will be called before each iteration to define that we still have remaining items to iterate.preIncrement()
- It’s a prefix operation that will be called before each iteration.postIncrement()
- It’s a postfix operation that will be called at the end of each iteration.
Before creating the loop implementations, we complete the Statement#execute()
method by utilizing declared abstract methods. To start the execution, we invoke the init()
method to initialize the loop state, but before that, we should open the new MemoryScope for the variables declared within the loop, as they shouldn’t be accessed after the loop ends. In the end of the execution, we release the earlier opened MemoryScope:
MemoryContext.newScope();
try {
init();
} finally {
MemoryContext.endScope();
}
Next, to iterate a loop, we utilize the abstract hasNext()
method. Inside each iteration, we invoke the loop’s inner statements, defined within the inherited CompositeStatement class. When the iteration starts, we invoke the preIncrement()
method to perform the prefix increment operations. At the end of each iteration, we call the postIncrement()
method to perform the postfix increment operations accordingly:
while (hasNext()) {
preIncrement();
try {
// execute inner statements
for (Statement statement : getStatements2Execute()) {
statement.execute();
}
} finally {
postIncrement();
}
}
Next, to make sure that variables declared inside the loop statements are isolated from each iteration, we open the inner MemoryScope during the statements' execution. And finally, if our loop is defined inside the function and one of the loop’s inner statements contains a return statement, we should stop the iteration immediately because it’s a sign to exit a function. Therefore, we validate that the function's return statement hasn’t been called after each statement execution. The complete AbstractLoopStatement#execute()
method:
public abstract class AbstractLoopStatement extends CompositeStatement {
...
@Override
public void execute() {
MemoryContext.newScope();
try {
init();
while (hasNext()) {
preIncrement();
MemoryContext.newScope();
try {
// execute inner statements
for (Statement statement : getStatements2Execute()) {
statement.execute();
if (ReturnContext.getScope().isInvoked())
return;
}
} finally {
MemoryContext.endScope(); // release each iteration memory
postIncrement();
}
}
} finally {
MemoryContext.endScope(); // release loop memory
}
}
}
2.1 For loop
Let’s begin with the first For loop implementation. By our language design, it will contain the variable expression, the lower bound, the upper bound, and the step expression:
package org.example.toylanguage.statement.loop;
@RequiredArgsConstructor
public class ForLoopStatement extends AbstractLoopStatement {
private final VariableExpression variable;
private final Expression lowerBound;
private final Expression uppedBound;
private final Expression step;
}
Next, we implement the abstract methods. Within the first init()
method, we assign the lower bound to the variable by evaluating an instance of AssignmentOperator, which accepts the variable
expression as a first operand and the lowerBound
expression as a second operand:
...
public class ForLoopStatement extends AbstractLoopStatement {
...
@Override
protected void init() {
new AssignmentOperator(variable, lowerBound)
.evaluate();
}
}
Next, we override the hasNext()
. It should return a boolean value if we remaining have items. To make a compare operation, we create an instance of the LessThenOperator by passing the variable
expression as a first operand, and the upperBound
as a second operand. As a result, we expect to get the LogicalValue with boolean value inside it:
...
public class ForLoopStatement extends AbstractLoopStatement {
...
@Override
protected boolean hasNext() {
LessThanOperator hasNext = new LessThanOperator(variable, uppedBound);
Value<?> value = hasNext.evaluate();
return value instanceof LogicalValue && ((LogicalValue) value).getValue();
}
}
Lastly, to perform the increment operations for each iteration, we need to implement the preIncrement()
and postIncrement()
methods. The For loop should use the step increment only at the end of each iteration. Therefore, we perform the increment only within the postIncrement()
method. To implement the increment operation, we create an instance of the AdditionOperator, which will sum up the variable
value with the step
value, and assign the result to the variable
expression again using an instance of the AssignmentOperator. If the step expression hasn’t been provided, we accumulate the NumericValue instance with the default value equal to 1.0
:
...
public class ForLoopStatement extends AbstractLoopStatement {
...
@Override
protected void preIncrement() {
}
@Override
protected void postIncrement() {
AdditionOperator stepOperator = new AdditionOperator(variable, Objects.requireNonNullElseGet(step, () -> new NumericValue(1.0)));
new AssignmentOperator(variable, stepOperator)
.evaluate();
}
}
2.2 While loop
The next loop implementation is the While loop. It’s more straightforward than the For loop, as we don’t need to perform addition and assignment operations at all. This operator will contain only the hasNext
expression, which will only serve to calculate the hasNext()
result. The other overridden methods will be empty:
package org.example.toylanguage.statement.loop;
@RequiredArgsConstructor
public class WhileLoopStatement extends AbstractLoopStatement {
private final Expression hasNext;
@Override
protected void init() {
}
@Override
protected boolean hasNext() {
Value<?> value = hasNext.evaluate();
return value instanceof LogicalValue && ((LogicalValue) value).getValue();
}
@Override
protected void preIncrement() {
}
@Override
protected void postIncrement() {
}
}
2.3 Iterable loop
The last loop implementation is the Iterable loop. It will allow us to perform the for-each iterations with arrays and structures.
First, we define the new Value implementation, which will override the java.util.Iterable
interface and, accordingly, will provide us with the java.util.Iterator
implementation:
package org.example.toylanguage.expression.value;
public abstract class IterableValue<T> extends Value<T> implements Iterable<Value<?>> {
public IterableValue(T value) {
super(value);
}
}
Next, we extend the ArrayValue by implementing IterableValue and passing array values to the Iterator instance:
package org.example.toylanguage.expression.value;
public class ArrayValue extends IterableValue<List<Value<?>>> {
...
@Override
public Iterator<Value<?>> iterator() {
return getValue().iterator();
}
}
And we do the same extension for StructureValue as well. For now, we will iterate only the structure’s values. Later, we can introduce the dictionary data type to operate with key-value types:
package org.example.toylanguage.expression.value;
public class StructureValue extends IterableValue<Map<String, Value<?>>> {
...
@Override
public Iterator<Value<?>> iterator() {
return getValue().values().iterator();
}
}
Now we have the two IterableValue implementations which can provide us with the Iterator instance, we can complete the Iterable loop with the corresponding implementation. It will contain the variable
expression and the iterable
expression. Also, we declare an auxiliary non-final Iterator<Value<?>
field that will be used to iterate the loop:
package org.example.toylanguage.expression.value;
@RequiredArgsConstructor
public class IterableLoopStatement extends AbstractLoopStatement {
private final VariableExpression variable;
private final Expression iterableExpression;
private Iterator<Value<?>> iterator;
}
Next, we override and resolve the abstract methods. Inside the init()
method, we initialize the iterator
field using the IterableValue’s iterator()
method:
...
public class IterableLoopStatement extends AbstractLoopStatement {
...
@Override
protected void init() {
Value<?> value = iterableExpression.evaluate();
if (!(value instanceof IterableValue))
throw new ExecutionException(String.format("Unable to loop non IterableValue `%s`", value));
this.iterator = ((IterableValue<?>) value).iterator();
}
}
The hasNext()
method will utilize the Iterator#hasNext()
implementation. Lastly, we should perform the prefix increment operation to know the value that is currently being iterated. Within preIncrement()
, we create an instance of the AssignmentOperator, which will accept the variable
as a first operand, and the value received from the iterator as a second operand:
...
public class IterableLoopStatement extends AbstractLoopStatement {
...
@Override
protected boolean hasNext() {
return iterator.hasNext();
}
@Override
protected void preIncrement() {
Value<?> next = iterator.next();
new AssignmentOperator(variable, next)
.evaluate();
}
@Override
protected void postIncrement() {
}
}
2.4 StatementParser
To make the loop statements work, we need to map the loop lexemes to the defined statements using the StatementParser class. All the loops defined by the language grammar start with the loop
keyword. Therefore, inside the parseExpression() method, we add a new loop
case for the corresponding Keyword lexeme:
package org.example.toylanguage;
public class StatementParser {
...
private Statement parseExpression() {
Token token = tokens.next(TokenType.Keyword, TokenType.Variable, TokenType.Operator);
switch (token.getType()) {
case Variable:
case Operator:
...
case Keyword:
switch (token.getValue()) {
case "print":
...
case "input":
...
case "if":
...
case "struct":
...
case "fun":
...
case "return":
...
case "loop":
...
}
default:
...
}
}
}
After the loop
Keyword, we read the next expression using the Dijkstra Two-Stack algorithm defined within the ExpressionReader class. If we read the For or the Iterable loop, we expect this expression to be only a variable. If we read the While loop, this expression will act as a loop condition, which can be an instance of any Expression implementation, including operator or function invocation. Therefore, we split the execution. In case the expression is the VariableExpression instance and the next lexeme is the in
Keyword, we continue to read the For and Iterable loop. In the negative case, we build the WhileLoopStatement using the condition expression:
case "loop": {
Expression loopExpression = new ExpressionReader().readExpression();
AbstractLoopStatement loopStatement;
if (loopExpression instanceof VariableExpression && tokens.peek(TokenType.Keyword, "in")) {
// loop <variable> in <bounds>
} else {
// loop <condition>
loopStatement = new WhileLoopStatement(loopExpression);
}
Inside the For and Iterable loop clause, we skip the next in
Keyword lexeme, and then read the following bounds expression. This expression can be an instance of any Expression implementation which can be evaluated to the lower bound value or IterableValue only when the code will be executed. To define which loop type it is, we can rely on the next lexeme. In the case of the For loop, after the lower bound expression, we expect to read the two-dot GroupDivider as a spliterator for lower and upper bounds. In the negative condition, we build the IterableLoopStatement:
if (loopExpression instanceof VariableExpression && tokens.peek(TokenType.Keyword, "in")) {
// loop <variable> in <bounds>
VariableExpression variable = (VariableExpression) loopExpression;
tokens.next(TokenType.Keyword, "in");
Expression bounds = new ExpressionReader().readExpression();
if (tokens.peek(TokenType.GroupDivider, "..")) {
// loop <variable> in <lower_bound>..<upper_bound>
} else {
// loop <variable> in <iterable>
loopStatement = new IterableLoopStatement(variable, bounds);
}
}
To build the last For loop statement, we skip the two-dot GroupDivider and read the upper bound expression. In our language grammar, for the For loop, we can define the step expression with the by
keyword. Therefore, if the next expression is the by
lexeme, we read the following step expression. At the end, we build an instance of the ForLoopStatement:
if (tokens.peek(TokenType.GroupDivider, "..")) {
// loop <variable> in <lower_bound>..<upper_bound>
tokens.next(TokenType.GroupDivider, "..");
Expression upperBound = new ExpressionReader().readExpression();
Expression step = null;
if (tokens.peek(TokenType.Keyword, "by")) {
// loop <variable> in <lower_bound>..<upper_bound> by <step>
tokens.next(TokenType.Keyword, "by");
step = new ExpressionReader().readExpression();
}
loopStatement = new ForLoopStatement(variable, bounds, upperBound, step);
}
We have read every loop type declaration. Finally, we can read the body of the loop. To gather all internal statements, we invoke the recursive StatementParser#parseExpression()
method, which will return an instance of the Statement interface. We should stop the process when we reach the finalizing end
lexeme:
while (!tokens.peek(TokenType.Keyword, "end")) {
Statement statement = parseExpression();
loopStatement.addStatement(statement);
}
At the end, we skip this end
keyword and return the loopStatement. The complete loop
clause:
case "loop": {
Expression loopExpression = new ExpressionReader().readExpression();
AbstractLoopStatement loopStatement;
if (loopExpression instanceof VariableExpression && tokens.peek(TokenType.Keyword, "in")) {
// loop <variable> in <bounds>
VariableExpression variable = (VariableExpression) loopExpression;
tokens.next(TokenType.Keyword, "in");
Expression bounds = new ExpressionReader().readExpression();
if (tokens.peek(TokenType.GroupDivider, "..")) {
// loop <variable> in <lower_bound>..<upper_bound>
tokens.next(TokenType.GroupDivider, "..");
Expression upperBound = new ExpressionReader().readExpression();
Expression step = null;
if (tokens.peek(TokenType.Keyword, "by")) {
// loop <variable> in <lower_bound>..<upper_bound> by <step>
tokens.next(TokenType.Keyword, "by");
step = new ExpressionReader().readExpression();
}
loopStatement = new ForLoopStatement(variable, bounds, upperBound, step);
} else {
// loop <variable> in <iterable>
loopStatement = new IterableLoopStatement(variable, bounds);
}
} else {
// loop <condition>
loopStatement = new WhileLoopStatement(loopExpression);
}
while (!tokens.peek(TokenType.Keyword, "end")) {
Statement statement = parseExpression();
loopStatement.addStatement(statement);
}
tokens.next(TokenType.Keyword, "end");
return loopStatement;
}
3 Tests
We have finished embedding loops in both lexical and syntax analyzers. Now we should be able to perform For, While and Iterable loops. Let’s create and test a more real task - the bubble sort algorithm:
fun bubble_sort [ arr, n ]
loop i in 0..n - 1
loop j in 0..n - i - 1
if arr{j+1} < arr{j} then
temp = arr{j}
arr{j} = arr{j + 1}
arr{j + 1} = temp
end
end
end
end
fun is_sorted [ arr, n ]
loop i in 0..n - 1
if arr{i} > arr{i+1} then
return false
end
end
return true
end
arr = {}
arr_len = 20
loop i in 0..arr_len by 1
arr << i * 117 % 17 - 1
end
print arr # [-1, 14, 12, 10, 8, 6, 4, 2, 0, 15, 13, 11, 9, 7, 5, 3, 1, -1, 14, 12]
print is_sorted [ arr, arr_len ] # false
bubble_sort [ arr, arr_len ]
print arr # [-1, -1, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 12, 13, 14, 14, 15]
print is_sorted [ arr, arr_len ] # true
Recommend
-
9
How to design a new programming language from scratch December 25, 2020 on Drew DeVault's blog There is a long, difficult road from vague, pie-in-the-sky ideas abo...
-
14
Jamar Ramos offers lessons learned after building one business that failed and using that experience to create a successful digital marketing agency. Jamar Ram...
-
6
Building your Chat App From Scratch vs Integrating with a Chat APIFebruary 11th 2022 new story4
-
3
@alexmakeev1995Alexander Makeev6+ years Java developer
-
12
In this part of creating your own programming language, we will introduce new function constructions. Please checkout the previous parts:The full source code is available over on
-
11
A Linux Kernel Module written in Scratch (a visual programming language for kids)Seriously. Someone did this. Because... why not?Ready to have your mind blown… just a little? Check this out:
-
6
Android From Scratch: Building Your First Android Application Have you ever wanted to build an Android app but couldn't figure out where to start? In this tutorial, I will show you how to use An...
-
6
Building Your Own Programming Language From Scratch: Part V - ArraysJuly 15th 2022 new story12...
-
5
This article was published as a part of the Data Science Blogathon. Introduction In this article, we will explore one of Microsoft’s proprietary products, “P...
-
4
In the field of programming, Iteration Statements are really helpful when we need a specific task in repetition. They’re essential as they reduce hours of work to seconds. In this article, we will explore the basics of loop...
About Joyk
Aggregate valuable and interesting links.
Joyk means Joy of geeK