Understanding RISC-V Assembly Language by Building an Assembler in C# - Part 2

Written by rizwan3d | Published 2023/12/03

TL;DR: In this article, we focus on the Label Processing phase and discuss the essential aspects of handling labels, calculating addresses, and dealing with offsets. Labels provide a symbolic representation of memory addresses and play a pivotal role in facilitating jumps and branches in the code. We use a simple algorithm to extract and store these labels along with their corresponding addresses.

In this installment of our journey of developing a RISC-V assembler in C#, we focus on the Label Processing phase: handling labels, calculating addresses, and dealing with offsets.

Phase 1: Label Processing

The Label Processing phase is crucial for creating a robust assembler. Labels provide a symbolic representation of memory addresses and play a pivotal role in facilitating jumps and branches in the code. Let's break down the steps involved in processing labels.

Identifying Labels

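The first step is to iterate through the assembly code and identify labels. Labels in assembly code typically end with a colon (:). We use a simple algorithm to extract and store these labels along with their corresponding addresses. Note that this pass recognizes a label only when it stands alone on its line, because it checks whether the trimmed line ends with the colon.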

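// First pass: record the address of every label before machine code is generated.
public static void ProcessLabels(string[] code)
{
    foreach (var assemblyLine in code)
    {
        var line = assemblyLine.Trim();
        if (string.IsNullOrEmpty(line)) continue;   // blank lines occupy no memory
        if (line.StartsWith(".")) continue;         // directives are skipped in this pass
        if (line.EndsWith(":"))
        {
            // A label names the address of the next instruction and takes no space itself.
            string label = line.Substring(0, line.Length - 1);
            AddressLookupTable.Add(label, Address.CurrentAddress);
            continue;
        }
        Address.CurrentAddress += 4;                // every instruction is 4 bytes long
    }
}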
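
To try this pass in isolation, here is a minimal self-contained sketch. It assumes the lookup table is a plain Dictionary<string, int> and that the program starts at address 0. The names LabelPass and Collect, as well as the sample program, are hypothetical and are not taken from the SharpRISCV repository.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sample program: one directive, two labels, three instructions.
string[] program =
{
    ".text",
    "start:",
    "    addi x1, x0, 5",
    "loop:",
    "    addi x1, x1, -1",
    "    bne  x1, x0, loop",
};

foreach (var entry in LabelPass.Collect(program))
    Console.WriteLine($"{entry.Key} -> {entry.Value}");

static class LabelPass
{
    // Returns label name -> byte address. Assumes 4-byte instructions and a
    // program that starts at address 0 (both are assumptions of this sketch).
    public static Dictionary<string, int> Collect(IEnumerable<string> code)
    {
        var table = new Dictionary<string, int>();
        int address = 0;
        foreach (var raw in code)
        {
            var line = raw.Trim();
            if (line.Length == 0 || line[0] == '.') continue; // blank line or directive
            if (line.EndsWith(":"))
            {
                // The label points at the next instruction and takes no space.
                table.Add(line.Substring(0, line.Length - 1), address);
                continue;
            }
            address += 4; // one instruction
        }
        return table;
    }
}
```

With this input, start maps to address 0 and loop maps to address 4. Because Dictionary.Add throws on a duplicate key, a label defined twice surfaces as an error instead of silently overwriting the earlier address.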

Calculating Addresses

To keep track of where each instruction will be placed in memory, we increment the current address by 4 (decimal) for each instruction. Empty lines, labels, and directives do not advance the address.

In the base RISC-V instruction set (RV32I/RV64I), every instruction has the same fixed size: 32 bits (4 bytes). The optional compressed "C" extension adds 16-bit encodings, but this pass does not account for them and treats every instruction as 4 bytes. By incrementing the address by 4 for each instruction, the assembler places instructions in consecutive memory locations, so the address of the next instruction is always the current address plus 4 and the processor can fetch it without decoding the current instruction's length first.
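
The counting rule can also be used to print a listing that pairs each instruction with the address it will occupy. The following is only a sketch: it assumes uncompressed 4-byte instructions and a base address of 0, and Listing.Build is a hypothetical helper rather than part of SharpRISCV.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical input mixing a directive, a label, a blank line, and instructions.
string[] program =
{
    ".text",
    "main:",
    "    addi x5, x0, 1",
    "",
    "    add  x6, x5, x5",
    "    jal  x0, main",
};

foreach (var (address, text) in Listing.Build(program))
    Console.WriteLine($"0x{address:X4}  {text}");

static class Listing
{
    // Pairs every instruction with its byte address, assuming 4 bytes each.
    public static List<(int Address, string Text)> Build(IEnumerable<string> code)
    {
        var rows = new List<(int Address, string Text)>();
        int address = 0;
        foreach (var raw in code)
        {
            var line = raw.Trim();
            // Only real instructions occupy memory in this pass.
            bool takesSpace = line.Length > 0 && line[0] != '.' && !line.EndsWith(":");
            if (!takesSpace) continue;
            rows.Add((address, line));
            address += 4;
        }
        return rows;
    }
}
```

The three instructions land at addresses 0, 4, and 8; the directive, the label, and the blank line do not move the counter.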

Addressing Offsets

Offsets are what allow a label reference to resolve to the right memory address at run time. RISC-V branches and jumps are PC-relative, and the assembler does not know where the program will finally be loaded, so instead of an absolute address we encode the distance to the target.

The offset is calculated by subtracting the current address (Address.CurrentAddress) from the label's address. This difference is the signed distance, in bytes, between the current instruction and the target label: negative for a label earlier in the code, positive for one later.

The calculated offset is then used in the machine code generation process. It lets the assembler emit instructions that reach the correct target based only on the relative positions of labels within the code.

For example, if the current address is 28 and a label is at 12, the offset is 12 - 28 = -16. If the program is loaded so that this instruction sits at address 0x00FF28 at run time, the label is at 0x00FF18 (0x00FF28 - 0x10).
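
A small sketch can show why a relative offset survives relocation. The addresses and load bases below are hypothetical, and BranchOffset.Compute is an illustrative helper, not a SharpRISCV API.

```csharp
using System;

int labelAddress = 8;     // hypothetical: address recorded for the label
int currentAddress = 24;  // hypothetical: address of the referencing instruction
int offset = BranchOffset.Compute(labelAddress, currentAddress);
Console.WriteLine($"offset = {offset}"); // offset = -16

// The same offset reaches the label wherever the program is loaded.
foreach (int loadBase in new[] { 0x0000, 0x8000 })
{
    int pc = loadBase + currentAddress;
    Console.WriteLine($"pc=0x{pc:X4} target=0x{pc + offset:X4}");
}

static class BranchOffset
{
    // Signed byte distance from the referencing instruction to the label.
    public static int Compute(int labelAddress, int currentAddress)
        => labelAddress - currentAddress;
}
```

With a load base of 0x8000 the instruction runs at 0x8018 and the target is 0x8008, which is exactly the label's assembled address (8) plus the load base.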

Machine Code Generation

During machine code generation, an immediate operand can be either a number or a label. If the immediate value does not parse as an integer, we treat it as a label reference. In that case, we retrieve the label's address from the lookup table built during label processing and compute the offset from the current address.

int labelAddress = AddressLookupTable[assemblyLine];
int offset = labelAddress - Address.CurrentAddress;
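
Putting the two cases together, a resolver for immediates might look like the sketch below. It assumes decimal integer literals only (no 0x prefixes) and a Dictionary-backed label table. Immediate.Resolve and its parameters are hypothetical names used for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

var labels = new Dictionary<string, int> { ["loop"] = 4 }; // hypothetical table

Console.WriteLine(Immediate.Resolve("-12", labels, currentAddress: 12));  // -12
Console.WriteLine(Immediate.Resolve("loop", labels, currentAddress: 12)); // -8

static class Immediate
{
    // Returns the literal value for a number, or the PC-relative offset for a label.
    public static int Resolve(string operand,
                              IReadOnlyDictionary<string, int> labels,
                              int currentAddress)
    {
        operand = operand.Trim();

        // Case 1: the operand is a plain decimal integer.
        if (int.TryParse(operand, NumberStyles.Integer,
                         CultureInfo.InvariantCulture, out int value))
            return value;

        // Case 2: the operand is a label; convert its address to an offset.
        if (labels.TryGetValue(operand, out int labelAddress))
            return labelAddress - currentAddress;

        throw new KeyNotFoundException($"Undefined label '{operand}'.");
    }
}
```

Throwing on an unknown name is a design choice of this sketch: it reports a misspelled label at assembly time instead of emitting a wrong offset.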

As we progress in developing the SharpRISCV assembler, understanding labels, addresses, and offsets lays a solid foundation for the subsequent phases.

The repository SharpRISCV serves as the starting point for enthusiasts eager to explore and contribute to the project.

If you find the project helpful and informative, don’t forget to give it a star on GitHub to show your support.

Earlier in this series, we covered instruction types, parsing instructions, and converting them to machine code. Now, with a grasp of labels, addresses, and offsets, we can move on to directives in the upcoming part. Stay tuned for the next installment in our journey through the world of RISC-V assembly and C#.


Published by HackerNoon on 2023/12/03