How to learn an existing codebase

Photo by Mikayla Mallek on Unsplash

Whether it’s your first day at a new job or you’re struggling to contribute to a project, it’s important to master the skills of learning, conceptualising and improvising so that you can adapt to the project successfully. This is especially true if you’re not familiar with the technologies involved, in which case you should approach the adaptation carefully.

Learning an existing codebase takes time and patience. You might not be able to grasp all the concepts and algorithms in a few days, or even in a few weeks. That is why it’s important to follow a well-defined plan and work according to it. This keeps you goal-oriented and ensures that you are making progress. Setting up milestones helps you measure that progress and adjust the plan. The process is similar to the agile methodologies used in software engineering, where we iteratively review the processes we follow and make decisions along the way to improve them.

Familiarizing yourself with the technology stack should be your priority if you don’t have any experience with the technologies used in the project. This is fairly straightforward: by following the documentation of the frameworks and libraries, you should be able to gain a decent knowledge of how they are used and how they work internally.

Figuring out architecture and components

In a project designed to cater to enterprise needs, the codebase is usually well structured and abstracted into logical components. These components typically follow a particular architecture used throughout the codebase. For example, the MVC architecture structures the codebase into three layers (Model, View, Controller). Within the layers themselves, design patterns might have been applied depending on the use cases.

As an engineer, figuring out the architecture by communicating with your peers is really important, since this gives you a bird’s-eye view of the whole project. Depending on the application, components are created to suit the use cases and to meet requirements such as maintainability, scalability and security. Getting to know the exact use cases and requirements is therefore a necessary step when diving into a new project. One common mistake is that engineers assume the requirements and then try to relate the project’s abstractions to those assumptions. You can avoid this by gaining a good understanding of the actual requirements.

Approaches

There are three approaches to learning a codebase: the top-down approach, the bottom-up approach and the mixed approach.

Top-down approach

Top-down approach illustrated

Having a bird’s-eye view of the project helps immensely when using the top-down approach. Start by selecting a major use case from the project. As an example, let’s assume a scenario where a client application makes an HTTP GET request to one of the endpoints of a REST API to retrieve the health of the database.

Sample project structure

Assuming the structure above represents the project, we’ll walk through the HTTP request to the monitor endpoint. At run time, we can use a debugger to pinpoint where the code executes and to identify the different layers. According to the project structure, the monitor endpoint executes the code in routes/monitor.js. This in turn executes the logic within controllers/monitor.js, and the controller subsequently executes the relevant repository code in repository/monitor.js. In this case, the repository represents the model layer. The repository code might be split into different code blocks, each handling a distinct piece of logic. The combined result of the repository provides the controller with the data from the database, and the controller applies its logic and returns the result in the HTTP response.

Stepping through the code from top to bottom, line by line, reveals the execution flow of the code and gives us an opportunity to identify the logical layers.
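
To make the walk-through concrete, here is a minimal single-file sketch in plain Node.js that models the three files as functions. Everything in it is hypothetical: the `/monitor` path, the `fakeDb` stand-in for a real database driver and the object names are assumptions, and a real project would likely use a web framework and asynchronous database calls instead. The `trace` array records which layer runs, so the printed output shows the top-down execution flow. The `debugger;` statement does nothing in a normal run, but if you start the script with `node inspect`, it pauses at that statement after you continue past the initial break, letting you step into the repository call.

```javascript
// Hypothetical single-file model of routes/monitor.js,
// controllers/monitor.js and repository/monitor.js.
// All names here (fakeDb, monitorRepository, "/monitor", ...) are illustrative.

const trace = []; // records which layer ran, in order

// repository/monitor.js (model layer): talks to the database.
// An in-memory fake replaces a real driver; kept synchronous for brevity.
const fakeDb = { ping: () => true };

const monitorRepository = {
  checkConnection() {
    trace.push("repository");
    return fakeDb.ping();
  },
};

// controllers/monitor.js: applies the logic and shapes the response.
const monitorController = {
  getHealth(req, res) {
    trace.push("controller");
    debugger; // no-op normally; pauses here when run under `node inspect`
    const dbUp = monitorRepository.checkConnection();
    res.status = dbUp ? 200 : 503;
    res.body = { database: dbUp ? "up" : "down" };
    return res;
  },
};

// routes/monitor.js: maps an HTTP method and path to a controller function.
const routes = {
  "GET /monitor": (req, res) => {
    trace.push("route");
    return monitorController.getHealth(req, res);
  },
};

// Simulate the client request and follow it from the top down.
function handle(method, path) {
  const handler = routes[`${method} ${path}`];
  const res = { status: 404, body: null };
  return handler ? handler({ method, path }, res) : res;
}

const response = handle("GET", "/monitor");
console.log(trace.join(" -> "), response.status, response.body);
```

Running it prints the layers in the order the request visits them (route, then controller, then repository), followed by the status code and body the client would receive.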

Bottom-up approach

Bottom-up approach illustrated

This approach involves inspecting the code from the lowest level of the logical stack up to the topmost logical layers. As with the top-down approach, selecting a use case and trying to map it is the ideal way to learn with the bottom-up approach as well. In this case, however, we don’t have a solid idea of which path to follow, so it may take a few trial-and-error attempts to find the execution flow from bottom to top.

Here, using basic intuition is the best way to find a starting point and get going. In the monitor API example, finding the repository layer would be the first step, followed by jumping through the calling code until you reach the other end of the execution flow. Figuring out the logical layers might be challenging at first. For example, helpers and handlers might represent different logical abstractions depending on the context of the application. Once you have found a starting point, it is relatively easy to jump to the top of the execution (the route endpoint in this case).
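
One handy trick for the bottom-up approach, once you have picked a starting point, is to capture a stack trace from inside the lowest layer and read it upwards. The sketch below assumes hypothetical functions `checkConnection`, `getHealth` and `monitorRoute` standing in for the repository, controller and route files. It records `new Error().stack` inside the repository function and prints only the frames that belong to those functions. In a real codebase, temporarily dropping a `console.trace()` call into the repository method gives similar information on stderr without any extra code.

```javascript
// Hypothetical names: checkConnection, getHealth and monitorRoute stand in
// for repository/monitor.js, controllers/monitor.js and routes/monitor.js.

let capturedStack = "";

// Bottom layer: the starting point we found by intuition.
function checkConnection() {
  // Capture the call stack here. Each frame names a caller, so reading the
  // frames in order walks us *up* through the logical layers.
  capturedStack = new Error("who calls checkConnection?").stack;
  return true; // stand-in for a real database ping
}

// Middle layer: the controller that consumes the repository result.
function getHealth() {
  return { database: checkConnection() ? "up" : "down" };
}

// Top layer: the route handler that the HTTP request reaches first.
function monitorRoute() {
  return getHealth();
}

monitorRoute();

// Keep only the frames for our own functions, innermost (lowest layer) first.
const frames = capturedStack
  .split("\n")
  .slice(1) // drop the "Error: ..." message line
  .map((line) => line.trim())
  .filter((line) => /checkConnection|getHealth|monitorRoute/.test(line));

console.log(frames.join("\n"));
```

The exact frame text (file paths, line and column numbers) depends on where the script lives, but the order of the frames is what matters: repository, then controller, then route.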

Mixed approach

Mixed approach illustrated

In practice, using just one of the approaches above won’t help you learn quickly. Mixing both approaches yields a better understanding of the codebase in a more intuitive and faster manner. There’s nothing magical about the mixed approach: it simply combines the top-down and bottom-up approaches. Switching between the two provides a deeper and broader view of the abstract concepts hidden within the code.

Unit tests and integration tests

Following the test pyramid is an integral part of the learning process. If the project already has unit tests and integration tests, as a newcomer you can try changing the tests and observing the results. Starting with unit tests gives you a deeper understanding of the desired outcomes of individual abstracted units. Integration tests demonstrate the overall outcomes expected when a number of units are combined.

If tests are available in the project, the best approach is to run the different types of tests, observe the outcomes and try to map them back to the codebase.

Testing pyramid, Photo by: https://blog.primehammer.com/test-pyramid/

Breaking stuff

In a perfect world, you would be able to grasp all the concepts and algorithms baked into a codebase without trying to make any modifications. Reality is different, however. Tinkering with the code and making changes, while following the approaches described above, is the best way to learn.

As you follow the codebase, try changing code snippets. Before running them, form an idea of the expected outcome, then compare it with the actual outcome you get after execution. This will gradually improve your knowledge of the codebase as a whole.
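
Here is a minimal sketch of that predict-then-run loop, again using a hypothetical controller from the monitor example. The experiment makes the repository throw, as if the database had refused the connection, after first writing down the expected result.

```javascript
// Hypothetical controller from the monitor example: it handles a
// `false` result from the repository, but not an exception.
function getHealth(repository) {
  const dbUp = repository.checkConnection();
  return { status: dbUp ? 200 : 503 };
}

// The experiment: "break" the repository so the database call throws.
const brokenRepository = {
  checkConnection() {
    throw new Error("connection refused");
  },
};

// Step 1: write the prediction down before running anything.
const predicted = { status: 503 };

// Step 2: run the modified code and record what actually happens.
let actual;
try {
  actual = getHealth(brokenRepository);
} catch (err) {
  actual = { error: err.message };
}

// Step 3: compare. A mismatch points at behaviour we did not expect.
const matched = JSON.stringify(predicted) === JSON.stringify(actual);
console.log({ predicted, actual, matched });
```

In this sketch the prediction turns out to be wrong: the controller only handles a `false` result, so the exception propagates out of `getHealth` instead of becoming a 503. A mismatch like this is exactly the kind of discovery the exercise is for, since it tells you that the error handling either lives somewhere else, such as a route-level error handler, or is missing.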

Final thoughts

Learning a new codebase is a skill. It is the art of combining theoretical knowledge with practical use-case scenarios. Understanding an enterprise codebase takes time and patience, and communicating with your peers is immensely helpful when you run into blockers.