Processes & Threads: The Essentials to Mastering Linux

Have you ever heard of Moore's law, which guided the semiconductor industry for almost 60 years? Did you know that Moore's law was never really a law, but rather an observation? For those unfamiliar with it, this is how it is usually expressed.

Moore's law observes that the number of transistors in dense integrated circuits doubles every two years.

Although the observation is stated in terms of electronics, from a computing standpoint it implies that computing power roughly doubles every two years. In other words, the computational power of a single chip grows exponentially. Isn't that fantastic?

The amazing thing is that the observation held up remarkably well for decades after it was made, which is why it came to be treated as a law.

However, as we all know, nice things are rarely free. In this case, more computing power came at the cost of more heat. By around 2010, processors were running hot enough to cook pasta. (Check out this YouTube video to see it for yourself.)

Additionally, as transistors shrank so that more of them could fit on a chip, they began to approach the physical limits of the material. At such tiny scales, quantum effects such as electron tunneling cause current to leak through transistors, which made it increasingly hard to stay on the Moore's law trajectory.

It felt like the good times of ever-faster computers were over. But engineers had other plans: instead of building a single, ever more powerful core, they started putting multiple cores on one chip.

Although this may have solved the hardware problem, it presented a fresh challenge to software engineers, who now had to write software that could take advantage of several cores. As a result, concurrent and parallel programming moved from a niche into the mainstream. Threads take center stage in this area of programming, but before talking about threads, we must understand what a process is.

What are Linux Processes?

A Linux process is a running instance of a program.

While a program runs, the system must keep track of a number of things, including the program's code (its compiled machine instructions), its stack and heap memory, the values in the CPU registers, and its open files. In Linux, the aggregate of all these things is referred to as a process. Linux is built on a foundation of processes.

To see the processes running on your machine, try the following command. It lists processes along with their process IDs (PIDs).

$ ps

Here is a sample snapshot.

By default, the command above shows only the processes that belong to the current user and are attached to the current terminal. To list all the processes, we can use the command with the following options.

$ ps aux

Here is a sample snapshot.

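Besides selecting more processes, these options also give us extra information. Here is what each one means (these are BSD-style options, which is why they are written without a leading dash):

a lists the processes of all users, not just your own
u uses a user-oriented output format, with columns such as USER, %CPU, and %MEM
x also includes processes that have no controlling terminal, such as background daemons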

We can also terminate a process using the kill command, which by default sends the process the SIGTERM signal. Here is how it is used:

$ kill PID

Here PID is the process ID, which we can get from the ps command. Each running process in a Linux system has a unique PID that identifies it (PIDs of processes that have exited may later be reused). You can also use the pidof command to look up the PIDs of a running program by name.

$ pidof bash

Parent-Child Relationships in Processes

When you launch a program or run a command, a new process is created. When you run a command from a terminal, it is the shell running in that terminal (for example, bash) that creates the new process. We therefore say that the shell process started the command process; in other words, the command process is a child of the shell process.

Every process in Linux, apart from the very first ones started by the kernel at boot, has a parent process that created it. We can use the following command to check the parent process ID (PPID) of a process with a given PID.

$ ps -o ppid= -p PID

All user-space processes in Linux are, directly or indirectly, descendants of the process with PID 1. This is not a coincidence. The process with PID 1 is the init process (on many modern distributions this is systemd), the first user-space process the kernel starts at boot. Every later user-space process descends from it, and if a parent exits before its children, the orphaned children are typically re-parented to init.

We thus have a tree constructed out of these relationships among processes. This is called the process tree.

Processes are Heavy-Weight 🪨

Processes are extremely useful in Linux, and we couldn't live without them. But there is one disadvantage to them, or perhaps not a disadvantage at all, but just the way they work: a process is heavy. Every process has its own private address space, so creating one means the kernel has to set up new page tables, a new file descriptor table, and other bookkeeping. Linux uses copy-on-write to avoid copying the parent's memory up front, but every page a new process writes to still becomes its own private copy. As a result, spawning a large number of processes is expensive.

This matters because processes are one way to handle several requests at the same time. Since we can only run a limited number of processes on a system, this approach only lets us serve a limited number of concurrent users. Consider a web server that has to serve many concurrent users: creating a new process for each user is an expensive operation. We want something cheaper than a process, and this is where threads come into play.

What are Threads? 🧵

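Threads are often described as lightweight processes. All the threads in a process share that process's address space (its code, global variables, heap, and open files), while each thread gets its own stack and register state. Because no new address space has to be set up, creating a thread is cheaper than creating a process. Sharing memory also makes communication between threads fast, and switching between threads of the same process is cheaper than switching between processes. Using threads, a process can work on several tasks concurrently.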

Because threads are cheaper, we can typically create far more threads than processes. On multi-core machines, these threads can run in parallel on different cores. Compared with spawning many processes or doing all the tasks sequentially, this can improve the overall performance of a program.

Let's try to start our first thread. Note that we cannot start new threads directly from bash; the shell can only create new processes. So we will write a C program that launches two threads, compile it, and run it from bash as a child process. That new process will then create the two threads.

Creating Threads in C

Let's get our hands on some code. Create a new file named threads.c and open it in your favorite editor or IDE.

The first step is to include the required header files.

#include <pthread.h>
#include <stdio.h>

We will be creating two threads, each executing the same function but with different parameters. Let us write that function.

void* print_multiple_messages(void* ptr) {
    char* message = (char*) ptr;
    for (int i = 0; i < 1000; ++i) {
        printf("%s\n", message);
    }
    return NULL;  // a thread function must return a void*; we have no result to report
}

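As you can see, this function does nothing fancy: it takes a message as its argument and prints it a thousand times. Its signature, taking a void* and returning a void*, is the form pthread_create expects for the function a new thread starts in.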

Let us write the main function now.

int main() {
  // Continue writing from here
}

Just like processes, threads have IDs that identify them. Create two variables of type pthread_t to hold these IDs.

pthread_t thread1, thread2;

We will use a different message for each thread. Create two strings to hold the messages.

char* message1 = "Thread 1";
char* message2 = "Thread 2";

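The next step is to create the two threads with the pthread_create function. Its arguments are a pointer to where the new thread's ID should be stored, the thread attributes (NULL means the defaults), the function the thread should run, and the argument to pass to that function.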

pthread_create(&thread1, NULL, print_multiple_messages, (void*) message1);
pthread_create(&thread2, NULL, print_multiple_messages, (void*) message2);

This starts two new threads. Now let's make the main thread wait until both threads have finished their work. Without this, main could return first, which would end the whole process and the threads along with it.

pthread_join(thread1, NULL);
pthread_join(thread2, NULL);

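And that is it. Compile the program, linking in the pthread library, and run it:

$ gcc threads.c -o threads -pthread
$ ./threads

You will likely notice that the messages from the two threads are interleaved, and that the exact order changes from run to run. This shows that the threads run concurrently (and, on a multi-core machine, possibly in parallel).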

Congratulations, you just created your very first threads!

In this article, we talked about processes and threads. They are among the most fascinating features of Linux, and understanding them well is crucial: it lets us write software that takes advantage of multi-core hardware and uses the resources at our disposal efficiently.

That brings us to the end of this article. We tried to go into enough detail to get you started, but there is much more to learn, so keep exploring. If you enjoyed the content, feel free to leave a comment or a reaction.

Enjoy Learning!