A common perception is that Linux users hold and exercise privileges, but this is not the case. It is the process, an executable running in a certain user context, that exercises rights (permission to perform privileged operations guarded by the Linux kernel).

Processes have capabilities, not users.

In a Unix-like system, the traditional approach to process privileges is a binary design: privileged processes and unprivileged processes. That is, a process either runs as root and has full access to the system, or it runs as a non-root user and cannot perform privileged operations.

Privileged processes: All kernel permission checks are bypassed by privileged processes. perf_events performance monitoring, for example, is fully accessible to privileged processes with no access, scope, or resource limitations.

Privileged Processes (where effective user ID is 0)

Note: (Effective User ID == 0) is referred to as superuser or root

Unprivileged processes are subject to full permission checking based on the credentials of the process (usually: effective UID, effective GID, and supplementary group list).

Unprivileged Processes (where effective user ID is nonzero)

Although this simple design works well for system administrators who need full access to the system to perform critical operations (installing updates, adding users, performing backups, mounting filesystems, rebooting the system, etc.), it makes life difficult for system operators (HR, Finance, etc.) whenever their day-to-day tasks require restricted operations or access to files owned by other users.

But We Already Have DAC

DAC (Discretionary Access Control) is enabled by default on Linux filesystem objects (files, directories, devices) and lets owners control access to them. The owner of a file or directory decides who has access to it and what they can do with it.

When an unprivileged process (effective user ID != 0) requests access to such an object, the Linux kernel performs access control checks based on the credentials of the process (effective UID and GID) and the permissions the owner has set on the object.

Drawback of DAC

Processes running with root privileges are not constrained by DAC.
They receive full access to the system because the kernel skips all permission checks for them.

Split Root Permissions

Capabilities in Linux are used to provide fine-grained access to kernel resources that was previously unavailable to unprivileged processes. Instead of granting full access to the targeted process at once, the Linux kernel splits root permissions into smaller bits that can be distributed individually on a thread-by-thread basis.

The capabilities manual page [1] has a comprehensive list of all available capabilities.

# A complete list of all available capabilities is present 
# in the capability manual page [1]. 
$ man capabilities 

# -------------------- #
# Alternative would be
# -------------------- #

# Highest capability number supported by your kernel
$ cat /proc/sys/kernel/cap_last_cap
37
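
Capability numbers start at 0, so the value in cap_last_cap is one less than the number of capabilities the kernel knows about. The following is a minimal sketch of that arithmetic; cap_count and full_mask are hypothetical helper names, not standard tools. The mask it prints for 37 is the same value that appears as CapBnd in the /proc output later in this article.

```shell
# Hypothetical helpers: capabilities are numbered from 0, so the count is
# the highest capability number plus one.
cap_count() {
    echo $(( $1 + 1 ))
}

# full_mask LAST: hex mask with bits 0..LAST set.
full_mask() {
    printf '%016x\n' $(( (1 << ($1 + 1)) - 1 ))
}

cap_count 37    # prints 38
full_mask 37    # prints 0000003fffffffff

# On a live system, feed either helper the kernel's own value:
# cap_count "$(cat /proc/sys/kernel/cap_last_cap)"
```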

The Linux privilege model divides root privilege into 38 or more capabilities (the exact number depends on the kernel version), which can be granted to non-root processes so that they can perform specific privileged actions (such as system calls or data manipulation).

Processes and files can have capabilities enabled.
Each privileged operation is checked against the relevant capability (not against EUID == 0).
With UID 0, all capabilities are enabled by default. For all privileged operations, the kernel must check whether the thread has the required capability in its effective set.
The kernel must provide system calls for modifying and returning a thread's capability sets.
The filesystem must support attaching capabilities to an executable file so that a process gains those capabilities when the file is executed.

Process Capabilities Sets

Each process (thread) has five capability sets. Each set is represented as a 64-bit mask and can contain zero or more capabilities.

Effective Capabilities Set

The effective set is the set the kernel uses to perform permission checks for a process.

When a process attempts a privileged operation, the kernel verifies that the relevant bit in the effective set is set. When a process requests to set the system clock, for example, the kernel first verifies that the CAP_SYS_TIME bit in the process's effective set is set.
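
The bit test described above can be sketched in a few lines of shell. This is only an illustration of the idea, not kernel code: has_cap is a hypothetical helper, and the masks are written the way /proc/<PID>/status prints them (hex, without the 0x prefix). The capability number 25 for CAP_SYS_TIME comes from <linux/capability.h>.

```shell
# has_cap MASK BIT: succeed if bit number BIT is set in the hex MASK.
# (hypothetical helper, for illustration only)
has_cap() {
    [ $(( (0x$1 >> $2) & 1 )) -eq 1 ]
}

CAP_SYS_TIME=25   # capability number from <linux/capability.h>

# An effective set with only CAP_SYS_TIME raised (bit 25 is 0x2000000).
if has_cap 0000000002000000 "$CAP_SYS_TIME"; then
    echo "CAP_SYS_TIME is effective: clock change allowed"
fi

# An empty effective set fails the same check.
has_cap 0000000000000000 "$CAP_SYS_TIME" || echo "CAP_SYS_TIME missing: EPERM"
```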

Permitted Capabilities Set

The permitted set indicates which capabilities a process can use and limits what can be in the effective set.

A process can have capabilities that are set in the permitted set but not in the effective set. This indicates that the process has temporarily disabled those capabilities. A process can raise a bit in its effective set only if that bit is also set in its permitted set.
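
The subset rule between the two sets can be expressed as a single mask operation. The sketch below assumes hex masks without the 0x prefix; can_raise is a hypothetical helper name that only models the rule and does not change any real process.

```shell
# can_raise PERMITTED REQUESTED: succeed only if every bit in the
# requested effective mask is also present in the permitted mask.
# (hypothetical helper mirroring the subset rule)
can_raise() {
    [ $(( 0x$2 & ~0x$1 )) -eq 0 ]
}

PERMITTED=0000000000003000   # cap_net_admin (bit 12) + cap_net_raw (bit 13)

can_raise "$PERMITTED" 0000000000002000 && echo "cap_net_raw: can be raised"
can_raise "$PERMITTED" 0000000002000000 || echo "cap_sys_time: not permitted"
```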

Inheritable Capabilities Set

The inheritable capabilities are the capabilities of the current process that should be inherited by a program executed by the current process.

During execve(), the process inheritable set is masked (ANDed) with the file inheritable set of the program being executed, and the result is added to the new permitted set. Child processes and threads, on the other hand, are given an exact copy of the capability sets of the parent. Also note that inheriting a capability does not automatically make it effective: inherited capabilities only directly influence the permitted set of the new program.

Bounding Capabilities Set

The bounding set makes it possible to limit the capabilities that a process may ever obtain.

It limits what a program can gain: during execve(), the file permitted set is ANDed with the bounding set, and a thread can add a capability to its inheritable set only if that capability is in the bounding set. Capabilities that a thread already holds in its permitted set are not removed when they are dropped from the bounding set.

Ambient Capabilities Set

The ambient capability set is applied to all non-SUID binaries that do not have file capabilities.

Ambient capabilities are preserved across an execve() of such a program. A capability can be in the ambient set only if it is in both the permitted and inheritable sets; if it is lowered in either of those sets, it is dropped from the ambient set as well.

View Process Capabilities

1- The proc filesystem (procfs)

To see the capabilities of a particular process, use the status file in the /proc/<PID>/ directory.

Process capabilities are expressed in hexadecimal format.

CapInh = Inheritable capabilities

CapPrm = Permitted capabilities

CapEff = Effective capabilities

CapBnd = Bounding set

CapAmb = Ambient capabilities set

Let's look at the process capabilities of the ping utility. You may wonder why the effective capabilities are all zeroes. The simplest answer is that ping is a capability-aware application: it can drop some or all of its effective capabilities once they are no longer required, to reduce exposure. It can still raise a capability in its effective set again as long as that capability remains in its permitted set.

# Mute the output and get process id
~$ ping 127.0.0.1 > /dev/null &
[1] 21002
~$ cat /proc/21002/status | grep Cap
CapInh: 0000000000000000
CapPrm: 0000000000003000
CapEff: 0000000000000000
CapBnd: 0000003fffffffff
CapAmb: 0000000000000000
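
The capabilities that ping holds but has not raised are the bits set in CapPrm and clear in CapEff. A minimal sketch of that comparison follows; dormant_caps is a hypothetical helper that reads status text on standard input, so it works on saved output as well as on a live /proc/<PID>/status file.

```shell
# dormant_caps: read /proc/<PID>/status text on stdin and print, as hex,
# the capabilities that are permitted but not currently effective.
# (hypothetical helper name)
dormant_caps() {
    prm=0
    eff=0
    while read -r key value rest; do
        case $key in
            CapPrm:) prm=$((0x$value)) ;;
            CapEff:) eff=$((0x$value)) ;;
        esac
    done
    printf '%016x\n' $(( $prm & ~$eff ))
}

# Values from the ping example above:
printf 'CapPrm:\t0000000000003000\nCapEff:\t0000000000000000\n' | dormant_caps
# prints 0000000000003000

# For a live process: dormant_caps < /proc/<PID>/status
```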

2- Use getpcaps command

An alternative is to use the getpcaps utility to display the capabilities of a particular process.

getpcaps resolves capabilities into proper names

# suppress the output and get process id
~$ ping 127.0.0.1 > /dev/null &
[1] 21002
~$ getpcaps 21002 
Capabilities for `21002': = cap_net_admin,cap_net_raw+p

3- Use pscap Utility

Similarly, using pscap utility, we can generate a report of all running processes' capabilities.

$ pscap -a
ppid  pid   name        command           capabilities
0     1     root        systemd           full
1     419   root        systemd-journal   chown, dac_override, dac_read_search, fowner, setgid, setuid, sys_ptrace, sys_admin, audit_control, mac_override, syslog, audit_read
1     447   root        lvmetad           full
1     457   root        systemd-udevd     full
1     589   systemd-timesync  systemd-timesyn   sys_time

Decode Process Capabilities

The capsh utility decodes a capability mask given in hexadecimal into capability names.

The proc filesystem (procfs) lists process capabilities in hexadecimal format.

~$ cat /proc/21002/status | grep Cap
CapInh: 0000000000000000
CapPrm: 0000000000003000
CapEff: 0000000000000000
CapBnd: 0000003fffffffff
CapAmb: 0000000000000000
# Decode raw capabilities
~$ capsh --decode=0000000000003000
0x0000000000003000=cap_net_admin,cap_net_raw
~$ capsh --decode=0000001fffffffff
0x0000001fffffffff=cap_chown,cap_dac_override,
cap_dac_read_search,cap_fowner,cap_fsetid,cap_kill,
cap_setgid,cap_setuid,cap_setpcap,cap_linux_immutable,
cap_net_bind_service,cap_net_broadcast,cap_net_admin,
cap_net_raw,cap_ipc_lock,cap_ipc_owner,cap_sys_module,
cap_sys_rawio,cap_sys_chroot,cap_sys_ptrace,cap_sys_pacct,
cap_sys_admin,cap_sys_boot,cap_sys_nice,cap_sys_resource,
cap_sys_time,cap_sys_tty_config,cap_mknod,cap_lease,
cap_audit_write,cap_audit_control,cap_setfcap,
cap_mac_override,cap_mac_admin,cap_syslog,35,36

# -------------------- #
# Alternative would be
# -------------------- #

~$ for line in $(grep Cap /proc/21002/status | awk '{print $2}'); do capsh --decode=$line; done;
0x0000000000000000=
0x0000000000003000=cap_net_admin,cap_net_raw
0x0000000000000000=
0x0000001fffffffff=cap_chown,cap_dac_override,
cap_dac_read_search,cap_fowner,cap_fsetid,cap_kill,
cap_setgid,cap_setuid,cap_setpcap,cap_linux_immutable,
cap_net_bind_service,cap_net_broadcast,cap_net_admin,
cap_net_raw,cap_ipc_lock,cap_ipc_owner,cap_sys_module,
cap_sys_rawio,cap_sys_chroot,cap_sys_ptrace,cap_sys_pacct,
cap_sys_admin,cap_sys_boot,cap_sys_nice,cap_sys_resource,
cap_sys_time,cap_sys_tty_config,cap_mknod,cap_lease,
cap_audit_write,cap_audit_control,cap_setfcap,
cap_mac_override,cap_mac_admin,cap_syslog,35,36
0x0000000000000000=
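
When capsh is not installed, the same decoding can be approximated in plain shell, since each bit position corresponds to one capability number. The sketch below is a rough stand-in, not a replacement for capsh: decode_caps is a hypothetical helper, and the name table is transcribed from <linux/capability.h> and should be checked against your kernel headers. It also names bits 35 and later, which the capsh version used above prints as plain numbers.

```shell
# Capability names in bit order (bit 0 first), transcribed from
# <linux/capability.h>; verify against your kernel headers.
CAP_NAMES="chown dac_override dac_read_search fowner fsetid kill setgid
setuid setpcap linux_immutable net_bind_service net_broadcast net_admin
net_raw ipc_lock ipc_owner sys_module sys_rawio sys_chroot sys_ptrace
sys_pacct sys_admin sys_boot sys_nice sys_resource sys_time
sys_tty_config mknod lease audit_write audit_control setfcap
mac_override mac_admin syslog wake_alarm block_suspend audit_read
perfmon bpf checkpoint_restore"

# decode_caps MASK: print the names of the bits set in the hex MASK.
# (hypothetical helper; bits beyond the table above are ignored)
decode_caps() {
    mask=$((0x$1))
    bit=0
    out=""
    for name in $CAP_NAMES; do
        if [ $(( ($mask >> $bit) & 1 )) -eq 1 ]; then
            out="${out:+$out,}cap_$name"
        fi
        bit=$(( $bit + 1 ))
    done
    echo "$out"
}

decode_caps 0000000000003000   # prints cap_net_admin,cap_net_raw
```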

Drop Process Capabilities

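The capsh utility can drop capabilities through its --drop and --uid options. --drop removes the listed capabilities from the bounding set, so programs executed afterwards cannot acquire them.

Switching to a non-root user with --uid causes the thread to lose its effective capabilities, and its permitted ones as well unless --keep=1 is given.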


~$ sudo capsh --caps="cap_setpcap,cap_setuid,cap_setgid+ep" \
--drop="cap_net_admin,cap_net_raw" --keep=1 --uid=1001 \
--print -- -c "ping localhost"
Current: = cap_setgid,cap_setuid,cap_setpcap+p
Bounding set =
Securebits: 020/0x10/5'b10000
 secure-noroot: no (unlocked)
 secure-no-suid-fixup: no (unlocked)
 secure-keep-caps: yes (unlocked)
uid=1001(test1)
gid=0(root)
groups=0(root)
ping: socket: Operation not permitted

# -------------------- #
# Alternative would be
# -------------------- #

$ sudo capsh --drop=cap_net_raw --print -- -c "/bin/ping -c 1 localhost"                                               
Current: = cap_chown,cap_dac_override,cap_dac_read_search,cap_fowner,cap_fsetid,cap_kill,cap_setgid,cap_setuid,cap_setpcap,cap_linux_immutable,cap_net_bind_service,cap_net_broadcast,cap_net_admin,cap_net_raw,cap_ipc_lock,cap_ipc_owner,cap_sys_module,cap_sys_rawio,cap_sys_chroot,cap_sys_ptrace,cap_sys_pacct,cap_sys_admin,cap_sys_boot,cap_sys_nice,cap_sys_resource,cap_sys_time,cap_sys_tty_config,cap_mknod,cap_lease,cap_audit_write,cap_audit_control,cap_setfcap,cap_mac_override,cap_mac_admin,cap_syslog,35,36,37+ep
Bounding set =cap_chown,cap_dac_override,cap_dac_read_search,cap_fowner,cap_fsetid,cap_kill,cap_setgid,cap_setuid,cap_setpcap,cap_linux_immutable,cap_net_bind_service,cap_net_broadcast,cap_net_admin,cap_ipc_lock,cap_ipc_owner,cap_sys_module,cap_sys_rawio,cap_sys_chroot,cap_sys_ptrace,cap_sys_pacct,cap_sys_admin,cap_sys_boot,cap_sys_nice,cap_sys_resource,cap_sys_time,cap_sys_tty_config,cap_mknod,cap_lease,cap_audit_write,cap_audit_control,cap_setfcap,cap_mac_override,cap_mac_admin,cap_syslog,35,36,37
Securebits: 00/0x0/1'b0
 secure-noroot: no (unlocked)
 secure-no-suid-fixup: no (unlocked)
 secure-keep-caps: no (unlocked)
uid=0(root)
gid=0(root)
groups=0(root)
ping: socket: Operation not permitted
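
In the second run, ping fails even though it runs as root, because cap_net_raw is no longer in the bounding set and therefore cannot be acquired on execution. The mask arithmetic behind this can be sketched as follows; drop_cap is a hypothetical helper that only models the effect on the bounding set, and 13 is the capability number of CAP_NET_RAW.

```shell
# drop_cap MASK BIT: print MASK with bit number BIT cleared, as hex.
# (hypothetical helper modelling a drop from the bounding set)
drop_cap() {
    printf '%016x\n' $(( 0x$1 & ~(1 << $2) ))
}

CAP_NET_RAW=13
bounding=$(drop_cap 0000003fffffffff "$CAP_NET_RAW")
echo "$bounding"                               # prints 0000003fffffdfff

# What an executed program could still gain of cap_net_raw (0x2000):
printf '%016x\n' $(( 0x2000 & 0x$bounding ))   # prints 0000000000000000
```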

File (Binaries) Capabilities

There are three distinct capability sets that can be associated with an executable file. During execve(), the kernel computes the capabilities of the new program from the current process capability sets and the file capability sets.

File (Binaries) Permitted Set

On execution, these capabilities are added to the process permitted set, limited to those that are also in the process bounding set.

File (Binaries) Inheritable Set

After the execve(), the intersection (logical AND) of the thread inheritable and file inheritable sets is added to the thread permitted set.

File (Binaries) Effective Flag

In contrast to other file capability sets, it is only a flag. When the flag is set, the process effective set following execve() is set to the new process permitted set; otherwise, it is empty.
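
Taken together, the three file sets combine with the process sets according to the rules documented in the capabilities manual page. The sketch below models those rules for a non-root process; exec_caps is a hypothetical helper, masks are hex without the 0x prefix, and as a simplification a file counts as privileged whenever any of its capability sets is non-empty (set-user-ID binaries are not modelled).

```shell
# exec_caps P_INH P_BND P_AMB F_INH F_PRM F_EFF
# Sketch of the execve(2) capability rules for a non-root process.
# F_EFF is the file effective flag (0 or 1); all other arguments are masks.
exec_caps() {
    p_inh=$((0x$1)); p_bnd=$((0x$2)); p_amb=$((0x$3))
    f_inh=$((0x$4)); f_prm=$((0x$5)); f_eff=$6

    # Simplification: any file capability makes the file "privileged",
    # which clears the ambient set.
    if [ "$f_inh" -ne 0 ] || [ "$f_prm" -ne 0 ] || [ "$f_eff" -ne 0 ]; then
        new_amb=0
    else
        new_amb=$p_amb
    fi

    new_prm=$(( ($p_inh & $f_inh) | ($f_prm & $p_bnd) | $new_amb ))

    if [ "$f_eff" -ne 0 ]; then
        new_eff=$new_prm
    else
        new_eff=$new_amb
    fi

    printf 'CapPrm: %016x\nCapEff: %016x\nCapAmb: %016x\n' \
        "$new_prm" "$new_eff" "$new_amb"
}

# A binary with cap_net_admin,cap_net_raw+ep (file permitted 0x3000, flag set):
exec_caps 0 0000003fffffffff 0 0 3000 1
```

With the effective flag cleared (last argument 0), the same call leaves CapEff at zero while CapPrm stays at 0000000000003000.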

Search Binary Capabilities

Depending on the use case, we may need to look for files with capabilities enabled.

1- Use getcap Utility

To find all files with file capabilities set, use getcap -r.

A malicious user can use getcap -r to find an exploitable executable binary on the system.

$ getcap -r / 2>/dev/null
/home/ubuntu/environment/cat_clone = cap_setuid+ep
/home/ubuntu/environment/top_clone = cap_chown+ep
/home/ubuntu/environment/ping_clone = cap_net_raw+p
/usr/bin/mtr-packet = cap_net_raw+ep
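
For an audit, the same output can be filtered for capabilities that deserve a closer look. The sketch below is only illustrative: risky_caps is a hypothetical helper, the list of capabilities it matches is an assumption rather than an authoritative catalogue, and it assumes paths without spaces.

```shell
# risky_caps: read `getcap -r` output on stdin and print the paths whose
# capability list mentions one of a few capabilities that are often
# treated as close to root-equivalent. The list is an illustrative
# assumption; adjust it to your own policy.
risky_caps() {
    awk '$NF ~ /cap_(setuid|setgid|sys_admin|sys_module|sys_ptrace|dac_override)/ { print $1 }'
}

# Typical use (may need root to read every directory):
# getcap -r / 2>/dev/null | risky_caps

# With two lines from the output above:
printf '%s\n' \
    '/home/ubuntu/environment/cat_clone = cap_setuid+ep' \
    '/usr/bin/mtr-packet = cap_net_raw+ep' | risky_caps
# prints /home/ubuntu/environment/cat_clone
```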

2- Use filecap Utility

The filecap utility does a similar job, listing the capabilities of files.

~$ filecap /usr
file                 capabilities
/usr/bin/mtr-packet     net_

3- Use pscap Utility

Similarly, we can find the capabilities set of all running processes using the pscap utility.

# pscap -a
ppid  pid   name        command           capabilities
6148  6152  root        bash              full

Set Binary Capabilities

The setcap utility sets file capabilities on an executable. The suffix after the capability list selects the file sets to change: p (permitted), i (inheritable), and e (the effective flag).

Only processes with the CAP_SETFCAP capability (typically those running as root) can perform this operation.

$ setcap cap_net_raw,cap_net_admin+ep ping_clone
unable to set CAP_SETFCAP effective capability: Operation not permitted

1- File Inheritable Set

Add cap_net_raw to the file inheritable set.

# Privileged ping binary
~# setcap cap_net_raw+i ping_clone
~$ getcap ping_clone
ping_clone = cap_net_raw+i

2- File Permitted Set

Add cap_net_raw, cap_net_admin to the file permitted set.

# Privileged ping binary
~# setcap cap_net_raw,cap_net_admin+p ping_clone
~$ getcap ping_clone
ping_clone = cap_net_raw,cap_net_admin+p

3- File Effective Bit/Flag

Enabling the file effective flag causes the new thread permitted set to be copied into the thread effective set during execve().

# Privileged ping binary
~# setcap cap_net_raw,cap_net_admin+ep ping_clone
~$ getcap ping_clone
ping_clone = cap_net_raw,cap_net_admin+ep

View Binary Capabilities

To inspect an executable file's file capabilities, use the getcap utility.

~$ getcap ping_clone
ping_clone = cap_net_raw+i

An alternative technique is to compare the file capabilities against an expected value and see whether they match.

Use setcap -v to verify file capabilities.

# When the file capabilities match
$ setcap -v cap_net_admin,cap_net_raw+ep ping_clone
ping_clone: OK
# When the file capabilities differ
$ setcap -v cap_net_raw+ep ping_clone
ping_clone differs in [pe]

In the next chapter, we will see how capability sets are determined for unprivileged and privileged program binaries after execve(2).

Stay tuned …
