Introduction

This blog post details the exploitation of CVE-2020-15999 in Google Chrome 86.0.4222.0 on Linux. Although CVE-2020-15999 is a heap-based buffer overflow in the font-loading library Freetype rather than in Chrome proper, Chrome's extensive use of Freetype lets us achieve code execution in the browser's renderer. This post focuses on the exploitation rather than on the analysis of the bug, as an extensive explanation and analysis can be found here. In essence, TrueType font files that contain bitmaps (i.e. raster images) store them in the sbix table of the font. When Freetype loads an embedded PNG image from the sbix table whose dimensions exceed the 16-bit limit, an integer overflow to buffer overflow (IO2BO) occurs. A PoC that achieves code execution in the renderer and pops a calculator can be found in the last section of this post.

Vulnerability details

Before diving into the exploitation, it is important to highlight the capabilities and limitations of the vulnerability. Because we control the dimensions of the embedded PNG file, we also control the size of the heap buffer from which the overflow occurs. PNG data can be written to a bitmap buffer in a nonlinear fashion (i.e. non-consecutive writes) via Adam7 interlacing. Our exploit does not use interlacing, however, so the bug is essentially a linear buffer overflow from a buffer of arbitrary size.

// pngshim.c
if ( bitdepth != 8                              ||
     !( color_type == PNG_COLOR_TYPE_RGB       ||
        color_type == PNG_COLOR_TYPE_RGB_ALPHA ) )
{
  error = FT_THROW( Invalid_File_Format );
  goto DestroyExit;
}

When loading embedded PNG files, Freetype requires the images to be RGB or RGBA images with a bit depth of 8. Once written as bitmap data, each pixel therefore takes up 4 bytes, which limits the granularity of our buffer overflow to 4 bytes.

// pngshim.c
static unsigned int
multiply_alpha( unsigned int  alpha,
                unsigned int  color )
{
  unsigned int  temp = alpha * color + 0x80;
  return ( temp + ( temp >> 8 ) ) >> 8;
}
// Omitted...
static void
premultiply_data( png_structp    png,
                  png_row_infop  row_info,
                  png_bytep      data )
{
  unsigned int  i = 0, limit;
  // Omitted...
  limit = row_info->rowbytes;
  for ( ; i < limit; i += 4 )
  {
    unsigned char*  base  = &data[i];
    unsigned int    alpha = base[3];
    if ( alpha == 0 )
      base[0] = base[1] = base[2] = base[3] = 0;
    else
    {
      unsigned int  red   = base[0];
      unsigned int  green = base[1];
      unsigned int  blue  = base[2];
      if ( alpha != 0xFF )
      {
        red   = multiply_alpha( alpha, red   );
        green = multiply_alpha( alpha, green );
        blue  = multiply_alpha( alpha, blue  );
      }
      base[0] = (unsigned char)blue;
      base[1] = (unsigned char)green;
      base[2] = (unsigned char)red;
      base[3] = (unsigned char)alpha;
    }
  }
}
// Omitted...
case PNG_COLOR_TYPE_RGB_ALPHA:
  png_set_read_user_transform_fn( png, premultiply_data );
  break;

Lastly, we discovered that Freetype applies a preprocessing step known as premultiplied alpha to the bitmap data after it is written. During this step, each RGB value of a pixel is scaled by the pixel's alpha (A) value divided by 255. The colors are therefore stored relative to the alpha channel rather than as absolute values. For example, suppose the R value of a pixel is 17 and its A value is 30. As the maximum value of an 8-bit integer is 255, the R value will be recalculated as 17/255 * 30 = 2. As another example, the pixel value 0x41ffffff (alpha 0x41 in the most significant byte, all three color bytes 0xff) is converted to 0x41414141 by this step. The implication is that this step rewrites our overflow contents so that, in every 4-byte group, the most significant byte (alpha) is at least as large as each of the other three bytes. A zero alpha also forces all four bytes to zero. Together, these rules limit what we can write. To the best of our knowledge, there is no way to disable premultiplied alpha in Freetype.

In conclusion, with this vulnerability, we can cause a linear buffer overflow from a heap buffer of any size, with some limitations on the contents of the buffer.

TCMalloc

As with most projects written in C, the Freetype project uses the malloc family of functions for dynamic memory management. In Chrome, calls to these functions are routed to TCMalloc, Google's heap implementation. TCMalloc is only one of the many heap implementations used within Chrome, and most memory allocations (e.g. JavaScript objects) do not go through it. We therefore had two options: corrupt Freetype's own memory structures (after all, they are on the same heap), or find a different component in Chrome that also uses the TCMalloc heap. The former was not suitable in our case. The buffer overflow occurs very early in the font loading process, so we have no chance to create a suitable heap layout beforehand. We therefore went with the second option and looked for another target component within Chrome. The target we chose was HTML5 WebSQL, because issuing SQL statements from JavaScript makes the heap layout easy to control. Note that WebSQL is implemented by embedding the open-source SQLite (sqlite3) engine inside Chrome, so the two terms are used interchangeably in this post.

Before we get to the actual exploitation, it is necessary to understand how TCMalloc works. The official guide here explains it far better than I can, so I will only give a brief explanation and assume that the reader has read the guide. To avoid heap noise caused by frequent allocations in the smaller size classes, this exploit uses chunks from size classes above 0x1000. When the middle-end in TCMalloc has no more chunks to service our request, it requests a new span and breaks it into individual chunks. This gives us consecutive chunks from which we can overflow.

Exploitation

Outline / Summary

For this exploit, the rough idea was to leak a heap pointer and then obtain arbitrary read and execution flow control by crafting fake in-memory structures that SQLite depends on.

Here is a brief TL;DR of the exploit:

  1. By overwriting the NULL terminator of a SQLite column name, we get an OOB read that leaks a pointer to a heap chunk.
  2. We make a fake Expr object in that heap chunk and use a partial overwrite to point a column's default value pointer at it. The Expr object contains a string pointer under our control. Inserting a row that uses the column's default value therefore inserts the string read from that pointer, giving us an arbitrary read.
  3. We set up a heap layout that lets us use the arbitrary read to leak Chrome's base address.
  4. We make a fake virtual table (VTable) object in the same heap chunk and use a partial overwrite to point to it. When the SQL transaction ends, SQLite calls a method through the fake object's function table to do the cleanup, giving us execution flow control.

Now, let’s move on to the details.

Initial Leak

From this extremely useful guide here, we know that table and column objects are retained in memory across SQL transactions in WebSQL. If we create a table with a single column whose name consists of 0xff0 characters, WebSQL will allocate a 0x1000-byte chunk. This chunk will not be freed until we drop the table.

Figure 1. 3 columns allocated

We therefore allocate 3 chunks of size 0x1400 by creating 3 tables, each with a single column whose name is slightly below 0x1400 bytes long. According to TCMalloc's size-class calculations, chunks of size 0x1400 reside in spans of size 0x4000, so only 3 chunks fit in a span. With approximately 1-in-3 odds, these 3 allocations fall into the same span and are therefore adjacent, which allows the buffer overflow.

There is one more hurdle before we can carry out the actual buffer overflow: WebSQL and font loading in Chrome are handled by different threads. In fact, Chrome has a pool of renderer worker threads named CompositorTileW, and any one of them can load our font when requested. As the thread cache isolates chunks between threads, the renderer worker threads cannot use chunks freed by the WebSQL thread, and vice versa.

// thread_cache.cc
void ThreadCache::ListTooLong(FreeList* list, uint32 cl) {
  size_ += list->object_size();
  // Omitted...
  if (PREDICT_FALSE(size_ > max_size_)) {
    Scavenge();
  }
}

In order to pass a free chunk from the WebSQL thread cache to other threads, we will trigger two calls to the function ThreadCache::Scavenge.

// thread_cache.cc
void ThreadCache::Scavenge() { // shortened for easier reading
  for (int cl = 0; cl < Static::num_size_classes(); cl++) {
    FreeList* list = &list_[cl];
    const int lowmark = list->lowwatermark();
    if (lowmark > 0) { // (2)
      const int drop = (lowmark > 1) ? lowmark/2 : 1;
      ReleaseToCentralCache(list, cl, drop);
    }
    list->clear_lowwatermark(); // (1) void clear_lowwatermark() { lowater_ = length_; }
  }
  IncreaseCacheLimit();
}

In the first call, ThreadCache::Scavenge sets the lowater_ member of each free-list in the thread cache to the number of chunks currently on that free-list (1). In the following call, it releases half of the lowater_ chunks (or a single chunk when lowater_ is 1) from each free-list to the central free-list (2), where chunks are available to all threads. If our target chunk is in the WebSQL thread's cache, triggering the two calls to Scavenge therefore makes the chunk available to the font rendering threads.

Note also that throughout this exploit, each time we want to carry out a buffer overflow, we repeat the previous steps to put a fresh chunk on the central free-list instead of reusing the chunk we already overflowed from. When we load a font, one of the renderer threads in the pool retrieves the chunk from the central free-list as a bitmap buffer. The chunk then ends up in that specific renderer thread's cache, isolated from the other renderer threads. We have not found a way to guarantee that the same renderer thread loads the next font file, so reusing the chunk would introduce unnecessary unpredictability into our exploit.

Figure 2. 1st column’s chunk released to central free-list

The easiest way to trigger Scavenge is to allocate a large amount of memory via column names and then drop the tables to free it. This quickly pushes the thread cache past the size limit that triggers Scavenge.

Figure 3. Overflow from 1st column’s chunk into column 2 and 3

Using the technique described above, we release the chunk labeled Column 1 to the central free-list and then load our first crafted font. This font contains an embedded PNG with dimensions 0x10007 * 0xb6, but only holds image data for 15 * 0xb6 pixels. Despite the discrepancy, libpng still loads the available data into memory. Because the width is truncated to 7, Freetype allocates a bitmap buffer of only 7 * 0xb6 * 4 = 0x13e8 bytes, which is served from the chunk previously occupied by column 1. The renderer, however, writes 4 * 15 * 0xb6 = 0x2aa8 bytes into it. The overflow therefore covers all of column 2's chunk and reaches 0x2a8 bytes into column 3's chunk. Column names in SQLite are null-terminated strings, and the overflow overwrites column 2's null terminator. As a result, when SQLite reads column 2's name, it continues into column 3's contents until it reaches the next null byte.

Figure 4. Overflow now contains two masked pointers

Next, we drop the table holding column 3 to free its chunk. In doing so, the first 16 bytes of column 3 are replaced with masked forward and back pointers respectively, linking it into the thread cache free-list. However, if column 3 were the only chunk on the free-list, both pointers would be null (XOR'd with the mask value) and would yield no useful information. To remedy this, we allocate and free column 4 before freeing column 3, so that column 3's forward pointer points to a meaningful memory location.

Leaking column 2's name presents yet another challenge. There appeared to be no trivial way to display a column name from memory using the available SQL statements. The solution turned out to be extremely simple: running 'SELECT * FROM table' returns the full column name in the resulting error message.

// Comment in sqlite3.c
/* For every "*" that occurs in the column list, insert the names of
** all columns in all tables.  And for every TABLE.* insert the names
** of all columns in TABLE. */

The reason is that SQLite expands the asterisk in the statement to the full set of column names in the table and then looks up each of those columns in the table again.

// sqlite3.c
hCol = sqlite3StrIHash(zCol); // hCol is now the hash of column 2 + overflowed data read
for(j=0, pCol=pTab->aCol; j<pTab->nCol; j++, pCol++){
  if( pCol->hName==hCol && sqlite3StrICmp(pCol->zName, zCol)==0 ){
    // Column name match success
    // From here we continue with the found column
    // Omitted...
  }
}

Figure 5. Raw error message from WebSQL (Leaked data underlined)

During the lookup, SQLite hashes the column name from the statement and compares it to the hash stored in memory. Our buffer overflow only changes the contents of column 2's name, not its hash, which is stored elsewhere in memory. The lookup therefore fails and raises an error message containing the expanded SQL statement. With the contents leaked, we XOR the first two 8-byte words of column 3 to obtain a pointer to column 4. Because the back pointer is a masked null, XORing it with the masked forward pointer cancels the mask.

Arbitrary Read

With the memory location of column 4 leaked, it is time to get an arbitrary read primitive. The ideas behind obtaining arbitrary read, and later execution flow control, are essentially the same. Since we know where column 4 is in memory, we can craft a fake object in column 4 and then overwrite another pointer to point to that fake object, influencing program behaviour.

Figure 6. Aligning a buffer before the list of column objects

For arbitrary read, I chose to fake an Expr object for the pDflt field of a column. The pDflt field points to an Expr object that contains a pointer to a string representing the column's default value. We make a fake Expr object whose string pointer points to an address of our choice. We can then read the data at that location with an INSERT … DEFAULT VALUES statement.

Overwriting the pDflt field is no easy task: column objects are small and fixed in size, which makes them prone to heap noise. To work around this, we rely on the interesting fact that a table's columns are stored as an array of objects rather than an array of pointers to objects. The whole array is therefore a single allocation whose size grows with the number of columns.

Figure 7. Overflowing from the buffer and pointing pDflt to our faked object

By aligning our buffer right before an array of column objects, we can overflow into the first fields of the first column object. This lets us overwrite the lower 32 bits of the pDflt pointer so that it points to column 4. However, as the buffer overflow is linear, we also completely overwrite the zName field that precedes it. The premultiplied alpha preprocessing prevents us from writing an arbitrary heap pointer there. I therefore chose to overwrite it with the fixed address of the vsyscall page (0xffffffffff600000), whose bytes survive the preprocessing.

Leaking Chrome Base

Even with arbitrary read, what we can do is still limited, as we lack useful memory locations to read from. Leaking the location of column 4 only reveals the location of the span in which column 4 resides. We cannot simply read from locations adjacent to the span, because heap randomness means their contents are not fixed. We could leak more memory locations by reading the forward and back pointers of column 4 and traversing the free-list. Even then, we could only leak the locations of chunks in the 0x1400 size class. To resolve this, we needed chunks that contain pointers to chunks of other size classes. One way of achieving this is to create an array of columns whose names are strings of arbitrary size.

Figure 8. We follow the chain from left to right with our arbitrary read (yes, it is complicated)

By following the zName fields of the columns in the chunk with our arbitrary read primitive, we can now leak the location of chunks of any size class. By dereferencing the db->aVTrans array (discussed later) on the heap, we obtain the location of the function table of a SQLite virtual table module (fts3Module). This function table resides in the mapped image of the Chrome binary. We can therefore obtain Chrome's base address simply by subtracting a fixed offset from it, paving the way for the next part of our exploit.

Execution Flow Control

// sqlite3.c
VTable **aVTrans;         /* Virtual tables with open transactions */
// Omitted...
static void callFinaliser(sqlite3 *db, int offset){
  int i;
  if( db->aVTrans ){
    VTable **aVTrans = db->aVTrans;
    db->aVTrans = 0;
    for(i=0; i<db->nVTrans; i++){
      VTable *pVTab = aVTrans[i];
      sqlite3_vtab *p = pVTab->pVtab;
      if( p ){
        int (*x)(sqlite3_vtab *);
        x = *(int (**)(sqlite3_vtab *))((char *)p->pModule + offset);
        if( x ) x(p); // <- gives both called location and 1st arg
      }
      pVTab->iSavepoint = 0;
      sqlite3VtabUnlock(pVTab);
    }
    sqlite3DbFree(db, aVTrans);
    db->nVTrans = 0;
  }
}

The approach for obtaining execution flow control is largely similar to obtaining arbitrary read. This time, however, we overflow into db->aVTrans instead of an array of column objects. db->aVTrans is the array of pointers to virtual tables with open transactions. Whenever an END TRANSACTION statement completes, the sqlite3 function callFinaliser walks through each virtual table in aVTrans and calls a method in its function table to commit the data.

Figure 9. Overflow from buffer partially overwrites pointer to point to column 4

By pointing a virtual table entry at column 4, which holds yet another fake object, we control both the address of the commit method that is called and its first argument. For the final exploit, I chose to call setcontext+offset, which loads almost all registers from the memory pointed to by the first argument. I used it to call mprotect to make the heap RWX, and then used a jmp rsp gadget to jump to shellcode on the heap.

Challenges

In this section, we will be addressing some of the issues of the detailed exploitation method and some potential improvements that can be made.

Heap layout reliability

One of the more obvious challenges of this approach is reliably achieving the heap layout we want. For example, in the first part of the exploit, we mentioned that there is only an approximately 1/3 chance that the 3 chunks fall into a single span. In other cases, the chunks may fall into different spans. Our 0x2aa8-byte write will then often corrupt some other (random) component of Chrome and crash it. This apparent heap randomness is not intentional on TCMalloc's part (unlike the Windows heap). It is most likely a result of Chrome's non-deterministic behaviour as a multi-threaded program. We could increase reliability by spraying the heap. However, we believe a better solution is to silently retry the exploit in the background until it succeeds, partly because other reliability issues cannot be easily resolved, as we will see later.

Overflow content limitations

A more severe issue is the constraint that the premultiplied alpha preprocessing imposes on the overflow data. The data in the bitmap buffer is converted to the premultiplied alpha format, so we cannot perform partial overwrites toward some addresses, namely those whose bytes violate the format. In these cases, we have no choice but to abandon the exploitation attempt.

Possible solutions

To resolve the previous two issues, we believe the best method is still the site isolation technique documented in the article by Ki Chan Ahn, for which we are thankful. Separate websites are rendered in separate Chrome processes. We could therefore repeatedly spawn iframes pointing to unique websites hosting our exploit until the exploit succeeds. Due to time constraints, the author has not tested this approach, and it is left as an interesting exercise for the reader.

Conclusion

A PoC that achieves renderer code execution and pops a calculator can be found here. Note that it does not contain a sandbox escape, so the sandbox has to be disabled with the command-line flag “--no-sandbox”. A video of the PoC in action can also be found here.

The PoC contains three parts: the initial static font file, the JavaScript file that sets up the heap, and a Python HTTP server that generates custom font files on the fly for the subsequent exploitation stages. This vulnerability is somewhat harder to exploit than usual, and the steps outlined here are by no means the best way to do it. Nonetheless, it was an interesting issue to look into.

We are thankful to Ki Chan Ahn for the inspiration to use WebSQL in this exploit as well as the wonderful reference on WebSQL internals that prevented many potential pitfalls for us.

We are also thankful to Gengming Liu for his blog post that helped us understand how to pass chunks across the TCMalloc thread caches.

Last but not least, the author would like to thank his friends and teammates for their proofreading and for their help while he was writing the exploit and this blog post.