Educated Guesswork

Understanding Memory Management, Part 1: C

Cover image

I've been writing a lot of Rust recently, and as anyone who has learned Rust can tell you, a huge part of the process of learning Rust is learning to work within its restrictive memory model, which forbids many operations that would be perfectly legal in either a systems programming language like C/C++ or a more dynamic language like Python or JavaScript. That got me thinking about what was really happening and what invariants Rust was trying to enforce.

In this series, I'll be walking through the logic of memory management in software systems, starting with the simple memory management in C and then working up to more complicated systems. This series isn't intended to be a tutorial on how to write C, Rust, or any other language; rather the idea is to look at how things actually work under the hood at a level that we usually ignore when all we are doing is trying to write code.

How Programs Use Memory #

Consider the following program, written in Python

all_lines = []

f = open("input.txt")
for l in f:
all_lines.append(l.strip())

all_lines.sort()
for l in all_lines:
print(l)

This program does something very simple, namely, it reads input from the file one line at the time, then sorts the lines, and prints out the lines in sorted order. So if input.txt has the following contents:

jim
bob
deb
carol

The output will be:

bob
carol
deb
jim

However, there is something complicated hiding under the hood: because we don't know the sort order of the lines in advance, we have to store all of the lines we've read until we know that we've seen all of them, and only then can we write them in sorted order.[1] If we're going to store all the lines, they have to go somewhere, which is in the computer's memory.

Storing a List #

Conceptually, a computer's memory is just a giant table of values, with each value having an address. For convenience, let's assume that entries are numbered from 0 and each entry can hold a single character. Thus, if we want to store the string "computation", we end up with something like:

Memory Address Value
0 c
1 o
2 m
3 p
4 u
5 t
6 a
7 t
8 i
9 o
10 n

Or, in a more compact notation:

Starting Address
0           | c | o | m | p | u | t | a | t | i | o |
10          | n |

The way to read this is that each cell is a memory location and on the left we have the starting address for each row, so the p is at address 3 (the fourth column in the first row).

With this in mind, how do we store the data from this file into memory. The obvious way is something like this, which immediately reveals that we have a problem:

0           | j | i | m | b | o | b | d | e | b | c |
10          | a | r | o | l |

If we just concatenate the values in memory, how do we know where one line ends and the next begins? For instance, maybe the first two names are "jim" and "bob" or maybe it's one person named "jimbob", or even two people named "jimbo" and "b". Obviously, we need some way to keep track of the memory regions associated with individual values.

There are a number of alternatives here, but let's just do something obvious, which is to prefix every value with its length, like so:

0           | 3 | j | i | m | 3 | b | o | b | 3 | d |
10          | e | b | 5 | c | a | r | o | l |

If you know that an entry starts at address X, then you can print out that entry in the obvious way:

address = X
length = *X
address = address + 1
while length > 0 {
    address = address + 1
    print *address
}

For non-C programmers, the notation *X means "take the value at memory address X" (technical term: dereferencing X) so the second line is setting length to be whatever is in X and the 5th line is printing whatever is currently stored at address. So, what happens here is that we first read the length of the current line out of address, then count down one character at a time.

So far so good, but what if we want to print out the whole list? The obvious thing to do is just to repeat the process above, but now we have a new problem, which is knowing when to stop. Remember that the memory is a giant table and we're just showing the relevant portion of it. In reality, we have:

0           | 3 | j | i | m | 3 | b | o | b | 3 | d |
10          | e | b | 5 | c | a | r | o | l | O | T |
20          | H | E | R |   |   | S | T | U | F | F |
30          | . | . | . |

This is just a new version of the same problem, which is we don't want want to read off the end of the list. This requires knowing where does our list end and the other stuff in memory begins. One obvious thing to do is to prefix the list with the amount of memory that it consumes, like so:

0           | 19| 3 | j | i | m | 3 | b | o | b | 3 |
10          | d | e | b | 5 | c | a | r | o | l |

Now we can write a program to go over the whole list, like so:

total_length = *X
address = address + 1
while total_length > 0 {
length = *X
address = address + 1
total_length = total_length - (length + 1)
while length > 0 {
address = address + 1
print *address
}
}

Note that this is a self-contained object. As long as you know where it starts and what type it is (i.e., a list of strings), then you can read it out knowing just the starting address X. I.e., we can have a subroutine/function, like so:

function print_list(X) {
total_length = *X
address = address + 1
while total_length > 0 {
length = *X
address = address + 1
total_length = total_length - (length + 1)
while length > 0 {
address = address + 1
print *address
}
}
}

However, we do have to remember where the object starts, as it's not going to always start at 0 (what if we have two lists, or a list and something else?). So how do we do that?

The Stack #

Up till now I've been acting like memory is just an undifferentiated table, but the reality is much more complicated. Although from a hardware perspective the memory is largely undifferentiated there is a conventional way to lay things out, as shown in this diagram I borrowed from Geeksforgeeks:

C memory architecture

To orient yourself, address zero is at the bottom of the diagram and higher addresses are at the top. The program is actually split up into two pieces: the program itself ("the text segment") There are also two different parts of memory where the program's data is stored call the "stack" and the "heap". Very roughly speaking, they are used like this:

  • The stack is used to store fixed-size data that is part of the local context of the function.

  • The heap is used to store arbitrary-sized data or data that survives past the lifetime of a function.

For instance, in our print_list() function above, total_length, address, and length are fixed size values (effectively integers big enough to hold a memory address), so they can be allocated on the stack. By contrast, the list of strings is arbitrary sized and in fact of a size that's dependent on the file we are reading in, and so is allocated on the heap.

When we call a function in a compiled language like C (or Rust), the compiler makes sure you have enough space on the stack to store all the variables it needs and makes room for it in memory. This is called a "stack frame". So, print_list() would have a stack frame big enough to store all three of these values just laid out end to end, like so:

+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| total_length  |   address     |   length      |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Importantly, the layout here is fixed (and dictated by the compiler) and so we don't need to have any metadata telling us how long things are; the compiler just knows.

The stack is laid out contiguously in memory, with each function call extending the stack by enough room for the stack frame associated with that function, which depends on which function it is. For technical reasons, the stack grows "downward" in memory towards smaller addresses, so the callee has a lower address than the caller. The following figure shows a simple example.

The stack for a simple function call

The stack before, during, and after simple function call

At the left of the figure, we see the situation where we are in the function f(). The stack just consists of the stack from for f(). If f() calls g() then we add a new stack frame for g() (technical term: pushing onto the stack), shown in the middle of the figure. Then when g() returns, the stack shrinks (technical term: popping the stack), leaving us back where we were before. Note that if f() called a different function h(), it would end up where the g() frame was before, but might be of different size, depending on how many local variables it had.

You should now be able to see why the stack isn't suitable for variable-sized objects: we need to allocate the stack frame when the function is called, and we can't do that if we don't know how big the variables in the stack will be. It's possible to grow stack frames but not convenient, so instead, we need to allocate them somewhere else, which is what the heap is used for.

The Heap #

Conceptually the heap is just a big pile of memory that we can allocate space out of. In many languages (e.g., Python or JavaScript) this is done automatically when you make an object, but in C, you have to do memory management by hand. This is done with the malloc() API, which is used like this:

space = malloc(100);

This works exactly like you would expect, namely reserving a block of 100 bytes on the heap that the caller can then use however they want. The return value from malloc() is the memory address of the allocated region. Internally, of course, malloc() has to do some bookkeeping to know which memory is in use and which is not. There are a large number of different data structures that can be used here, but essentially any technique will involve using some of the heap for that bookkeeping, leaving the rest available for allocation.

Memory Management in C #

With that background, let's try rewriting our program in C, where we have to do the memory management by hand. This gets a lot more complicated, so let's take it in pieces.

Read in the file. #

First, we have to read in the file.

  FILE *fp = fopen("input.txt", "r");
char line[1024];
char **lines = NULL;
size_t num_lines = 0;

if (!fp) {
abort();
}

// 1. Read in the file.
while (1) {
char *l = fgets(line, sizeof(line), fp);
if (!l) {
break; // End of file (hopefully).
}

// Make room in the list of lines.
if (!num_lines) {
lines = malloc(sizeof(char *));
} else {
lines = realloc(lines, (num_lines + 1) * sizeof(char *));
}
if (!lines) {
abort(); // We are out of memory so panic.
}

char *copy = strdup(l);
if (!copy) {
abort(); // We are out of memory so panic.
}
lines[num_lines] = copy;
num_lines++;
}

We start by opening the input file, in this case input.txt. Then, as before, we're going to iterate over the lines in the file and add them to our list of stored lines. This is accomplished by our while loop.

We can read the line in using the fgets() function, which reads a line (defined by ending in a \n newline character) out of the file fp into the buffer (memory region) associated with line.

  char line[1024];

...

// 1. Read in the file.
while (1) {
char *l = fgets(line, sizeof(line), fp);
if (!l) {
break; // End of file (hopefully).
}

What's a buffer? #

For those of you who haven't heard the term before, a buffer is just programmer jargon for some piece of storage used to hold data temporarily, as in this case case where we're reading in a line of data and then quickly doing something with it. It's also the name for this doodad which is responsible for stopping trains which don't stop on their own at the end of the track.

A buffer

From Wikipedia by User:Orderinchaos - Own work, CC BY-SA 3.0, Link

There's already something sus here, though. Did you notice the line char line[1024]? This is C notation for "make a buffer called line which is long enough to hold 1024 characters". As noted before, line has to be fixed size and 1024 is just an arbitrary number that's hopefully large enough to hold any line in the file.[2] But what if one of the lines is longer? In that case, fgets() will break up the line into two pieces, causing the program to be incorrect. The right way to do this would actually be to keep reading until we had a full line, but this would make the program even more complicated, so we'll just live with the defect, seeing as it's an example program.

fgets() returns a pointer to the input buffer if successful and a zero value (NULL) at the end of the file (or any error, actually), so when we test for l, we are actually testing for the end of the file, at which point the loop exits.

At this point, we have the next line of the file in line, but we overwrite that buffer every time we read a new line from the file, so we need to store it somewhere. We use the lines variable for this, which is defined as:

  char **lines = NULL;
size_t num_lines = 0;

This notation can be a bit hard to read for non C programmers, but briefly * means that something is a pointer, which is to say that it's something that holds a memory address. ** means that it's a pointer to a pointer, which is to say that lines holds the address of a block of memory that is itself full of values that themselves are memory addresses, in this case the individual stored lines. num_lines stores the number of lines that we have in memory.

We can see this in action by looking at the next block of code:

    // Make room in the list of lines.
if (!num_lines) {
lines = malloc(sizeof(char *));
} else {
lines = realloc(lines, (num_lines + 1) * sizeof(char *));
}
if (!lines) {
abort(); // We are out of memory so panic.
}

char *copy = strdup(l);
if (!copy) {
abort(); // We are out of memory so panic.
}
lines[num_lines] = copy;
num_lines++;

Storing a copy of line is a two-part process:

  1. Make a copy of the line itself.
  2. Store the pointer to that line in lines.

However, in order to store that pointer, we first need to make room in lines, which means allocating some memory. This happens on the two lines at the start of this snippet. There are actually two cases here:

  1. Lines is empty (nothing is stored), which happens at the start.
  2. Lines is non-empty but doesn't have enough room.

We distinguish these by looking at num_lines which starts at 0. In the former case, we allocate enough memory for a single line, like so:

      lines = malloc(sizeof(char *));

This says "make enough room to hold the address of a single string", and is nothing we haven't seen before.

The latter case is more complicated, however, because we already have something in lines, it's just that there's not (necessarily) enough room in memory to add another value. This means we (may) need to

  1. Allocate enough memory to hold the new number of values.
  2. Copy over the current contents of lines into the new memory region.
  3. De-allocate the original memory.

What are all the parentheticals doing here? The answer is that the block of memory pointed to by lines may already be big enough. When you call malloc(size) the system guarantees that the returned pointer is at least big enough to hold an object of size size—assuming that the allocation succeeds—but it's allowed to be larger. This could happen for a number of reasons (see How Malloc Works below for some more background), but one of which is to facilitate exactly this case: if people want to resize an object frequently, as we are doing here, then it's not efficient to have to copy the contents of the object over and over again. Instead, you can allocate more space than the programmer asked for and then when they ask for more, just say "ok" without taking any other action.[3] All of this is handled automatically by the realloc() function call.[4] At the end of this process lines may or may not have the same value from what you passed into realloc. What matters, though, is that the memory that lines points to has the same contents as before, and that's what realloc() guarantees.

It's also possible that the memory allocation will fail, for instance if the computer is out of memory. In that case, lines will be set to NULL and we need to abort:

    if (!lines) {
      abort(); // We are out of memory so panic.
    }

Now that we have room in lines we can store a copy of the actual line we've read in, but first we have to make a new buffer to store it in (remember that line will be overwritten). We can do that with the strdup() function call, which makes a copy of a string, allocating new memory as needed:

    char *copy = strdup(l);
if (!copy) {
abort(); // We are out of memory so panic.
}

Here too, we can run out of memory, so we need to check for copy being NULL.

Finally, we can append the copied line to the end of lines and increment the number of stored lines:

    char *copy = strdup(l);
if (!copy) {
abort(); // We are out of memory so panic.
}

The diagram below should help provide an understanding of the data structures here:

Stored lines in C

Data structure for stored lines.

On the left we have the lines variable itself, which is stored somewhere on the stack. It contains the address of the memory we have allocated to store the list of lines, namely address 1024. That memory region is shown in the middle of the diagram. Finally, on the right we have the individual regions for each stored line. The memory region for lines stores their addresses, each laid out one after the other. Note that there's no variable on the stack which points to these regions, they're just pointed to by the addresses stored in the region pointed to by lines.

Sorting the Lines #

The next thing we do is to sort the lines. This is done by the qsort() library function.

  // 2. Sort the lines.
qsort(lines, num_lines, sizeof(char *), compare_string);

qsort() is kind of hard to use because it's designed to sort a list of any kind of object of any size. This means that you have to pass:

  • The pointer (address) to the first object in the list (in this case lines)
  • The number of objects in the list (num_lines).
  • The size of the objects sizeof(char *). In this case, that's the size of a pointer to a string.[5]
  • A comparison function which tells qsort() the sort order for two objects. I'm going to skip over the details here.

qsort() sorts its arguments in place, so this means that after it's done, lines is sorted, like so:

Stored lines in C (sorted)

Data structure for sorted stored lines.

Importantly, the only thing that's changed here is the values in the middle memory region, referenced by lines. The actual strings are unchanged and are in the same memory regions; we've just rearranged the pointers in lines to point to the strings in the right order. Note that that order is not given by the numeric order but rather by the lexical order of the strings. That's convenient, but actually required, because qsort() doesn't know anything about the objects it's sorting; it just knows how to pass them to the comparison function, so all it can do is manipulate its own data as if the objects were numbers, whatever their actual semantics.

Printing the Results #

After all this, we're ready to print the results. This is comparatively simple, just iterating over the entries in lines and printing the corresponding strings:

  // 3. Print the lines.
for (size_t i=0; i<num_lines; i++) {
fputs(lines[i], stdout);
}

One thing you may be wondering about here is how we know how long each line is, as we haven't stored any length information. The convention in C is actually to have each string end with a byte with the value of 0, usually written as \0. This is pretty universally agreed to have been a bad idea, but we're taking advantage of it here because it means that the strings are self-contained.

Cleaning Up #

Finally, we want to clean up. In this simple program, that's not really required because when a program terminates the operating system automatically reclaims its resources, including memory, but this function might be used by some bigger program, in which case we'd want to reclaim the memory used by the list of strings, as well as close the open file (remember input.txt?).

If this function were to return without cleaning up, it would create what's called a "memory leak". Remember that the only variable in our program that knows about any of this memory is lines, which points to the list of pointers for the individual stored lines. lines is on the stack and will be lost when the function returns, so if the function returns without cleaning up, then there is no program variable pointing to any of this memory and it's just lost.[6] The result of a memory leak is that the leaked memory isn't available for new allocations but also can't be used because there's nothing pointing to it. If the program runs long enough and has a big enough leak, you can eventually accumulate enough leaked memory to affect the program function or even cause it to run out of memory, so you want to clean up. This is one reason why it often works to restart a program that seems stalled.

In C, memory is freed using the free() function, which takes the pointer to be freed. Here's what the cleanup looks like:

  // Clean up.
fclose(fp);
for (size_t i=0; i<num_lines; i++) {
free(lines[i]);
}
free(lines);

Note that we need to free the stored lines before we free lines because once we've freed lines nothing points to the stored lines and so there's no way to free them (remember, we need their addresses). This means we need to iterate through lines freeing each individual allocation and then only when we're done freeing lines itself. Recall that all the local variables will just be deallocated when the function returns. However, this doesn't mean that the things they point to are deallocated, just that the storage used by the variable itself is reclaimed.

Error Handling #

Because this is demonstration code, I've chosen to ignore the case where a line is longer than 1024 characters, but what if we wanted to handle that instead? You can detect this case with fgets() by checking to see if you have a newline at the end of the buffer[7]

    char *l = fgets(line, sizeof(line), fp);

if (!l) {
break; // End of file (hopefully).
}
if (l[strlen(l)-1] != '\n') {
return BAD_LINE_ERROR;
}

The problem with this code is that it has a memory leak: if we have already read in some of the lines, then we'll leak lines and whatever lines we read in. In order to avoid the leak, we need to run our cleanup routines. Once common way to handle this is to have an error block that we execute. For instance:

  int status = OK;

...

char *l = fgets(line, sizeof(line), fp);

if (!l) {
break; // End of file (hopefully).
}
if (l[strlen(l)-1] != '\n') {
status = BAD_LINE_ERROR;
goto error;
}

...

error:
// Clean up.
fclose(fp);
for (size_t i=0; i<num_lines; i++) {
free(lines[i]);
}
free(lines);
return status;

The goto instruction here just says to go to the line labelled error.

This works but is error prone: you need to note every case where the function might return and jump to the right error block. Moreover, the error block needs to be able to clean up after any kind of error, so, for instance, it needs to be able to handle when lines = NULL (fortunately, free() handles this case automatically). Finally, if you forget to set the status value, then you are incorrectly returning an OK status even if there was an error.

Locally Scoped Allocations #

You might ask why you can't just free all the memory that was allocated by a function when the function returns rather than just the memory on the stack? Then we wouldn't have to do all of this stuff where we explicitly free everything.

There's an obvious answer to this question: some functions intentionally allocate memory and don't clean it up. An obvious example here is the strdup() function we used above. Internally, strdup does something like this:

char *strdup(const char *str) { 
size_t len = strlen(str);
char *retval = malloc(len);
if (!retval) {
return NULL;
}
strcpy(retval, str);
return retval;
}

If we automatically freed all memory that was allocated by the function, then we would free retval before returning, at which point the caller would be left with a pointer to memory that has been freed, which is clearly a problem. In fact, it's the source of a common security vulnerability called use after free. Clearly we would need something more sophisticated than just freeing everything that was allocated in the function when it exits.

What you actually want is the compiler to know when objects are intended to outlive the function and when they are not, but actually distinguishing these cases is very difficult in C, at least without help from the programmer, as we will see in the rest of the series. In fact, making this kind of analysis possible is one of the the main motivating design choices for much of the Rust memory model.

How malloc() works #

So far we've just been treating malloc() as a kind of black box, and that's generally fine for most programming tasks, but it's helpful to have some sense of what's going on internally. The first thing to realize is that malloc() isn't magic. In fact, you can write your own memory allocator in C (Firefox, for instance, uses a custom allocator).

At a very high level, you should think of malloc() as having access to one or more large contiguous blocks of memory, which it then dispenses on demand. On a very simple computer, malloc() would just have access to the entire memory of the machine, but on a modern multiprocess operating system, it gets chunks of memory from the operating system. For our purposes, let's easiest to think of it as having a big contiguous chunk of memory to work with. As I said, we usually wouldn't start at memory location 0, so we'll just assume the block starts at 1000.

The figure below shows the situation after a single allocation of size 200, with the allocation being red and the unallocated space being blue. What's happened here is just that malloc(200) just picked the first available memory region, which is at the start of the block because no memory has been allocated.

1: 1000-1199
Unallocated
Unallocated

The allocation starts at address 1000 and goes to address 1199, so malloc() just returns the address 1000, which points to the start of the allocated region.

The next figure shows the situation with two more allocations, one of size 400 and one of size 200. Again, this is what you'd expect: the allocator just picks the lowest available region. As noted above, a real allocator would probably leave some extra space to facilitate growing the allocation but we're trying to keep things simple for the purpose of examples. Designing fast memory allocators is a whole (complicated) topic all on its own.

1: 1000-1199
2: 1200-1599
3: 1600-1799
Unallocated
Unallocated

So far so good, but now what happens when we free allocation #2? The result is shown below.

1: 1000-1199
Unallocated
3: 1600-1799
Unallocated
Unallocated

We have a 400 byte sized hole of free memory. If we try to do another 200 byte allocation, it will work fine, like so:

1: 1000-1199
4: 1200-1399
Unallocated
3: 1600-1799
Unallocated
Unallocated

But if we now try to allocate another 400 bytes, it obviously won't fit, so we need to go into higher memory.

1: 1000-1199
4: 1200-1399
Unallocated
3: 1600-1799
Unallocated
4: 2000-2399
Unallocated

As the program runs longer and memory is allocated and freed you tend get lots of small holes that can't be filled with big allocations, and so you have to allocate higher and higher memory regions. This is called fragmentation. In the extreme, you can get to the point where you can't allocate new memory even though there's actually plenty of free space; it's just not in a convenient form. There are techniques for avoiding this kind of fragmentation as well as for allocating memory more efficiently, but they're too advanced to cover here.

How does malloc() get its memory? #

As I said above, malloc() gets chunks of memory from the operating system. Remember that your program has to share the computer, including its memory, with other programs and the operating system is responsible for arbitrating which program has which chunk of memory. Conceptually this is actually somewhat like malloc() except that malloc() calls some system API (for instance, mmap() to request memory from the operating system.

Note that modern systems all have virtual memory, systems in which the system automatically moves ("swaps") data in and out of the physical memory and onto disk so that programs can allocate more space than is in the hardware of the system. In order to this, the operating system may have to move stuff around in physical memory, so it maintains a mapping from the address the programs use to the actual physical location in memory. In a future post, I may cover virtual memory in more detail, but no promises.

A natural question to ask is why we can't just move things around to accommodate the holes? The reason is that the pointers stored in the program literally just point to the memory addresses where the allocations are stored. So if (for instance) we were to slide allocation #3 over to the right to make room for allocation #4, then whatever pointer was returned from the initial malloc() for #3 would now point somewhere in the middle of allocation #4, which is obviously a problem. The compiler doesn't keep track of the variables holding the pointers, so it has no way to go back and readjust them. The key thing to realize here is that all that malloc() and free() are doing is bookkeeping: the allocator remembers which memory is currently in use and then hands out pointers to regions that aren't currently in use as needed. This naturally raises the question of how the allocator does the bookkeeping. How does it remember which regions are in use and which aren't? The obvious answer is the right one: the allocator reserves some of the memory it has to work with for this kind of bookkeeping metadata, at minimum:

  • The size of each allocation (so it can be freed)
  • The regions that are currently free, for instance the top of the highest allocation and the addresses of the the freed holes.

When you call malloc() the allocator finds a suitable region and allocates it. When you call free() it adds it to the list of holes (or adjusts the highest allocation value if it's the highest allocation).

Interestingly, it's not always necessary to store a list of every chunk of allocation memory. You can do this, but that means you need some data structure that lets you look up the allocations from their addresses. A common thing people do instead is to store the per-allocation metadata as a header right before the allocated region. The header contains the size of the allocation and maybe some other stuff. For instance, the first allocation above might look like this:

Header
1: 1100-1299
Unallocated
Unallocated

Instead of returning 1000, in this case malloc() would return 1100. (I've drawn it as 100 to make the figure more readable, but hopefully you're not wasting 100 bytes of overhead on every allocation.) Then when you call free(1100) the allocator would subtract the size of the header and deallocate the whole region from 1000-1299. The reason this works is that free() requires knowing the memory address anyway, so there's no need to store it. If you call free() on some region of memory that wasn't returned from malloc() the results are likely to be disastrous, because free() has no way of knowing that this is a mistake and will just treat whatever is right before the pointer you passed in as the header. If that data is attacker controlled, it can easily lead to a vulnerability.

Multiple References and UAF #

Let's consider a slight modification of the function we've been looking at, in which along with printing out all the lines, we instead return the last line in sort order.[8] With a lot of trimming, the function might look like this:

char *find_smallest(const char *filename)
{
...

// 2. Sort the lines.
qsort(lines, num_lines, sizeof(char *), compare_string);
largest = lines[num_lines - 1];

...

// Clean up.
fclose(fp);
for (size_t i=0; i<num_lines; i++) {
free(lines[i]);
}
free(lines);

return return_value;
}

This function gets called like this:

char *largest = find_largest("input.txt");
printf("%s\n", largest);

The experienced C programmer will immediately note that this code has a serious bug, because we are trying to use the memory pointed to largest after we have free()d it. When the calling function tries to use largest, there are no guarantees at all about what will happen. This is called a use after free (UAF) bug. For example, the allocator might have reallocated the memory in response to some other call to malloc(), in which case it is now full of some other data. Of course, it's also quite likely that the region is still unused and has the same contents as before; it's just that the allocator added it to the list of holes. In this case, the program may work fine under test but then fail unpredictably later when some change to your code causes allocations to happen differently and suddenly largest points to some memory reason being used for something else.

The reason this is all possible is that in C pointers are just values that hold the memory address; effectively they're just numbers and they behave like numbers. So if you assign a pointer value to another variable, now you have two variables that point to the same thing (i.e., they have the same value). When we call free on the first copy of the variable, that doesn't have any effect at all on the other copy (or on the first one, for that matter). It just changes the state of the memory region addressed by the variable. Once you've called free(x) you're still left with whatever is in x, and nothing in C stops you from using it; it's just illegal to do so, and it's your job not to, or else.

Next up: C++ #

As you have probably gathered by now, managing memory yourself is a huge amount of work, which is one reason why C programs have so many memory issues. In the next post in this series, we'll be taking a look at C++, which has some features that make things a bit better, at least some of the time.


  1. This code actually stores the lines in unsorted order, then sorts them, and then finally writes the output, but you could also store them in a sorted data structure. Either way, you need to store all of the lines. ↩︎

  2. C99 supports a feature called "variable length arrays", but they don't automatically grow the way a Python array does, so that doesn't help us much. There seems to be a lot of sentiment that they are a misfeature. ↩︎

  3. If you know you're going to be doing a lot of reallocation like this, many people will themselves overallocate, for instance by doubling the size of the buffer every time they are asked for more space than is available, thus reducing the number of times they need to actually reallocate. I've avoided this kind of trickery to keep this example simple. ↩︎

  4. In reality, you can pass a NULL to realloc() for the existing memory and it will just allocate new memory, but I'm handling the cases separately for pedagogical reasons. ↩︎

  5. In C, pointers to different kinds of objects can be different sizes. ↩︎

  6. This isn't technically true because the memory allocator has kept track of what memory has been allocated, but the allocator doesn't know that we should have cleaned up (for instance, we might have stored the value in lines somewhere) and so it can't clean up for us. ↩︎

  7. For the nitpickers out there, this will also catch the case where the last line in the file doesn't end in a newline. ↩︎

  8. Obviously you don't need to sort the lines in order to do the largest one, but this is necessarily an artificial example. ↩︎