Certain classes of vulnerability, such as buffer overflows, have always attracted and inspired security researchers because of their significant impact. In 2000, most FTP servers ran WU-FTPD, developed at Washington University in St. Louis. So, when a relatively new technique called the “format string exploit” was used to attack those servers, it shook the internet.
Most people confuse this enigmatic exploit technique with buffer overflow. At first glance, both approaches look similar as both of them involve attackers overwriting return addresses to make the program execute malicious payloads. However, they are, in fact, fundamentally different.
In a buffer overflow, the programmer fails to keep user input within bounds, and attackers exploit that by overflowing their input into adjacent memory locations. In a format string exploit, user-supplied input is used as the format string argument itself. Attackers use this to read from the stack and to control the location where they perform arbitrary writes.
While buffer overflow attacks exist because of missing or faulty bounds checks, format string attacks exist when a developer passes user input to a formatting function without reliable input validation.
What are Format Strings?
Format strings are one of the many things that make the C programming language feature-rich. They describe how output shown to the user should be formatted. Format specifiers are used with the various I/O functions of a program and tell those functions, at run time, what kind of data to expect.
The following is the most straightforward C program which makes use of format strings in both input and output:
After including the standard I/O header, we define in main a buffer called name that holds 99 characters. Then we use printf to ask the user to enter their name. To read the input from the user and store it in a variable, we use scanf.
Here we encounter one format specifier, %s. It tells scanf to expect string data and to store it at the address of the variable name. Later, we use the same %s specifier in printf to print the string stored in name.
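The program walked through above can be sketched as follows. This is a reconstruction, not the original listing: the function name greet, the prompt text, and the greeting are assumptions, and the streams are passed in as parameters (fscanf and fprintf instead of scanf and printf) so the logic can be exercised without a terminal. The %98s width is also an addition that keeps the input inside the 99-byte buffer.

```c
#include <stdio.h>

/* Hypothetical helper: prompts on `out`, reads one word from `in`,
 * and echoes it back. Returns 0 on success, -1 if nothing was read. */
int greet(FILE *in, FILE *out)
{
    char name[99];                      /* room for 98 characters plus '\0' */

    fprintf(out, "Enter your name: ");
    if (fscanf(in, "%98s", name) != 1)  /* %s reads one whitespace-delimited word */
        return -1;
    fprintf(out, "Hello, %s\n", name);  /* %s prints the stored string */
    return 0;
}
```

Calling greet(stdin, stdout) from main gives the interactive behavior described in the text.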
This was the most basic format string you will encounter. Now let’s delve deeper into the format strings.
There are also various specifiers for various data. Here are a few:
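As an illustration, some commonly used specifiers are %d (signed decimal integer), %u (unsigned decimal integer), %x (unsigned hexadecimal), %c (single character), %s (string), %f (floating point), %p (pointer), and %n (store the number of bytes written so far). The sketch below formats one value with several of them; the function name format_samples and the sample values are made up for the example.

```c
#include <stdio.h>

/* Hypothetical helper: writes one sample per specifier into `out`,
 * separated by '|', and returns the number of characters produced. */
int format_samples(char *out, size_t cap)
{
    return snprintf(out, cap, "%d|%u|%x|%c|%s|%.2f",
                    -42,      /* %d   signed decimal      */
                    42u,      /* %u   unsigned decimal    */
                    255u,     /* %x   hexadecimal         */
                    'A',      /* %c   single character    */
                    "hi",     /* %s   string              */
                    3.14159); /* %.2f two decimal places  */
}
```

With a sufficiently large buffer, the result is the text -42|42|ff|A|hi|3.14.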
Let us look into another program that can illustrate the use of format strings a little better.
Here, the first printf statement prints the values of the integers a and b in decimal with %d and prints the address of b in hex with %x. We also encounter a new format specifier, %n. %n prints nothing; instead, it stores a value through the pointer passed as its argument.
When printf encounters %n, it writes the number of bytes output so far to the corresponding variable. Attackers use this behavior to write to addresses they should not be able to reach and so exploit programs.
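The effect of %n can be seen in a small, benign sketch. It is not the original listing: the function name count_demo and the values 10 and 20 are assumptions, the address print is left out because its value varies between runs, and snprintf is used so that the result can be inspected.

```c
#include <stdio.h>

/* Hypothetical helper: formats two integers and uses %n to record how
 * many characters had been produced when %n was reached. */
int count_demo(char *out, size_t cap)
{
    int a = 10, b = 20;
    int count = 0;

    /* %n consumes the argument &count and stores the running count in it */
    snprintf(out, cap, "a=%d b=%d%n", a, b, &count);
    return count;
}
```

With a large enough buffer the text is a=10 b=20 and the function returns 9, the number of characters written before %n. Some C libraries restrict or disable %n, so this sketch assumes a libc such as glibc where it is available for literal format strings.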
To understand how those exploits work, we first need to understand the format string’s inner workings.
Consider the following diagram.
| Address of format string |
| Value of A |
| Value of B |
| Value of C |
| Value of D |
| Bottom of the stack |
Whenever a function is called, a stack frame is created and its arguments are pushed onto the stack; printf is no exception. Under the 32-bit x86 calling convention used here, arguments are pushed from right to left.
So d is pushed onto the stack first, followed by c, b, a, and finally the format string’s address.
When printf encounters a format specifier, it looks on the stack to find the corresponding argument.
The function iterates through the format string one character at a time.
If that character is not the beginning of a format specifier, such as %, that character is printed to the output. But if a format parameter is encountered, the action corresponding to the specifier is taken.
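The loop described above can be sketched as a toy formatter. This is not how any real C library implements printf; mini_format and append are hypothetical names, and only %d, %s, and %% are handled. The point to notice is that va_arg simply fetches the next argument each time a specifier is met, without any way of checking that the caller actually supplied one.

```c
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical helper: appends s to out (capacity cap), keeping it
 * NUL-terminated, and returns the new length. */
static size_t append(char *out, size_t cap, size_t len, const char *s)
{
    while (*s != '\0' && len + 1 < cap)
        out[len++] = *s++;
    out[len] = '\0';
    return len;
}

/* Toy formatter supporting only %d, %s and %%. Returns the output length. */
size_t mini_format(char *out, size_t cap, const char *fmt, ...)
{
    va_list ap;
    size_t len = 0;
    char tmp[32];

    if (cap == 0)
        return 0;
    out[0] = '\0';
    va_start(ap, fmt);
    for (const char *p = fmt; *p != '\0'; p++) {
        if (*p != '%') {                 /* ordinary character: copy it */
            tmp[0] = *p;
            tmp[1] = '\0';
            len = append(out, cap, len, tmp);
            continue;
        }
        p++;                             /* look at the conversion character */
        if (*p == 'd') {                 /* fetch the next argument as an int */
            snprintf(tmp, sizeof tmp, "%d", va_arg(ap, int));
            len = append(out, cap, len, tmp);
        } else if (*p == 's') {          /* fetch the next argument as a string */
            len = append(out, cap, len, va_arg(ap, const char *));
        } else if (*p == '%') {          /* "%%" prints a literal percent sign */
            len = append(out, cap, len, "%");
        } else if (*p == '\0') {
            break;                       /* lone '%' at the end of the format */
        }                                /* other conversions are ignored here */
    }
    va_end(ap);
    return len;
}
```

For example, mini_format(buf, sizeof buf, "%s is %d%%", "x", 7) produces the text x is 7%.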
But what if we pass one fewer argument than there are format specifiers?
printf has no way of knowing that an argument is missing, so it reads whatever value sits next on the stack and prints it.
This leaks sensitive information from the application to the attacker, which can have a considerable impact.
Now to see how we can exploit this to read sensitive variables, we use the following vulnerable program:
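The original listing is not reproduced here, but the vulnerable pattern itself is short. In the sketch below, show_unsafe is a hypothetical name and snprintf stands in for printf so the result can be inspected; in the setting described in this article the input would come from argv[1] and the call would be printf(argv[1]). Compilers typically flag this call with a format-security warning, which is exactly the point.

```c
#include <stdio.h>

/* VULNERABLE ON PURPOSE: the caller's text is used as the format string,
 * so any %x, %s or %n inside it is interpreted rather than printed.
 * Shown only to illustrate the bug; never write code like this. */
int show_unsafe(char *out, size_t cap, const char *user_input)
{
    return snprintf(out, cap, user_input);
}
```

With harmless input such as AAAA the function behaves as expected and the output is AAAA. The trouble starts only when the input contains format specifiers.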
We compile the program with
gcc fmt.c -o fmt -m32 and execute it with the argument AAAA.
Next, we pass format specifiers as the argument.
Each %x specifier prints the hex representation of a four-byte word from the stack. We can repeat the specifier to examine the stack word by word.
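When walking the stack this way, it helps to know what your own input looks like as a hex word, since that is how you recognize it in the dump. The following sketch shows how four bytes of input appear when printed with %x; the function name word_as_hex is hypothetical.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: interprets four bytes as one 32-bit word and
 * formats it the way %08x would show it in a stack dump. */
int word_as_hex(char *out, size_t cap, const char bytes[4])
{
    uint32_t word;

    memcpy(&word, bytes, sizeof word);  /* reinterpret the 4 bytes as one word */
    return snprintf(out, cap, "%08x", (unsigned int)word);
}
```

The input AAAA appears as 41414141, because the ASCII code of 'A' is 0x41. On a little-endian machine such as x86, a mixed input like ABCD appears with its bytes reversed, as 44434241.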
Now we try to take this to the next level by reading from the environment variables. For that, we first export a variable PATH=asd:
Now we use a C program getenvaddr.c which can give us the address of environment variables:
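The original getenvaddr.c listing is not reproduced here. A commonly circulated version of such a helper looks up the variable with getenv and then shifts the result by twice the difference in length between its own program name and the target's, since the program name is also stored near the environment on the stack. The sketch below follows that heuristic; the function name predict_env_addr is hypothetical, the factor of two is an empirical adjustment rather than a guarantee, and the prediction is only meaningful on a 32-bit Linux target with ASLR disabled.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: estimates where `var` will live in a process
 * started as `target_name`, based on where it lives in this process
 * (started as `self_name`). Returns 0 if the variable is not set. */
uintptr_t predict_env_addr(const char *var, const char *self_name,
                           const char *target_name)
{
    const char *value = getenv(var);
    long shift;

    if (value == NULL)
        return 0;
    /* heuristic: a longer program name moves the environment, so adjust
     * by twice the difference in name length */
    shift = ((long)strlen(self_name) - (long)strlen(target_name)) * 2;
    return (uintptr_t)value + (uintptr_t)shift;
}
```

In a command-line tool, var, self_name, and target_name would come from argv[1], argv[0], and argv[2] respectively, and the result would be printed with %p.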
Now, after compiling, we run it and provide PATH, the variable whose address we need, and ./a.out, the program which we are going to exploit.
First, we convert the address reported by the program to little-endian byte order.
So, it will be “\xc0\xdf\xff\xff”.
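The conversion can be sketched in a few lines: the least significant byte of the address is emitted first. The function name addr_to_le_escapes is hypothetical, and the address 0xffffdfc0 used in the note below is simply the one implied by the byte sequence in this article.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: writes a 32-bit address as four \xNN escapes in
 * little-endian order (least significant byte first). `out` needs room
 * for 17 bytes including the terminating '\0'. */
int addr_to_le_escapes(char *out, size_t cap, uint32_t addr)
{
    return snprintf(out, cap, "\\x%02x\\x%02x\\x%02x\\x%02x",
                    (unsigned int)(addr & 0xff),
                    (unsigned int)((addr >> 8) & 0xff),
                    (unsigned int)((addr >> 16) & 0xff),
                    (unsigned int)((addr >> 24) & 0xff));
}
```

For the address 0xffffdfc0 this yields the text \xc0\xdf\xff\xff.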
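Then, we supply the address via the shell's printf, so that the escape sequences reach the program as raw bytes, followed by a series of %x specifiers. Each %x consumes one word from the stack, advancing the position from which printf reads its next argument until that position reaches the address we placed at the start of the format string. A final %s then treats that address as a pointer and prints the string stored there, i.e., the variable’s value.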
In this way, we can exploit format strings to escape the bounds of the application and perform arbitrary reads. Just as we performed arbitrary reads, we can use the %n specifier to perform arbitrary writes to variables.
This could lead to overwriting sensitive variables of the program, such as passwords and usernames, and so to a complete compromise.
Attackers can also use this vulnerability to perform arbitrary writes to the program's .dtors section (its table of destructors) and make the program execute malicious code.
Preventing Format String Exploits
According to OWASP, format string exploits are possible mainly because programmers perform operations on user input without sanitizing it adequately. Developers should also follow a predefined secure coding standard.
If a developer performs input validation and puts in place various constraints (length of the user input, special characters, etc.), then they can protect against these attacks to some extent.
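A minimal sketch of such a check, assuming a policy that rejects overlong input and any input containing the % character. The function name is_acceptable_input and the policy itself are illustrative assumptions; as the text notes, validation of this kind only protects to some extent.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical validator: accepts input only if it is at most max_len
 * characters long and contains no '%' character. Returns 1 or 0. */
int is_acceptable_input(const char *input, size_t max_len)
{
    if (input == NULL)
        return 0;
    if (strlen(input) > max_len)        /* length constraint */
        return 0;
    if (strchr(input, '%') != NULL)     /* no format specifiers allowed */
        return 0;
    return 1;
}
```

A blanket ban on % is deliberately strict and would also reject legitimate text such as "50% off", which is one reason to prefer fixing the format string itself over filtering the input.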
Format strings should be a part of the program and not taken from user input. Always passing an explicit format specifier to functions like printf, rather than passing a variable as the format string, eliminates most of these vulnerabilities.
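In practice this means the format string is a literal and the user's text is passed only as data. The sketch below shows the pattern with snprintf so the result can be checked; render_message is a hypothetical name, and the same idea applies to printf("%s", input). GCC's -Wformat and -Wformat-security warnings can help find calls that do not follow it.

```c
#include <stdio.h>

/* Hypothetical helper: the format string is the literal "%s", so any
 * specifiers inside user_input are copied verbatim, never interpreted. */
int render_message(char *out, size_t cap, const char *user_input)
{
    return snprintf(out, cap, "%s", user_input);
}
```

Even hostile input such as %x %n comes out unchanged as the five characters %x %n.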
Solutions like FormatGuard can prevent format string attacks. FormatGuard is a patch to glibc, the GNU C library, that protects your code against format string vulnerabilities.
It disables %n by default, as %n can write to any location without any feedback. The write-anything-anywhere nature of %n is hazardous and can lead to remote command execution, i.e., complete compromise of the system running the program, so it is better to disable it than to leave such a large hole open.
Additionally, FormatGuard permits only static format strings, so attackers cannot exploit a program by modifying its format strings.
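To picture what rejecting %n might involve, here is a small scanner that reports whether a format string contains a %n conversion. It is an illustrative sketch only, not FormatGuard's actual code; the function name format_uses_n is hypothetical, and the set of flag and length characters it skips is a simplification of the full printf grammar.

```c
#include <string.h>

/* Hypothetical check: returns 1 if fmt contains a %n conversion
 * (including forms such as %hhn or %4$n), 0 otherwise. */
int format_uses_n(const char *fmt)
{
    for (const char *p = fmt; *p != '\0'; p++) {
        if (*p != '%')
            continue;
        p++;
        if (*p == '%')                  /* "%%" is a literal percent sign */
            continue;
        /* skip flags, width, precision, positional index, length modifiers */
        while (*p != '\0' && strchr("-+ #0123456789.*$hlLqjzt'", *p) != NULL)
            p++;
        if (*p == '\0')                 /* format ended inside a specifier */
            break;
        if (*p == 'n')
            return 1;
    }
    return 0;
}
```

A wrapper around printf could call such a check and refuse to proceed when it returns 1.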
There is also a binary rewriting solution called Kimchi, which protects against format string attacks at runtime. It rewrites the machine code that calls printf so that it calls a comparatively safer version named safe_printf instead. It also dynamically replaces calls of the form printf(buffer) with the safer printf("%s", buffer).
Address Space Layout Randomization (ASLR) randomizes the addresses of a program's stack, heap, libraries, and so on, so that attackers cannot rely on knowing them in advance.
One can enable ASLR on Linux systems by setting the kernel.randomize_va_space parameter to 2, for example by running sysctl -w kernel.randomize_va_space=2 as root.
As there are various techniques available to bypass ASLR, one should not entirely rely on ASLR.
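Before relying on ASLR at all, it is worth checking whether it is actually switched on. On Linux the current setting is exposed in /proc/sys/kernel/randomize_va_space, where 0 means disabled, 1 means partial randomization, and 2 means full randomization. The sketch below reads that file; the function names and the wording of the mode descriptions are assumptions made for the example.

```c
#include <stdio.h>

/* Hypothetical helper: returns the kernel's ASLR setting (0, 1 or 2),
 * or -1 if it cannot be read (for example on a non-Linux system). */
int read_aslr_setting(void)
{
    FILE *f = fopen("/proc/sys/kernel/randomize_va_space", "r");
    int value = -1;

    if (f == NULL)
        return -1;
    if (fscanf(f, "%d", &value) != 1)
        value = -1;
    fclose(f);
    return value;
}

/* Hypothetical helper: human-readable description of a setting. */
const char *aslr_mode_name(int value)
{
    switch (value) {
    case 0:  return "disabled";
    case 1:  return "partial (stack, mmap, vDSO)";
    case 2:  return "full (also heap)";
    default: return "unknown";
    }
}
```

Printing aslr_mode_name(read_aslr_setting()) from main gives a quick status line for the machine the program runs on.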
Though format string vulnerabilities are hard to exploit, a successful exploit can have a significant impact and result in the complete compromise of both the program and the system.
It is easier to prevent these attacks than to launch them, as long as you maintain high coding standards and use tools that help secure your programs.