APIs are all the rage nowadays. They are the go-to interface that connects the backend with the frontend. When data is passed to the client, the security risks are relatively contained. Why? Because the client is on the receiving end and only needs to process what it’s given.
The floodgates open when the backend starts receiving data back. Accepting input exposes the backend to SQL injection, unrestricted uploads of dangerous file types, OS command injection, path traversal, cross-site scripting, and request forgery.
While the frontend can do general checks, those checks can be bypassed: a request can be intercepted and altered, or crafted from scratch, before it reaches the server. This is where data cleaning comes in.
Here are some methods to help keep your data clean and ensure that nothing nasty slips into your database, where malicious users could later call on it and wreak havoc on your applications.
What JSON Can and Cannot Do
Because JSON values can only be strings, numbers, booleans, null, objects, or arrays, JSON itself is a first line of defense for developers. Malicious users cannot inject executable functions directly through data sent to the server via a JSON-based API.
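To see this restriction in practice, here is a minimal JavaScript sketch, assuming a Node.js or browser runtime. It shows that a function cannot survive a round trip through JSON. JSON.stringify() leaves out function-valued properties, and JSON.parse() throws a SyntaxError on a function literal. The payload object and its property names are made up for illustration.

```javascript
// Hypothetical payload: one data field and one function-valued field.
const payload = {
  name: "alice",
  greet: function () {
    return "hi";
  },
};

// Functions are not a JSON value type, so JSON.stringify() leaves them out.
const serialized = JSON.stringify(payload);
console.log(serialized); // {"name":"alice"}

// JSON.parse() rejects anything that is not strict JSON, such as a function literal.
let parseFailed = false;
try {
  JSON.parse('{"greet": function () { return "hi"; }}');
} catch (err) {
  parseFailed = err instanceof SyntaxError;
}
console.log(parseFailed); // true
```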
Unfortunately, JSON injection is still possible. The vulnerability lies in eval(): if the server parses incoming JSON by evaluating it as code, anything embedded in the string runs.
Following is an example of eval() being used to parse, and therefore execute, a piece of string data.
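The sketch below is a hypothetical reconstruction in JavaScript, not the original example. It mimics the legacy pattern of parsing JSON with eval("(" + text + ")"). The request body stands in for attacker-controlled input. The markCompromised() function is an invented stand-in for whatever code an attacker might call; here it only flips a flag.

```javascript
// Hypothetical stand-in for code an attacker wants to run; here it only sets a flag.
let sideEffectRan = false;
function markCompromised() {
  sideEffectRan = true;
  return "pwned";
}

// Looks like JSON, but the "role" value is a function call, not data.
const untrustedBody = '{"user": "alice", "role": markCompromised()}';

// DANGEROUS: eval() treats the string as JavaScript and runs the call.
// The surrounding parentheses make eval() read the braces as an object literal.
const data = eval("(" + untrustedBody + ")");

console.log(data.role); // pwned
console.log(sideEffectRan); // true
```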
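There are two ways to guard against this. The first is to not use eval() at all; JSON.parse() parses JSON without executing anything. The second is to sanitize the string with regex.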
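For the first option, JavaScript's built-in JSON.parse() accepts only strict JSON syntax and never executes code. The parseBody() helper below is a hypothetical wrapper. It is sketched under the assumption that malformed input should be rejected rather than repaired.

```javascript
// Hypothetical helper: parse a request body as strict JSON, or report why it failed.
function parseBody(text) {
  try {
    // JSON.parse() only understands JSON syntax; it never executes code.
    return { ok: true, value: JSON.parse(text) };
  } catch (err) {
    // Reject malformed input instead of trying to "fix" it.
    return { ok: false, error: err.message };
  }
}

const good = parseBody('{"user": "alice", "role": "editor"}');
const bad = parseBody('{"user": "alice", "role": markCompromised()}');

console.log(good.ok, good.value.role); // true editor
console.log(bad.ok); // false
```

The same payload that ran code under eval() is simply rejected here, because a function call is not valid JSON.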
Regex (regular expression) is a way to match specific characters or patterns in a string and decide what to do with them: either allow them or remove them. In our case, brackets, braces, and special symbols are often what makes a string executable, especially when eval() is involved in some capacity.
The following shows how to apply regex to your string and remove any unwanted characters from it.
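As a minimal sketch, the hypothetical stripUnwanted() function below takes an allow-list approach. It keeps letters, digits, spaces, and a handful of punctuation marks. It removes everything else, including brackets, braces, parentheses, quotes, and semicolons. The allowed character set is an assumption; each field will likely need its own.

```javascript
// Hypothetical allow-list: letters, digits, space, and . , @ _ -
// Every character outside the list (brackets, braces, parentheses,
// quotes, semicolons, colons, $, and so on) is removed.
const DISALLOWED = /[^A-Za-z0-9 .,@_-]/g;

function stripUnwanted(input) {
  // String() guards against non-string input such as numbers.
  return String(input).replace(DISALLOWED, "");
}

console.log(stripUnwanted("alice(); {drop: true}")); // alice drop true
console.log(stripUnwanted("jane.doe@example.com")); // jane.doe@example.com
```

Allow-listing tends to be safer than trying to list every dangerous character, since anything you forget to block slips through. Stripping characters complements avoiding eval(); it does not replace it.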
Determining Data Validity
Data validity is determined by the constraints you enforce through checks before the data is saved to a database. A common validation method is to check the value against a pattern, because certain pieces of data, such as email addresses and phone numbers, follow specific formats.
However, some applications do only minimal type checking, which can let broken data through. Broken data can become a disaster further down the line if it is used to construct other data: one bad value can set off a domino effect.
So how do you enforce regular pattern checks?
An email address has two parts: the address name (the local part), followed by the domain name, separated by the @ symbol. A regular expression can check that a value follows this format.
The address name usually contains a mix of upper- and lowercase letters, digits, and some special characters such as dots, hyphens, and underscores. The @ appears exactly once and is followed by a domain name that ends in an extension such as .com, .io, .org, or .net.
The following code shows an example of a pattern check that ensures the email is in a valid format.
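Here is one possible version in JavaScript. The pattern is a deliberately simple assumption, not a complete RFC 5322 validator:

- It allows letters, digits, and . _ % + - in the address name.
- It requires exactly one @.
- It requires the domain to end in an extension of at least two letters.

EMAIL_PATTERN and isValidEmail() are illustrative names.

```javascript
// Deliberately simple pattern (an assumption, not full RFC 5322):
//   address name: letters, digits, and . _ % + -
//   exactly one @
//   domain: labels separated by dots, ending in a 2+ letter extension
const EMAIL_PATTERN = /^[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(\.[A-Za-z0-9-]+)*\.[A-Za-z]{2,}$/;

function isValidEmail(value) {
  // Check the type first, then the format.
  return typeof value === "string" && EMAIL_PATTERN.test(value);
}

console.log(isValidEmail("jane.doe@example.com")); // true
console.log(isValidEmail("jane.doe@@example.com")); // false (two @ symbols)
console.log(isValidEmail("jane.doe@example")); // false (no extension)
console.log(isValidEmail("not-an-email")); // false
```

A bare typeof check would accept a string like "not-an-email"; the pattern is what catches it.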
This is just one example. Another common place for pattern checks is phone numbers. The general practice is to strip out spaces and hyphens so that only the digits remain. The length is then checked, and the digits are either broken up into parts (such as an area code and prefix) for a fine-grained view of the data, or left as is. Keep the digits as a string rather than converting them to a number type, though: numeric conversion drops leading zeros.
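A sketch of that process in JavaScript follows. It assumes a hypothetical rule of exactly 10 digits, split into a 3-digit area code, 3-digit prefix, and 4-digit line number (a North American-style layout). normalizePhone(), splitPhone(), and EXPECTED_LENGTH are illustrative names. Other numbering formats would need different rules.

```javascript
// Hypothetical rule for illustration: exactly 10 digits.
const EXPECTED_LENGTH = 10;

function normalizePhone(raw) {
  if (typeof raw !== "string") return null;
  // Strip whitespace, hyphens, dots, and parentheses.
  const digits = raw.replace(/[\s\-.()]/g, "");
  // Reject anything that still has non-digits or the wrong length.
  if (!/^\d+$/.test(digits) || digits.length !== EXPECTED_LENGTH) return null;
  // Keep it as a string so no leading zeros are lost.
  return digits;
}

// Break a normalized number into area code, prefix, and line number.
function splitPhone(digits) {
  return {
    area: digits.slice(0, 3),
    prefix: digits.slice(3, 6),
    line: digits.slice(6),
  };
}

console.log(normalizePhone("555-123-4567")); // 5551234567
console.log(normalizePhone("(555) 123 4567")); // 5551234567
console.log(normalizePhone("555-123-45")); // null
console.log(JSON.stringify(splitPhone("5551234567"))); // {"area":"555","prefix":"123","line":"4567"}
```

Differently formatted inputs collapse to the same stored value, which is what makes later processing predictable.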
Pattern checking is also a way to enforce data consistency. When values are formatted and saved the same way every time, later processing of that data becomes easier. It also reduces potential bugs and edge cases, because every record shares the same format and only the values themselves differ.
This is why data cleaning is important. It means getting rid of unnecessary characters from the data you receive and avoiding eval() as a method of deserialization. Enforcing pattern rules, regardless of where your data comes from, also provides a good backstop against invalid or malicious data. It makes your data more robust and keeps it from degrading over time.
As more data passes through the application, you need the discipline to enforce patterns and data rules at every possible step to ensure that your data is what you expect it to be. While you may know and trust the developers who work on the layers in between, interpersonal trust doesn’t guarantee data safety.
To address this, always treat incoming data as if it came from an untrusted, unprocessed source. This means not trusting the data, its formatting, or any special characters that may find their way into your application and cause problems. It also means checking that the data you send out is correct. Check the shapes of your objects and the patterns of their values before you send them on their way through your connected interfaces.
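A minimal sketch of such a shape check in JavaScript follows. The userSchema object and checkShape() function are hypothetical. The schema maps each expected field to a predicate its value must satisfy. The check reports missing fields, invalid values, and unexpected extra fields. The same function could run on incoming data and again just before data is sent out.

```javascript
// Hypothetical schema: each expected field maps to a predicate its value must pass.
const userSchema = {
  id: (v) => Number.isInteger(v) && v > 0,
  email: (v) => typeof v === "string" && /^[^\s@]+@[^\s@]+\.[^\s@]+$/.test(v),
  name: (v) => typeof v === "string" && v.trim().length > 0,
};

// Own-property check, so inherited keys like "toString" are not mistaken for fields.
const hasOwn = (obj, key) => Object.prototype.hasOwnProperty.call(obj, key);

// Returns a list of problems; an empty list means the object matches the schema.
function checkShape(obj, schema) {
  if (obj === null || typeof obj !== "object" || Array.isArray(obj)) {
    return ["expected an object"];
  }
  const problems = [];
  for (const [key, isValid] of Object.entries(schema)) {
    if (!hasOwn(obj, key)) problems.push(`missing field: ${key}`);
    else if (!isValid(obj[key])) problems.push(`invalid value for field: ${key}`);
  }
  // Flag extra fields too, so nothing unexpected rides along.
  for (const key of Object.keys(obj)) {
    if (!hasOwn(schema, key)) problems.push(`unexpected field: ${key}`);
  }
  return problems;
}

console.log(checkShape({ id: 1, email: "ann@example.com", name: "Ann" }, userSchema)); // []
console.log(checkShape({ id: 0, email: "ann", admin: true }, userSchema));
```

Returning a list of problems, rather than a single true or false, makes it easier to log or report exactly what was wrong with a rejected object.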