Don’t Trust Your Users – Data Cleaning in Node.js

APIs are all the rage nowadays. They’re the go-to interface connecting the backend with the frontend. When data flows from the server to the client, the backend’s exposure is limited. Why? Because the client is on the receiving end and only needs to process what it’s given.

 

The floodgates open when the backend starts receiving data back. Accepting input exposes it to SQL injection, unrestricted uploads of dangerous file types, OS command injection, path traversal, cross-site scripting (XSS), and cross-site request forgery (CSRF).

 

While the frontend can do general checks, the request can be intercepted and altered, or crafted by hand, before it reaches the server. This is where data cleaning comes in.

 

Node.js is a JavaScript runtime that can be used to build backends. It’s quick to boot up, widely supported, and has a large and highly engaged developer community. Being JavaScript-based, it is also easy to pick up.

 

Here are some methods to help keep your data clean and ensure that nothing nasty slips through into your database, where malicious users could later call on it and wreak havoc on your applications.

 

What JSON Can and Cannot Do

On the surface, a JSON object looks like a normal JavaScript object. However, the scope of its functionality is different. A JavaScript object allows executable functions to be attached as properties. In contrast, JSON only allows a small set of value types: string, number, boolean, null, array, and object (which may in turn contain further JSON values).

 

Because JSON values are limited to the types listed above, the format itself is a first line of defense for developers. Malicious users cannot inject executable functions directly through the data passed to the server via a JSON-based API.
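A quick way to see this restriction in action is to round-trip an object through JSON.stringify() and JSON.parse(). The sketch below uses a made-up `profile` object for illustration; its function-valued property is dropped during serialization, so it never reaches the other side.

```javascript
// Hypothetical object, for illustration only.
const profile = {
  name: 'Ada',
  age: 36,
  tags: ['admin', 'editor'],
  active: true,
  nickname: null,
  greet() { return 'hello'; }, // functions are not valid JSON values
};

// JSON.stringify() silently omits function-valued properties.
const wire = JSON.stringify(profile);

// The receiving side only ever sees plain data.
const received = JSON.parse(wire);

console.log(wire);
console.log(typeof received.greet); // 'undefined': the function did not survive
```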

 

Unfortunately, JSON injections are still possible. The vulnerability lies in eval().

 

eval() has often been used to deserialize JSON into a JavaScript object, but it also turns strings into executable code. Deserialization is the process of translating data transferred over the network into a JavaScript object or an equivalent data type and structure. An innocent-looking JSON payload with a function inside it can open the floodgates to unwanted execution inside your Node.js application.

 

The following is an example of eval() executing a piece of string data.
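As a minimal sketch of the problem, assume a client sends a string that looks like an object but carries a function call. The payload and the `pwned` flag below are made up for illustration; the flag stands in for whatever side effect an attacker would actually want.

```javascript
// A flag we can inspect afterwards; stands in for any real side effect.
globalThis.pwned = false;

// Hypothetical payload from a malicious client. It resembles JSON, but the
// value of "name" is an immediately invoked function rather than a string.
const payload =
  '({ "name": (function () { globalThis.pwned = true; return "Ada"; })() })';

// eval() treats the string as source code, so the embedded function runs.
const evaluated = eval(payload);

console.log(evaluated.name);   // the "data" we asked for
console.log(globalThis.pwned); // true: attacker-supplied code was executed
```

The result looks like an ordinary object, which is what makes this dangerous: nothing in `evaluated` reveals that code ran while it was being built.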

 

 

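Running eval() on a string can be dangerous. So is there a safer option? For JSON, yes: JSON.parse() reads the text strictly as data and throws an error on anything that is not valid JSON, so nothing in the payload gets executed. For arbitrary JavaScript strings, there isn’t really a safe alternative to eval(). However, there are certain things you can do to reduce the risk of potential code injections.

The first is to not use eval() at all. The second is to apply regex.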
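For JSON input, one way to avoid eval() entirely is the built-in JSON.parse(), which parses text as data and throws a SyntaxError on anything else. The `safeParse` wrapper below is a hypothetical helper, not part of any library; it is a sketch of how the error could be turned into a result the caller can check.

```javascript
// Hypothetical helper: parse untrusted text as JSON without executing it.
// Returns { ok: true, value } on success and { ok: false, error } otherwise.
function safeParse(text) {
  try {
    return { ok: true, value: JSON.parse(text) };
  } catch (err) {
    return { ok: false, error: err.message };
  }
}

// Valid JSON is parsed into a plain object.
const good = safeParse('{"name": "Ada", "age": 36}');

// A payload carrying a function call is not valid JSON and is rejected.
const bad = safeParse('{"name": (function () { return "Ada"; })()}');

console.log(good.ok, good.value.name);
console.log(bad.ok);
```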

 

Data Cleaning JavaScript Objects

A regular expression (regex) is a pattern that matches sequences of characters in a string, letting you decide which characters to keep and which to remove. In our case, brackets, braces, and special symbols are often the culprits behind executable commands, especially when eval() is also involved in some capacity.

The following shows how to apply regex to your string and remove any unwanted characters from it.
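A minimal sketch of this approach is shown here. The character set and the `stripUnsafe` name are assumptions chosen for illustration; which characters count as unwanted depends on what the value is later used for.

```javascript
// Characters treated as risky in this sketch: {}, (), [], # and $.
// Assumption: everything else is allowed through unchanged.
const UNSAFE_CHARS = /[{}()\[\]#$]/g;

// Hypothetical helper: strip the risky characters from a value.
// Non-string input is converted to a string first.
function stripUnsafe(input) {
  return String(input).replace(UNSAFE_CHARS, '');
}

const dirty = '(function () { process.exit(1); })() #admin $where';
const clean = stripUnsafe(dirty);

// With the brackets and braces gone, the text is no longer a callable
// expression, only a harmless string.
console.log(clean);
```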

 

 

replace() is the JavaScript string method that takes a regular expression and substitutes whatever it matches, which lets you keep or strip characters as needed. Removing common problematic characters like {}, (), [], #, and $ is often enough to turn executable commands back into simple strings that have no impact on your application. It’s one of the easiest and fastest ways to sanitize your data before you start processing it.

 

Determining Data Validity

Data validity is determined by the constraints you put on it through checks before it gets saved into a database. A common data validation method is to set pattern checks against the value. This works because certain pieces of data, such as email addresses and phone numbers, follow specific formats.

 

However, some applications only do minimal type checking, which can let broken data through. Broken data can be a disaster further down the line if it is used to construct another piece of data: one bad value sets off a domino effect.

 

So how do you enforce regular pattern checks?

 

For emails, there are two parts: the address name followed by the domain name, separated by the @ symbol. Pattern checks can be done with regular expressions to ensure that the format is correct.

 

The address name usually contains a mix of letters (upper and lower case), digits, and special characters such as dots, hyphens, and underscores. The @ appears exactly once, followed by a domain name that ends in a top-level domain such as .com, .io, .org, or .net.

 

The following code shows an example of a pattern check that ensures the email is in a valid format.
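The pattern below is a deliberately simple sketch of the structure described above, not a complete validator for every address the email standards allow. The `EMAIL_PATTERN` and `isValidEmail` names are hypothetical.

```javascript
// Simplified pattern (an assumption of this sketch, not a full standard):
//   address name : letters, digits and . _ % + -
//   @            : exactly one
//   domain       : labels of letters, digits and hyphens, separated by dots
//   extension    : at least two letters (.com, .io, .org, .net, ...)
const EMAIL_PATTERN =
  /^[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(\.[A-Za-z0-9-]+)*\.[A-Za-z]{2,}$/;

// Hypothetical helper: true only for strings that match the pattern.
function isValidEmail(value) {
  return typeof value === 'string' && EMAIL_PATTERN.test(value.trim());
}

console.log(isValidEmail('jane.doe@example.com')); // true
console.log(isValidEmail('jane@@example.com'));    // false: two @ symbols
console.log(isValidEmail('jane@example'));         // false: no extension
```

A pattern check only confirms the format. Whether the mailbox actually exists can only be established by sending a message to it.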

 

 

This is just one example. Another common place for pattern checks is phone numbers. The general practice is to strip all the added spaces and hyphens away to leave only the digits. The length gets checked, and the number is then either broken up into prefixes and suffixes to create a fine-grained view of the data or left as is. It is usually safer to keep the cleaned digits as a string rather than converting them to a number type, because a number silently drops leading zeros, which are significant in many phone formats.
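The stripping and length check can be sketched as follows. The `normalizePhone` name and the 7 to 15 digit range are assumptions for illustration (15 is the maximum length in the E.164 numbering plan), and this sketch keeps the result as a string so that leading zeros are preserved.

```javascript
// Hypothetical helper: normalise a phone number to digits only.
// Assumption: valid numbers have between 7 and 15 digits; adjust the
// bounds for the regions your application supports.
function normalizePhone(input) {
  if (typeof input !== 'string') return null;

  // Drop everything that is not a digit: spaces, hyphens, brackets, "+".
  const digits = input.replace(/\D/g, '');

  if (digits.length < 7 || digits.length > 15) return null;

  // Returned as a string so that leading zeros survive.
  return digits;
}

console.log(normalizePhone('021 555-0199'));      // '0215550199'
console.log(normalizePhone('+64 (21) 555 0199')); // '64215550199'
console.log(normalizePhone('12-34'));             // null: too short
```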

 

Pattern checking is also a way to enforce data consistency. When values are formatted and saved the same way every time, future processing of that data becomes easier to handle. It also reduces the number of potential bugs and edge cases, because every record shares the same format and differs only in its actual value.

 

Final Thoughts

At the end of the day, Node.js is JavaScript. Dealing with data is a good chunk of a developer’s job. Though it’s tempting to trust that the data you receive from the API or frontend is correct, you should never trust that data. There is always someone, somewhere, who will figure out a way to exploit your app through the data they send you.

 

This is why data cleaning is important. It is the process of stripping unnecessary characters from the data you receive and avoiding eval() as a method of deserialization. Enforcing pattern rules, regardless of where your data is coming from, also makes a good backstop against invalid or malicious data. It increases your data’s robustness and protects its quality from degrading over time.

 

As more data passes through the application, you need a high level of discipline in enforcing patterns and data rules at every step possible to ensure that your data is what you expect it to be. While you may know and trust the developers who work on the layers in between, interpersonal relationships don’t guarantee data safety.

 

To fix this, always treat incoming data as if it came from an untrusted source. This means don’t trust the data, its formatting, or special characters that may find their way into your application and create blips. It also means that you need to check that the data you’re sending out is correct. Check the shapes of your objects and the patterns of their values before you send them on their way through your connected interfaces.
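A shape check on outgoing (or incoming) objects can be sketched as a table of expected properties and the rule each value must satisfy. The `userSchema` fields and the `checkShape` helper below are hypothetical; a real application would define its own rules or use a validation library.

```javascript
// Hypothetical schema: property name -> predicate the value must satisfy.
const userSchema = {
  id: (v) => Number.isInteger(v) && v > 0,
  email: (v) => typeof v === 'string' && v.includes('@'),
  active: (v) => typeof v === 'boolean',
};

// Own-property check that ignores inherited keys such as toString.
const has = (obj, key) => Object.prototype.hasOwnProperty.call(obj, key);

// Returns a list of problems; an empty list means the shape is as expected.
function checkShape(obj, schema) {
  if (obj === null || typeof obj !== 'object') return ['not an object'];

  const problems = [];
  for (const key of Object.keys(schema)) {
    if (!has(obj, key)) problems.push(`missing: ${key}`);
    else if (!schema[key](obj[key])) problems.push(`invalid: ${key}`);
  }
  for (const key of Object.keys(obj)) {
    if (!has(schema, key)) problems.push(`unexpected: ${key}`);
  }
  return problems;
}

console.log(checkShape({ id: 1, email: 'a@b.co', active: true }, userSchema));
console.log(checkShape({ id: '1', email: 'a@b.co', extra: 1 }, userSchema));
```

Running the same check on data before it leaves the application as on data that arrives keeps both directions honest.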

Aphinya Dechalert

Aphinya is a skilled technical writer with field experience in software development, agile, and full-stack JavaScript with AWS and Google Cloud. She is a developer advocate and community builder, helping others navigate their journeys and careers as developers.