Sanitize Database Inputs

1) Function for stripping out malicious bits

<?php
function cleanInput($input) {
 
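  // Patterns run in order, so script, style and comment blocks are removed
  // (contents included) before the generic tag pattern strips what is left.
  $search = array(
    '@<script[^>]*?>.*?</script>@si',   // Strip out javascript
    '@<style[^>]*?>.*?</style>@si',     // Strip style tags and their contents
    '@<![\s\S]*?--[ \t\n\r]*>@',        // Strip multi-line comments
    '@<[\/\!]*?[^<>]*?>@si'             // Strip out remaining HTML tags
  );

  $output = preg_replace($search, '', $input);
  return $output;
}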
?>

2) Sanitization function

Uses the function above, then escapes the result so that quotes in the input cannot break the database query.

<?php
function sanitize($input) {
    if (is_array($input)) {
        $output = array();  // stays defined even when $input is an empty array
        foreach ($input as $var => $val) {
            $output[$var] = sanitize($val);
        }
    }
    else {
        if (get_magic_quotes_gpc()) {
            $input = stripslashes($input);
        }
        $input  = cleanInput($input);
        $output = mysql_real_escape_string($input);
    }
    return $output;
}
?>

Usage

<?php
  $bad_string = "Hi! <script src='http://www.evilsite.com/bad_script.js'></script> It's a good day!";
  $good_string = sanitize($bad_string);
  // $good_string is now "Hi!  It\'s a good day!" (the spaces around the removed tag remain)

  // Also use for getting POST/GET variables
  $_POST = sanitize($_POST);
  $_GET  = sanitize($_GET);
?>

Comments

iMaxEst

# March 3, 2010

I use my own functions, as follows:

function text_global($poster) {
  $poster = stripslashes($poster);
  // Drop newlines, normalise apostrophes to ’ and double quotes to &quot;
  $poster = str_replace(
    array("\n", "'", "‘", "’", "′", "“", "”", "„", "″", '"'),
    array("", "’", "’", "’", "’", "&quot;", "&quot;", "&quot;", "&quot;", "&quot;"),
    $poster
  );
  return $poster;
}

// foreach replaces each(), which newer PHP versions no longer provide
foreach ($_POST as $Key => $Val) {
 if (substr($Key, 0, 4) != "fsk_") {
  if (is_array($Val)) {
   foreach ($Val as $sKey => $sVal) {
    $Val[$sKey] = text_global($sVal);
   }
   $_POST[$Key] = $Val;
  } else {
   $_POST[$Key] = text_global($Val);
  }
 }
}

The “fsk_” prefix marks WYSIWYG editor variables, which are left untouched. Works perfectly.

OldGuy

# March 3, 2010

@iMaxEst: I think you may have missed the point here. Preparing data is just a side issue. Sanitizing data prevents code injection attacks.

stripslashes() != sanitize()

Laura

# March 4, 2010

Really nice functions, Chris! Neat use of regular expressions. These snippets will definitely find their way into my script library.

Thanks a bunch!

Henk

# March 5, 2010

Why clean the input of HTML/script tags?
You only have to worry about XSS when you prepare the output!

Protect your database with prepared statements, and htmlspecialchars() will take care of the output.
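
A minimal sketch of the output-side half of this approach, assuming the stored text is printed into an HTML page. The helper name e() is made up for the example; htmlspecialchars() is the built-in Henk mentions.

```php
<?php
// Hypothetical helper name; htmlspecialchars() itself is a PHP built-in.
function e(string $value): string
{
    // ENT_QUOTES escapes both single and double quotes, so the result is
    // safe in element content and inside quoted attribute values.
    return htmlspecialchars($value, ENT_QUOTES, 'UTF-8');
}

// The string is stored as submitted and escaped only when it is printed.
$comment = "Hi! <script src='http://www.evilsite.com/bad_script.js'></script> It's a good day!";
echo '<p>' . e($comment) . '</p>';
```

The browser then shows the tag as text instead of running it, and the original input is still intact in the database.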

Ivan

# November 1, 2012

The answer is: performance. Cleaning the input rather than the output means you run a potentially resource-consuming routine once, most likely in a segregated area of your application (e.g. a members area where traffic is much lower than in a public area). Cleaning the output means running the same potentially resource-consuming routine many, many times. As the site grows and traffic increases, CPU, disk space and memory become precious commodities that you might end up wasting by cleaning the output over and over again, instead of simply cleaning the root of the problem: the input. Also, as more memory, disk space and CPU are needed, your costs will increase, since more machines, and perhaps more powerful machines and disks, will be needed. Food for thought. In my case, I would rather run a scalable, sustainable and clean web application.

jeff

# March 8, 2010

It seems like a good idea to clean input. Why do I want to store potentially malignant code in my database?

Phil

# March 15, 2010

What about ASP? Anyone?

Daniel

# November 20, 2010

Phil, this can be used for ASP.NET:

AntiXSS protects against Cross Site Scripting and SQL Injection
http://wpl.codeplex.com/

Brian Lang

# March 19, 2010

These code snippets don’t come through very nicely via RSS. All the line breaks seem to disappear.

Atspulgs

# July 16, 2014

You can modify the sanitization process as you see fit. The more you allow people to do, the less secure your site will be, and vice versa. Find a balance that fits you. A low-traffic page won’t have to worry much about security, whereas a high-traffic page will have a harder time with it.

Personally, I would look at the things I want to allow people to do and allow only those, then forbid everything else. My regular expressions are generally written to list what is allowed, rather than trying to hunt down the things I don’t want people to store in my database.

I do have to say that this example is a nice one, and I’m thinking of experimenting with it a bit.
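
The allow-list idea described above might look roughly like this. The rule itself (letters, digits and underscores, 3 to 20 characters) and the function name are assumptions made for the example, not something from the original snippet.

```php
<?php
// Hypothetical rule: a username may contain only letters, digits and
// underscores, and must be 3 to 20 characters long.
function isValidUsername(string $value): bool
{
    // \A and \z anchor the whole string, so nothing outside the allowed
    // character set can appear anywhere in the value.
    return preg_match('/\A[A-Za-z0-9_]{3,20}\z/', $value) === 1;
}

var_dump(isValidUsername('old_guy42'));           // bool(true)
var_dump(isValidUsername("x'; DROP TABLE t;--")); // bool(false)
```

Anything that does not match is rejected outright, so there is no list of "bad" patterns to keep up to date.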

amm257

# March 31, 2010

These are nigh useless and overly complicated; e.g. the html one simply matches anything with “<”, so why not make that explicit? Currently that’s all that expression does, all this extra stuff merely serves to obfuscate the issue. E.g. the javascript one doesn’t work, all I have to do is add a space: “scripthere”, the browser will figure out what I meant, and the script will execute.

amm257

# March 31, 2010

I apologize: whoever wrote this filter did it both right and wrong (right because it catches the tag, wrong because it simply removes it instead of escaping it). I’ve cleaned up my comment with the characters escaped by hand, so this should work:

These are nigh useless and overly complicated; e.g. the html one simply matches anything with “<“, so why not make that explicit? Currently that’s all that expression does, all this extra stuff merely serves to obfuscate the issue. E.g. the javascript one doesn’t work, all I have to do is add a space: “< script>scripthere</script>”, the browser will figure out what I meant, and the script will execute.

gibigbig

# June 13, 2010

Mine is pretty small and handy for getting rid of nasty injection attempts:

function clean($text)
{
    $text = strip_tags($text);
    $text = htmlspecialchars($text, ENT_QUOTES);

    return $text; // output clean text
}

webass

# November 24, 2010

<SCRIPT SRC=http://hackers.com/xss.js></SCRIPT>

joel

# November 29, 2010

this is just fantastic! truly smashing. Thank you.

Dyllon

# October 12, 2010

Mine simply escapes the angle brackets.

function clean($code)
{
    $strip = array(
        '<' => '&lt;',
        '>' => '&gt;'
    );

    return strtr($code, $strip);
}

Mark

# November 19, 2010

Hi,

We are looking for a consultant who could evaluate our website and check how vulnerable we are to this kind of malicious script.

Any recommendation?

Thanks,

Mark

Bob

# January 14, 2011

I don’t know a consultant, but I am reading “Pro PHP Security: From Application Security Principles to the Implementation of XSS Defenses”, which explains this stuff quite well.

Michael Foss

# April 13, 2011

Mark, dunno if you’re still interested, but I’ve spent several years in this line of work. There are several other areas of attack that I can investigate for you as well. Simply contact me at http://www.matatechconsulting.com/contact/ for more details.

# November 19, 2010

Chris, why use your regexes instead of PHP’s strip_tags(), as suggested by gibigbig? I don’t understand what functionality is added by going that route.

Skye

# November 19, 2010

Glad I found this. I was just thinking about this yesterday and needed a better function.

Is the filter_var() function any good? E.g. filter_var($value, FILTER_SANITIZE_STRING)

Bob

# January 14, 2011

Thanks for the tutorial, this is very useful

fred

# August 1, 2011

The safest way is to parameterise inputs by using classes such as PDO since PHP is a loosely typed language.

Or simply cast the inputs into the type that you would expect, e.g. Expect an integer? Just put (int) before the input. Type casting is the fastest operation to sanitize numbers.
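
A rough sketch of both suggestions together. It assumes the pdo_sqlite extension is available and uses an in-memory SQLite database as a stand-in for the real server; the comments table and its columns are invented for the example.

```php
<?php
// Assumption: an in-memory SQLite database stands in for the real server,
// and the "comments" table is made up for this example.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE comments (id INTEGER PRIMARY KEY, post_id INTEGER, body TEXT)');

$postId = (int) '42abc';  // the cast keeps only the leading digits: 42
$body   = "It's a good day!'); DROP TABLE comments;--";

// Placeholders keep the values out of the SQL text entirely, so no manual
// escaping is needed.
$insert = $pdo->prepare('INSERT INTO comments (post_id, body) VALUES (:post_id, :body)');
$insert->execute([':post_id' => $postId, ':body' => $body]);

$select = $pdo->prepare('SELECT body FROM comments WHERE post_id = :post_id');
$select->execute([':post_id' => $postId]);
$stored = $select->fetchColumn();  // the body comes back exactly as submitted
```

With a MySQL server, only the connection string passed to the PDO constructor would change; the prepare() and execute() calls stay the same.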

Andriah

# August 16, 2011

Hi! It’s a good day!

I’m just testing how this works?

NeoArc

# November 10, 2011

I think the sanitize function needs a few more lines of code, since these two values won’t get filtered:

Thanks for listening

NeoArc

# November 10, 2011

I mean, the cleanInput function. Sorry

aparna

# February 4, 2013

Hi, can you please post a sample of the sanitized output?

Michael Calkins

# December 23, 2011

Lol “evilsite.com”

I really do like the efficiency of this function though.

Conrad

# December 26, 2011

Nice functions Chris!
But when I tried them I got this error.
Warning: mysql_real_escape_string() [function.mysql-real-escape-string]: Access denied for user ''@'localhost' (using password: NO)

Warning: mysql_real_escape_string() [function.mysql-real-escape-string]: A link to the server could not be established in
and this is the line of code that is causing the error
$output = mysql_real_escape_string($input);

from this Function for stripping out malicious bits

Matt Auckland

# February 9, 2012

You need to form a connection to the database before using mysql_real_escape_string(), otherwise it will error out.

I use this function for escaping data going into the database:

function escape($string = null)
{
    if (empty($string))
    {
        return FALSE;
    }

    if (function_exists('mysql_real_escape_string'))
    {
        return mysql_real_escape_string($string);
    }
    else
    {
        return str_replace("'", "\'", $string);
    }
}

As for tags and XSS, I use something a little more hardcore. By hardcore, I mean it basically strips out everything I don’t want, allowing only what is listed in an array I define in the script:

public function input($string = null)
{
    if (empty($string))
    {
        return '';
    }

    // Strip out all bbcode
    $string = preg_replace('/\[(.*?)\](.*?)\[\/?(.*?)\]/iu', '\\2', $string);

    // Convert the ok markup to bbcode
    $string = preg_replace(array_keys($this->markup), $this->markup, $string);

    // Strip out all html tags
    $string = preg_replace('/\<(.*?)\>(.*?)\<\/?(.*?)\>/iu', '\\2', $string);

    // Run strip tags to make sure we got everything
    $string = strip_tags($string);

    // Replace double quotes with single quotes
    $string = preg_replace('/(")+/u', "'", $string);

    // Match one or more spaces and replace them with a single space
    $string = preg_replace('/( )+/u', ' ', trim($string));

    return trim($string);
}

Hope these two functions help others.

Matt :)

Mark

# February 16, 2012

$_POST = sanitize($_POST); $_GET = sanitize($_GET);

That is the worst method of sanitizing that I have seen. You should NEVER store sanitized inputs back into the same data stream. The main reason is that the script can be executing when a second submit, following almost instantly, changes the values of $_POST to something other than what the inputs were sanitized for.

This means any subsequent use of $_POST after the double-post hack happens is tainted.

Storing the sanitized inputs in a safe variable is the best option.
Operating a white list is also a good idea: by only accepting inputs from specific POST fields, you also limit the ability of a POST hack through a variable that you may never be using.

Also, strip_tags() exists for the same reason as

$search = array(
  '@<script[^>]*?>.*?</script>@si',   // Strip out javascript
  '@<[\/\!]*?[^<>]*?>@si',            // Strip out HTML tags
  '@<style[^>]*?>.*?</style>@siU',    // Strip style tags properly
  '@<![\s\S]*?--[ \t\n\r]*>@'         // Strip multi-line comments
);

exists. The function removes the HTML elements from a string and covers any HTML that can be used to break a server or script.
My main point here is that even if you sanitize $_POST, the variable should not be trusted. Things can change; what existed nanoseconds ago may not be the case when your program comes to use the variables. You’re opening yourself up to trouble if you do not use a safe method of sanitizing and using data streams.
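
One way to keep sanitized values in a separate variable and accept only known fields, sketched under assumptions: the helper name pickFields() and the field names are hypothetical, and trimming stands in for whatever cleaning each field actually needs.

```php
<?php
// Hypothetical helper: copy only the expected fields into a new array and
// leave the original request data untouched.
function pickFields(array $source, array $allowed): array
{
    $clean = [];
    foreach ($allowed as $field) {
        // Skip missing fields and nested arrays; only plain strings pass.
        if (isset($source[$field]) && is_string($source[$field])) {
            $clean[$field] = trim($source[$field]);
        }
    }
    return $clean;
}

// Field names are assumptions for the example.
$input = pickFields($_POST, ['name', 'email', 'message']);
```

The rest of the script then reads from $input only, so a field that is not on the list never reaches it.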

Andy Walpole

# February 24, 2012

For the love of god and a safe web please don’t try to write your own sanitisation methods. Use an established, peer-reviewed library.

Read Pádraic Brady’s article “HTML Sanitisation: The Devil’s In The Details (And The Vulnerabilities)” for further details: http://blog.astrumfutura.com/2010/08/html-sanitisation-the-devils-in-the-details-and-the-vulnerabilities/

Matt Auckland

# February 24, 2012

After doing extensive testing of my own method (which I posted above) using web vulnerability scanners, and running the same tests on the standard php methods, i.e strip_tags and the like, I prefer to stick to my own method.

But putting all this to one side, if any developer, customer or hosting provider truly cares about the security of their online products and servers, they would have the sense to deploy a server-level unified security solution such as ASL (Atomic Secured Linux), to protect against not only web-level but also server-level attacks.

I do.

Εταιρειες security

# July 3, 2012

Very interesting topic! Security issues are always interesting and useful!

Joeri

# August 11, 2012

Does this prevent SQL injection by escaping apostrophes? Have I understood that correctly?

Matt Auckland

# August 16, 2012

If you use mysql_real_escape_string(), then yes, it would. BUT to use mysql_real_escape_string() you must establish a database connection first, which is why it is better to put any mysql_real_escape_string() sanitize function in your database class.

Joeri

# August 16, 2012

Ah yes, I always pass connection data to a mysql_* function where I can, and I have, indeed put the sanitize function inside my database class.

Thank you for the confirmation; now I can rest easy without worrying about little rascals putting a '); DROP TABLE valuabledata;-- in there.

CovertSystems

# August 16, 2012

Simple rules I use in a web form.

1. The form is salted with 3 things:
   a) A salt value: a hash of the requesting IP address combined with a server-side value, placed in a hidden <input>.
   b) A honeytrap <input> that is hidden, readonly and disabled, and always meant to be empty. Its name is not important; I often set it to the IP address of the requesting browser, but do not rely on it coming back unchanged.
   c) A second honeytrap <input> that is hidden, with a juicy name like “login”, and also readonly and disabled.

** I have a couple of further methods that I do use to add another onion layer to my checking and also have a form that produces a set of fields that are named randomly and can be verified also that they are present in the post form to further ensure that my server issued the form.

2. The posted form is checked so that a “Submit” element (the button itself forms part of the submit process) is present, if not, the form is rejected

3. The form hashing is then recalculated and matched and if it does not match it means that the posting IP address does not match, reject the form.

4. Check that the empty fields are empty, they are readonly and disabled but for a BOT posing as your form, it will not matter as it will stuff all input streams with data, if any data exists, reject the form.

5. If your big bad wolf appears to check out, your guard SHOULD NOT be let down… You should then move to a White List for your input streams YOU WILL ACCEPT as form inputs

6. Sanitize ONLY the WHITE LIST streams INTO an array that you then will use in your script. You should use the built in PHP functions to clean your strings as they are binary safe.

7. ONLY USE a Sanitized Array to store your DATA streams in to after they have been sanitized

8. Use the built in mysql_real_escape_string for posting into your database as this will ensure that the data posted will not break the server with a badly formatted string.

Finally….

Even if the form fails, I still issue a thank you notice for the submission and NEVER take the person back to the form page but to the website root.

Above all else, if it smells off, reject, reject, reject… Send the visitor to the website root and do as I do: keep a separate database of suspect IP addresses. If you’re getting flooded by a particular IP address, deal with it by dumping the form data completely and logging how many times it sends data. Once it gets above a certain threshold, you can have your form ignore that IP address by issuing a 404 error page.

I will say that plenty of die-hard programmers try to ridicule or punch holes in this with the age-old arguments that users can have more than one IP address and that hashes can be decoded. I would like to dispel a myth here: rainbow tables do work on raw hashed IP addresses, but they will not work if you add a server-side value to the salting process. How well that holds up depends on you, the programmer. Short, predictable salts produce hashes that can be decoded, so if you get hacked, it is down to your implementation and its predictable nature.

E.G.

$server_salt = "Some incredibly long string that should be random that never changes until you need it to in order to thwart a rainbow table attack on your IP address implementation is wise.";
$hash = md5($_SERVER['REMOTE_ADDR'] . $server_salt);

Other methods I have implemented include adding the hour the form was requested to the $hash that is written into the form. Checking it lets you reject forms that are too old, which means that if a bot does manage to mimic your site, you can at least limit it by time as well.

Then you have the option to use sessions to add a further layer of paranoia to the form.

I hope that this is of use to someone. Please give credit where it’s due.
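
The token, honeypot and time-limit checks described above could be sketched roughly as follows. Everything named here is an assumption: the secret, the function names, the honeypot field called website and the one-hour window. It also swaps md5($ip . $salt) for HMAC-SHA256 via the built-in hash_hmac(), and compares tokens with hash_equals().

```php
<?php
// Assumption: the secret lives in server-side configuration, not in code.
define('FORM_SECRET', 'replace-with-a-long-random-server-side-secret');

function formToken(string $ip, int $hour): string
{
    // Ties the token to the requesting IP address and to the hour of issue.
    return hash_hmac('sha256', $ip . '|' . $hour, FORM_SECRET);
}

function isFormValid(array $post, string $ip, int $now): bool
{
    $hour = intdiv($now, 3600);
    // The honeypot field (hypothetical name) must come back empty.
    if (($post['website'] ?? '') !== '') {
        return false;
    }
    $token = $post['token'] ?? '';
    if (!is_string($token)) {
        return false;
    }
    // Accept tokens issued in the current or the previous hour only.
    return hash_equals(formToken($ip, $hour), $token)
        || hash_equals(formToken($ip, $hour - 1), $token);
}
```

When rendering the form, formToken($_SERVER['REMOTE_ADDR'], intdiv(time(), 3600)) would go into the hidden token field; on submit, isFormValid($_POST, $_SERVER['REMOTE_ADDR'], time()) decides whether to continue.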

Matt Auckland

# August 16, 2012

I use something similar, but didn’t mention it as this thread was more about sanitize data input. But here is the method I tend to use.

On the form end I use a hash that combines a lengthy random salt with an MD5 hash of the current day, month and year, which means the final hash changes every day. The whole thing is then run through SHA1 to form the final hidden hash field.

As for actual data, I have a separate validation class to verify various field types, such as email validation, password validation and so on.

Then prior to the data going into the database, I use a wrapper function that strips out all html and bbcode mark-up that isn’t on a defined white list.

Then finally, in the actual query, I use a modified mysql real escape string as a final level of safety.

But as I mentioned before, I also run a server/kernel-level security suite anyway.

Matt Auckland

# August 16, 2012

Oh, and one other thing I add into the mix, though only on data that won’t need decrypting, such as passwords: I generate a random hash for each user, and that hash is used during password validation. Each hash is unique and completely random, so using a rainbow table would be no help.

And that brings me to one VERY important security point. Under NO circumstances keep user passwords in any file or database table in plain text format. It is a common mistake, and a big no no.
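
On PHP 5.5 and later, the built-in password_hash() and password_verify() functions cover both points. This is a sketch of that built-in approach rather than Matt’s own per-user hash scheme; the password string is just sample data.

```php
<?php
// password_hash() generates a random salt on every call and stores it
// inside the returned string, so no separate salt column is needed.
$hash = password_hash('correct horse battery staple', PASSWORD_DEFAULT);

// Only $hash goes into the database, never the plain-text password.
var_dump(password_verify('correct horse battery staple', $hash)); // bool(true)
var_dump(password_verify('wrong guess', $hash));                  // bool(false)
```

Because the salt differs per call, two users with the same password still end up with different stored hashes.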

TLoF

# September 18, 2012

This function has many problems. It does not filter onclick events, and it does not filter the simplest trick:

Use HTML Purifier. This library does what you want, and its authors have many years of experience in HTML cleanup. Just trust them.

A sample function showing how to use HTML Purifier:

function clean($dirty_html, $valid_html = 'b,i,u,strong,em,strike,ul,ol,li,a[href],br,p[style],div[style]')
{
    $config = HTMLPurifier_Config::createDefault();
    $config->set('HTML.Allowed', $valid_html);
    $config->set('HTML.TidyLevel', 'medium');
    $config->set('Cache.SerializerPath', $upload['uploadpath'] . '/htmlpurifier');
    $config->set('Core.Encoding', 'UTF-8');
    $config->set('HTML.Doctype', 'HTML 4.01 Strict');

    $purifier = new HTMLPurifier($config);
    $clean_html = $purifier->purify($dirty_html);
    return $clean_html;
}

MikeNGarrett

# March 24, 2013

mysql_real_escape_string is deprecated as of PHP 5.5 (see the documentation).

I’d highly recommend replacing all your mysql stuff with the PDO method, but if you need a quick conversion use mysqli_real_escape_string (notice the i?)

SakuraNoMae

# April 8, 2013

Hey,

This thread comes up quite high in the Google index when you are looking for PHP and sanitizing input.

It’s one thing to sanitize when you go to your DB; another check should come right at the moment you accept user input.
This way you can directly correct the data you work with, or ask your client to rectify it.

Therefore I’d like to contribute a sanitization method for PHP:

php.net documentation

if ($email = filter_input(INPUT_GET, 'email', FILTER_SANITIZE_EMAIL))
{
    // the GET variable email was set, and $email now contains its sanitized content
}
else
{
    // anything else you wish to do when the email should have been set
}

The filter_input and filter_var functions offer a whole set of solutions for this.
You can sanitize input (reshape it into acceptable input) via the sanitize filters.

You can validate input (check whether it fits the required input type) via the validate filters.

The latter also let you write your own regular-expression check if needed, and convert ‘HTML’ logic to ‘developer’ logic, such as the string ‘true’ to the boolean true.
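
A few of the validate filters in action, as a small sketch. The sample values, the minimum age of 18 and the five-digit pattern are made up for illustration; the filter constants are PHP built-ins.

```php
<?php
// Validate filters return the (converted) value on success and false on
// failure; FILTER_NULL_ON_FAILURE makes the boolean filter return null for
// input that is neither true-like nor false-like.
$email = filter_var('laura@example.com', FILTER_VALIDATE_EMAIL);
$bad   = filter_var('not an email', FILTER_VALIDATE_EMAIL);
$flag  = filter_var('true', FILTER_VALIDATE_BOOLEAN, FILTER_NULL_ON_FAILURE);

// Hypothetical rules: age must be at least 18, zip must be five digits.
$age = filter_var('17', FILTER_VALIDATE_INT, ['options' => ['min_range' => 18]]);
$zip = filter_var('90210', FILTER_VALIDATE_REGEXP, ['options' => ['regexp' => '/\A\d{5}\z/']]);
```

The same filter constants and options work with filter_input(), which reads straight from GET or POST instead of taking a variable.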

With kind regards,
SakuraNoMae

Ben

# June 13, 2013

An alternative and very simple solution is to use ‘HTML Encoding’ on the incoming text before entering DB or while rendering. No need to clean anything.

ali ogul

# February 24, 2014

great one thanks,

for those who put html in database sanitize(htmlspecialchars($input)) seems to be perfect.

and when you print you should use htmlspecialchars_decode($output);

Shubham Mathur

# September 22, 2014

I made my own function and use this:

    function sanitize($var){
        if(is_array($var)){
            return array_map('sanitize',$var);
        }
        else{
            if(get_magic_quotes_gpc()){
                $var = stripslashes($var);
            }
            $var = mysql_real_escape_string($var);
            return $var;
        }
    }

It’s similar to yours :D

Reborn

# April 22, 2018

Please, what is wrong with this code?

function sanitize($dirty) {
      return htmlentities($dirty,ENT_QUOTES,"UTF-8");
    }

This is the error I get when I click the button:

 Fatal error: Call to undefined function sanitize()

this is my code

$parent = sanitize($_POST['parent']);
   $category = sanitize($_POST['category']);

Rye Seronie

# August 29, 2018

mysql_real_escape_string has been deprecated
