x_ prepended to invalid (edit: and valid) HTML class attribute values?

Nov 6, 2009 at 2:29 PM

Hi,

I am finding (correct me if i'm wrong) when html is sent through GetSafeHtmlFragment(), if the class attribute value does not have comma's around it, the value is prepended with x_

-thanks for any help you can give me

Alex.

For example:

 

<P class=pageText>asdasd</P>.  


Was sanatized into

<p class="x_pageText">asdasd</p> 



 

 

 

Jan 6, 2010 at 9:37 PM

Hey All

I'm having the same issue that alexkey identified.  What is the purpose of prepending the x_ to the class name?

I see where this is defined in the version 3.1 code (HtmlToHtmlConverter.cs on line 102):

private static readonly string NamePrefix = "x_";

and see the 4 places in that class it is used.

I pulled down a copy of the code and can remove it, but dont' want to do so if there is a good reason for sanitizing the HTML attribute?

Thanks

Jeff

 

 

Jan 19, 2010 at 12:17 PM

Hi Jeff,

I'm back on trying to figure this out, any luck?

Jan 19, 2010 at 1:45 PM
Edited Jan 19, 2010 at 4:56 PM

Hi,

I have updated our WYSIWIG to output better formed HTML, so we now have attribute names properly quoted e.g. <p class="pageText">asdasd</p>

However i'm still having issues....

From what I can tell, the reason x_ is prepended to an attribute is if the attribute filter rule (attrAction) is set to "PrefixNameList"

There is a table type structure called "HtmlFilterData.filterInstructions" which contains all the "rules" for filtering.

The action is set in the lines below, notably this.filterForFragment is set to true when using GetSafeHTMLFragment()

 

attrAction = this.filterForFragment ?
HtmlFilterData.filterInstructions[(int)attribute.NameIndex].attrFragmentAction :
HtmlFilterData.filterInstructions[(int)attribute.NameIndex].attrAction;

 

When passing in <p class="pageText">asdasd</p>, when using the debugger I can see "attribute.NameIndex = Class"

And the actions are as follows: 

attrAction = Keep
attrFragmentAction = PrefixNameList

So it looks as though the code is designed to prepend x_ for attribute "class"

Thougth i'm not sure if it is just for <p> tag - however quickly testing for <div> and <span> has a similar result.

Is this a misconfiguration of filterInstructions or is there a reason for this behaviour?

-thanks
Alex.

Jan 19, 2010 at 4:55 PM

I may have the wrong end of the stick - knowing me more than likely :)

But i've opened a bug report #11555

Jan 19, 2010 at 5:28 PM

I have just tried this in the Web Protection Library v1.0 CTP : Sanitizer.GetSafeHtmlFragment() with the same result.

Jan 19, 2010 at 6:23 PM

hey alexkey -- in response to your earlier post, I am currently working around it by making a code change to the HtmlToHtmlConverter class:

here's the code i changed -- line 100:

private static readonly string NamePrefix = "";//TODO removed until we determine why prepending this to attribute names is a good idea -- was: "x_";

I would be very interested as well to find out from the original authors the reason to prepend x_...

cheers

jk

Jan 20, 2010 at 9:57 AM
jk wrote:

hey alexkey -- in response to your earlier post, I am currently working around it by making a code change to the HtmlToHtmlConverter class:

here's the code i changed -- line 100:

private static readonly string NamePrefix = "";//TODO removed until we determine why prepending this to attribute names is a good idea -- was: "x_";

I would be very interested as well to find out from the original authors the reason to prepend x_...

cheers

jk

 

Thanks for letting me know JK.

The approach I am thinking of doing is on lines 125, 161 of HtmlFilterData.cs

new FilterActionEntry(FilterAction.DropKeepContent | FilterAction.IgnoreAttrCallbacks, FilterAction.DropKeepContent | FilterAction.IgnoreAttrCallbacks, FilterAction.Keep, FilterAction.Keep),    //HACK: Was previously "FilterAction.Keep, FilterAction.PrefixNameList" but it was appending x_ to HTML class attribute values, this is a temporary fix until I can learn more.


 new FilterActionEntry(FilterAction.DropKeepContent | FilterAction.IgnoreAttrCallbacks, FilterAction.DropKeepContent | FilterAction.IgnoreAttrCallbacks, FilterAction.Keep, FilterAction.Keep), //HACK: Was previously "FilterAction.Keep, FilterAction.PrefixName" but it was appending x_ to HTML ID attribute values, this is a temporary fix until I can learn more.

The only things I can think of, for the reasons to do preffixing are:

  1. Perhaps to stop malicous people conflicting with ID's of other HTML elements on a website - however i'm sure web broswers will handle conflicting ID's ok. And this wouldn't be a problem for class names.
  2. Use the prefix as an optimised (I understand the library has had loads of performance improvements) way of keeping track of attribute values it has already scanned.
Jan 20, 2010 at 10:09 AM

I just did some tests upon the theory of "Perhaps to stop malicous people conflicting with ID's of other HTML elements on a website etc." and uncovered this:

When passing in:

 

<style type="text/css">
.bob{color:red;}
</style>
<p class="bob">hello</p>

 

It will result in something like this:

 

<style type="text/css">
<!--
.x_bob
	{color:red}
-->
</style>
<div>
<p class="x_bob">hello</p>
</div>

 

This is a technique i've seen used by agency web designers if they don't want to conflict with styles already defined by a CMS.

So it looks as though GetSafeHTMLFragment might be expecting you to pass in the actuall CSS, if you define class names and ID's.

Which makes me turn to GetSafeHTML() (as it doesn't prepend x_ to class names id's etc), however when reading the help guide and this forum post

I get the impression it is meant to be used for entire HTML pages, not html fragments such as input from a WYSIWIG.

If perhaps GetSafeHTMLFragment() expects CSS to be passed in with it....

I can't speak for many WYSIWIG editors but the ones I have used have predefined CSS classes to choose from i.e. they are in a seperate stlye sheet and would not be passed to GetSafeHTMLFragment()

I'd be interested to see what people think of my theory?

Jan 25, 2010 at 11:06 AM

Perhaps the library should only prepend x_ if it detects CSS styles being defined.

And if the CSS isn't being passed in, do not prepend x_ , so not to break the use of existing CSS styles.