
Closed

Carriage return encoded as numeric character reference

description

I have set the encoderType to "System.Web.Security.AntiXss.AntiXssEncoder,System.Web, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b03f5f7f11d50a3a" for my project, which is a simple ASP.NET 4.5 Web Forms project. When I include new lines in a text box, each new line renders as numeric character references for the carriage return and line feed, which is invalid HTML5 according to http://validator.w3.org/. See the attached file for the markup.


Closed Jun 3 at 1:45 AM by bdorrans
WebForms bug rather than AntiXSS

comments

sean986 wrote Dec 18, 2012 at 10:40 AM

Thank you for the swift reply. The page already had controlRenderingCompatibilityVersion set to 4.5. I tried reverting controlRenderingCompatibilityVersion to 4.0, and this added an extra line break to the markup and encoded it, which made the problem even worse (see attached file).

If the encoderType is not set to the AntiXssEncoder, the line breaks are not encoded, so the page renders as valid HTML5 (see attached file).

bdorrans wrote Dec 18, 2012 at 11:31 AM

This is, I'm afraid, a known webforms bug.

You can get the correct behavior by reverting how webforms renders. You do this by setting the following attribute on the pages element in web.config:

<pages controlRenderingCompatibilityVersion="4.5" />

You can read more about this setting at http://msdn.microsoft.com/en-us/library/system.web.configuration.pagessection.controlrenderingcompatibilityversion.aspx



** Closed by bdorrans 12/17/2012 9:07 AM

sean986 wrote Dec 18, 2012 at 11:31 AM

bdorrans wrote Dec 19, 2012 at 11:30 AM

It is still a web forms bug rather than an AntiXSS bug. AntiXSS doesn't know who is calling it. Its output is correct for HTML encoding of anything other than text box values, so any change would veer from the spec unless we added a separate HtmlForTextBoxEncode method (or a Boolean parameter) and then changed web forms to use it.

I've punted this over to the webforms folks, but I'm closing it as not a bug.



** Closed by bdorrans 18/12/2012 11:24

sean986 wrote Dec 19, 2012 at 11:30 AM

Sorry to re-open this again but I think I may be confusing you with the text box example. I am certainly not asking for a special case for textbox values.

The problem is that HtmlEncode encodes a new line as a numeric character reference for the carriage return followed by a numeric character reference for the line feed. The carriage return is one of the space characters, and HTML5 does not allow it to be represented by a character reference, according to http://www.w3.org/TR/html5/syntax.html#character-references . If HtmlEncode could be updated to never output a character reference for the carriage return, the HTML produced when the text in my example is encoded would be valid.
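
To make the rule concrete, here is a minimal Python sketch (an illustration only, not the W3C validator's logic) that scans encoded output for numeric character references resolving to U+000D, the code point the linked spec section forbids. The helper name find_cr_references is hypothetical.

```python
import re
from typing import List

# Matches decimal (&#13;) and hexadecimal (&#xD; / &#x0d;) numeric character references.
_NUMERIC_REF = re.compile(r"&#(?:[xX]([0-9A-Fa-f]+)|([0-9]+));")


def find_cr_references(markup: str) -> List[int]:
    """Return the offsets of numeric character references that resolve to U+000D (CR).

    Hypothetical helper for illustration; it only inspects numeric references.
    """
    offsets = []
    for match in _NUMERIC_REF.finditer(markup):
        hex_digits, dec_digits = match.groups()
        code_point = int(hex_digits, 16) if hex_digits else int(dec_digits)
        if code_point == 0x0D:
            offsets.append(match.start())
    return offsets


# A CRLF encoded as two references: the CR reference is flagged, the LF one (&#10;) is not.
print(find_cr_references("line one&#13;&#10;line two"))  # [8]
```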

sean986 wrote Dec 19, 2012 at 11:33 AM

In the attached HtmlEncodeWithoutTextbox.PNG there is an example which doesn't use a text box; hopefully that will help clarify.

bdorrans wrote Dec 19, 2012 at 1:35 PM

Ah got you. This is new for HTML5.

That complicates things: AntiXSS has no idea of HTML versions, so you'd have to pass the version in via a parameter, which means changing how it's called by ASP.NET et al.
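
A minimal Python sketch of the parameter approach, assuming a hypothetical html_encode(text, html5) signature and a toy safe list (ASCII letters, digits, space); neither is the real AntiXSS API. In HTML5 mode it normalizes line breaks the way the HTML5 parser preprocesses raw text (CRLF to LF, then any lone CR to LF) before encoding, so no reference to U+000D is ever emitted.

```python
def html_encode(text: str, html5: bool = False) -> str:
    """Hypothetical encoder with an HTML-version switch (not the AntiXSS API).

    Toy safe list: ASCII letters, digits and space pass through; every other
    character becomes a decimal numeric character reference.
    """
    if html5:
        # HTML5 forbids a numeric character reference to U+000D, so normalize
        # line breaks first: CRLF -> LF, then any remaining lone CR -> LF.
        text = text.replace("\r\n", "\n").replace("\r", "\n")
    out = []
    for ch in text:
        if ch.isascii() and (ch.isalnum() or ch == " "):
            out.append(ch)
        else:
            out.append(f"&#{ord(ch)};")
    return "".join(out)


print(html_encode("a\r\nb"))              # a&#13;&#10;b
print(html_encode("a\r\nb", html5=True))  # a&#10;b
```

The cost of this design is the one described above: every caller has to know and pass the document's HTML version.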

I'll go talk to the ASP.NET folks to see what ideas we can come up with.

sean986 wrote Dec 20, 2012 at 2:59 PM

Great, thanks for talking to the ASP.NET folks.

Although this is a standards problem in HTML5, I think it could be fixed universally without any adverse effects. This would avoid having to pass an HTML version through a parameter.

Whenever the encoder finds a carriage return followed by a line feed, could it remove the carriage return rather than encoding it? This would make the HTML5 output valid, but shouldn't change the meaning in other HTML versions.
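
A minimal Python sketch of this proposal, assuming a hypothetical html_encode_strip_crlf function and a toy safe list (ASCII letters, digits, space) rather than AntiXSS's real one: CR+LF pairs collapse to a single LF before encoding, so ordinary line breaks come out as &#10; with no reference to the carriage return.

```python
def html_encode_strip_crlf(text: str) -> str:
    """Hypothetical encoder illustrating the proposal (not the AntiXSS API).

    Drops a CR that immediately precedes an LF, then encodes as usual.
    Toy safe list: ASCII letters, digits and space pass through; every other
    character becomes a decimal numeric character reference.
    """
    # Only CR+LF pairs are affected; a lone CR is left in place and still encoded.
    text = text.replace("\r\n", "\n")
    return "".join(
        ch if ch.isascii() and (ch.isalnum() or ch == " ") else f"&#{ord(ch)};"
        for ch in text
    )


print(html_encode_strip_crlf("line one\r\nline two"))  # line one&#10;line two
```

As proposed, a lone carriage return that is not followed by a line feed would still be encoded as &#13;.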