
Sanitizer.GetSafeHtmlFragment() doesn't work correctly with Unicode character 1084 ('м')

description

Sanitizer.GetSafeHtmlFragment("м") returns "&#1084;"

The problem seems to be in this class:

Microsoft.Exchange.Data.TextConverters.HtmlWriter

bool IFallback.IsUnsafeUnicode(char ch, bool isFirstChar)
{
    return this.filterHtml &&
        ((byte)(ch & 0xFF) == (byte)'<' ||          // low byte of ch equals '<' (0x3C)
         (byte)((ch >> 8) & 0xFF) == (byte)'<' ||   // high byte of ch equals '<' (0x3C)
         (!isFirstChar && ch == '\uFEFF') ||         // U+FEFF anywhere except the first character
         Char.GetUnicodeCategory(ch) == System.Globalization.UnicodeCategory.PrivateUse);
}

The check (byte)(ch & 0xFF) == (byte)'<' returns true for character code 1084 (U+043C).
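A worked check of the arithmetic behind that observation, assuming filterHtml is true for this call: the low byte of 1084 is 0x3C, the code of '<', so the first clause of IsUnsafeUnicode is true, while the high-byte clause is false.

```latex
\begin{aligned}
1084 &= 4 \cdot 256 + 60 = \mathtt{0x043C} \\
1084 \bmod 256 &= 60 = \mathtt{0x3C} = \text{code of } \mathtt{<} && \Rightarrow \text{low-byte clause is true} \\
\lfloor 1084 / 256 \rfloor \bmod 256 &= 4 = \mathtt{0x04} \neq \mathtt{0x3C} && \Rightarrow \text{high-byte clause is false}
\end{aligned}
```

By the same arithmetic, the two byte checks flag every character whose low byte is 0x3C (U+003C, U+013C, ..., U+FF3C) or whose high byte is 0x3C (U+3C00 to U+3CFF), not only '<' itself.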

comments

bdorrans wrote Jan 28, 2015 at 2:16 PM

What would you consider the correct response here? The unencoded character?

Regardless, the sanitizer is no longer supported, and if it's just that you'd rather the character were left unencoded, that's not going to be addressed, as the encoded value is still correct.