Selenium gotcha – selenium.GetHtmlSource() returns processed HTML

Whilst writing some Selenium-based acceptance tests today, I bumped into a hair-pulling gotcha. Hopefully this post will save you the same pain.

The test was to check whether the JavaScript for a tracking tag was being inserted into the page correctly.

I assumed that I could get the page source as it was delivered to the browser by calling selenium.GetHtmlSource(), and then check that for the JavaScript string I was expecting.

Unfortunately, GetHtmlSource is just a proxy for the browser's DOM innerHTML property, and that returns the HTML after it has been processed by the browser.

It turns out that this processing does a couple of funky things, including:

  • Changing line-endings (Firefox)
  • Changing capitalization (IE6)
  • Seemingly random removal / insertion of " & '  (IE6)

So, when I was expecting a string like this:


<!--
   var amPid = '206'';
   var amPPid = '4803';
   if (document.location.protocol=='https:')
...[snip]...

IE6 was presenting me with:


<!--
   var amPid = '206'';
   var amPPid = '4803';
   if (document.location.protocol=='https:')
...[snip]...

A possible solution is to ignore case, whitespace and quotes when doing the comparison, with a helper method like this:

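        /// <summary>
        /// Use this to compare strings to those returned from selenium.GetHtmlSource for an Internet Explorer instance
        /// (IE6 seems to change case and inclusion of quotes, especially for JavaScript).
        /// </summary>
        /// <param name="expected">The string expected to appear in the page source</param>
        /// <param name="actual">The HTML returned by selenium.GetHtmlSource()</param>
        private static void AssertStringContainsIgnoreCaseWhiteSpaceAndQuotes(string expected, string actual)
        {
            // Needs System.Text.RegularExpressions for Regex; StringAssert comes from the test framework.
            // Strip all whitespace (\s), lower-case, then drop double and single quotes.
            string expectedClean = Regex.Replace(expected, @"\s", "").ToLower().Replace("\"", "").Replace("'", "");
            string actualClean = Regex.Replace(actual, @"\s", "").ToLower().Replace("\"", "").Replace("'", "");
            StringAssert.Contains(expectedClean, actualClean,
                                  string.Format("Expected string \n\n{0} \n\nis not contained within \n\n{1}", expected, actual));
        }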
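
To see what the normalization does to each side of the comparison, here is a minimal standalone sketch that needs no test framework. The class name HtmlSourceNormalizer and the sample strings are made up for illustration; the "actual" string is only a guess at the kind of output IE6 might produce, not captured browser output.

```csharp
using System;
using System.Text.RegularExpressions;

public static class HtmlSourceNormalizer
{
    // Remove all whitespace, lower-case, then drop double and single quotes,
    // so that browser-specific re-serialization differences no longer matter.
    public static string Normalize(string html)
    {
        string noWhitespace = Regex.Replace(html, @"\s", "");
        return noWhitespace.ToLowerInvariant().Replace("\"", "").Replace("'", "");
    }

    // True if the normalized expected text occurs inside the normalized actual text.
    public static bool ContainsNormalized(string expected, string actual)
    {
        return Normalize(actual).Contains(Normalize(expected));
    }
}

public static class NormalizerDemo
{
    public static void Main()
    {
        string expected = "var amPPid = '4803';\r\n";
        // Hypothetical processed source: different tag case, quote style and line endings.
        string actual = "<SCRIPT type=text/javascript>var amPPid = \"4803\";\n</SCRIPT>";

        Console.WriteLine(HtmlSourceNormalizer.Normalize(expected));                  // varamppid=4803;
        Console.WriteLine(HtmlSourceNormalizer.ContainsNormalized(expected, actual)); // True
    }
}
```

The obvious trade-off is that the comparison becomes loose: two snippets that differ only in case, spacing or quoting will be treated as the same.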

It was the line endings that really floored me, because my test runner automatically normalized them when displaying the error. Aaargh!


7 Comments

  • navneet |

    Hi David,
    Thanks for this post.
    I have one issue with text which I am checking through selenium.GetHtmlSource();
    Application scenario:
    - We are doing automation testing for our sites.
    - After we submit a page, we check the new page for pixel entries
    like this ” or tags
    - so I want to detect them using selenium.GetHtmlSource();

    Firefox 3.5.7 and IE 7
    1) Using view source manually (right click –> view source)
    Original on Firefox :
    Original on IE :

    After selenium.GetHtmlSource();
    In Firefox :
    In IE :

    Upper/lower case and double quotes are fine to handle.
    But how to handle attribute sequence on .

    Thanks,
    Navneet

  • navneet |

    Firefox 3.5.7 and IE 7
    1) Using view source manually (right click –> view source)
    Original on Firefox : <surehits account="167644" sid="navneet_12_17_Fast" />
    Original on IE : <surehits account="167644" sid="navneet_12_17_Fast" />

    After selenium.GetHtmlSource();
    In Firefox : <surehits account="167644" sid="navneet_12_17_Fast" />
    In IE : <SUREHITS sid="navneet_12_17_Fast" account="167644" />

    • mrdavidlaing |

      @Navneet,

      The best I can suggest is to do multiple searches: once for the tag (SUREHITS), then again for the first attribute (sid="navneet…), and again for the next attribute, etc.
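
      A rough sketch of that multi-search idea in C#, in case it helps. The TagSearch class and its ContainsTag method are hypothetical names, not part of Selenium, and for simplicity the whole match ignores case (attribute values included), which may be looser than you want.

      ```csharp
      using System;
      using System.Collections.Generic;
      using System.Text.RegularExpressions;

      public static class TagSearch
      {
          // True if html contains the tag with every given attribute, in any order,
          // with or without quotes around the values. Case is ignored throughout.
          public static bool ContainsTag(string html, string tagName, IDictionary<string, string> attributes)
          {
              // Search 1: find each occurrence of the tag itself.
              string tagPattern = "<" + Regex.Escape(tagName) + @"\b[^>]*>";
              foreach (Match tag in Regex.Matches(html, tagPattern, RegexOptions.IgnoreCase))
              {
                  bool allFound = true;
                  foreach (KeyValuePair<string, string> attribute in attributes)
                  {
                      // Search 2..n: look for name=value inside this tag only.
                      // The lookahead stops "167644" from matching "1676445".
                      string attributePattern = @"\b" + Regex.Escape(attribute.Key) + @"\s*=\s*[""']?"
                          + Regex.Escape(attribute.Value) + @"(?=[""'\s/>])";
                      if (!Regex.IsMatch(tag.Value, attributePattern, RegexOptions.IgnoreCase))
                      {
                          allFound = false;
                          break;
                      }
                  }
                  if (allFound) return true;
              }
              return false;
          }
      }

      public static class TagSearchDemo
      {
          public static void Main()
          {
              var expected = new Dictionary<string, string>
              {
                  { "account", "167644" },
                  { "sid", "navneet_12_17_Fast" }
              };
              // The IE output quoted in the comment above: upper-case tag, attributes reordered.
              string ieSource = "<SUREHITS sid=\"navneet_12_17_Fast\" account=\"167644\" />";
              Console.WriteLine(TagSearch.ContainsTag(ieSource, "surehits", expected)); // True
          }
      }
      ```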

  • sky |

    You might also notice that some browsers, FF in particular, strip comments from javascript.

    This behavior bit me in the rear a few years ago while writing some javascript code generation libraries that used comments as meta data.

  • Markus |

    Maybe it’s me (…and quickdiff.com), but both your expected and presented code snippets are equal ;)