
Hopper testing - and its random nature

  • Wednesday, October 21, 2009 5:22 PM bbjbbj
     
    Hopper testing is all very well, but by definition it's testing random things.

    Whilst this is a valid style of test, we really need some control over the randomness, i.e. to make it reproducible so we can a) find bugs and b) check that what we think is a fix really is a fix, by re-running the same Hopper session.

    So
    i) Does anyone know how to re-run Hopper so that it generates reproducible testing, i.e. presumably how to seed its random number generator?
    ii) MS testing runs for 2 hrs. Given it's a fairly random test, does anyone have guidelines on how long we should run it so that a 2 hr session is likely to pass? We typically Hopper test for about 4 hrs, but one of our apps failed the MS testers' Hopper test.
    iii) Does anyone know how to stop the Hopper test app 'normally'? You seem to have to crash the emulator/device to stop it.
    iv) Hopper is also testing the rest of the system, even if you remove as much as possible (e.g. all Start items). What happens if there is a problem in any other part of the WM emulator/device and that dies? Does that count as a Hopper failure in MS testing?


Answers

  • Thursday, October 22, 2009 2:35 AM M Francis (MSFT)
     Answer

    Whilst this is a valid style of test, we really need some control over the randomness, i.e. to make it reproducible so we can a) find bugs and b) check that what we think is a fix really is a fix, by re-running the same Hopper session.
    So
    i) Does anyone know how to re-run Hopper so that it generates reproducible testing, i.e. presumably how to seed its random number generator?
     [Mike Francis] There is an excellent blog article on this topic here: http://blogs.msdn.com/hopperx/archive/2005/08/24/455572.aspx

    Here is an excerpt:
    Let me explain what *is* possible: Strictly speaking, you can reproduce any particular Hopper run by using its random seed as a parameter (-sxxx) when you invoke the test. Hopper derives its randomness by calling rand(), which is completely predictable if you know its seed. All Hopper actions are bound by this rand() functionality, so keystrokes (and screen taps) sent to the device from one seeded run will reproduce 100% on a similarly seeded device.

     

    You also have the ability to use the Verbose (-v) flag, which will print each key being sent to the device in your debug output. I have heard stories of saving this output to a file and then writing a simple input program that can ‘replay’ this file. But again, if you have the previous log, this isn’t really necessary since you can always replay the events with Hopper using the previous seed.

     

    Next, why it doesn’t work: In general replaying Hopper runs longer than a few minutes will NOT reproduce the same run and will give you different results. This is frustrating because we want Hopper to work this way and it certainly seems like it should, but it doesn’t.

    Hopper sends its keystrokes and screen taps directly to GWES as fast as the system will allow, and often the UI will be trying to catch up. GWES is busy trying to process the keys and the UI is busy trying to keep up. As windows are being created and destroyed, the timing of each input is critical and easily missed: a keystroke intended for one window is actually sent to another. Once this happens even once, your run has been altered and you are no longer on the same path.
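
    The two halves of the excerpt above can be illustrated with a small sketch. It is not Hopper's code: Python's random module stands in for the C rand() the blog mentions, and the input names, the window count, and the transition rule are all made up for illustration. It shows that a seed fixes the input sequence, yet one misrouted input is enough to put a replay on a different path.

```python
import random

# Hypothetical input alphabet; the thread does not list Hopper's real inputs.
ACTIONS = ["up", "down", "left", "right", "enter", "tap"]
WINDOW_COUNT = 5  # hypothetical number of windows in the toy UI


def generate_inputs(seed, count):
    """Return the input sequence produced by a seed (identical for equal seeds)."""
    rng = random.Random(seed)
    return [rng.choice(ACTIONS) for _ in range(count)]


def replay(inputs, misrouted_at=None):
    """Feed inputs to a toy UI and record which window has focus after each one.

    If misrouted_at is set, the input at that index has no effect on the toy UI,
    modelling a keystroke that lands on the wrong window because the UI had not
    caught up yet.
    """
    window = 0
    path = []
    for index, action in enumerate(inputs):
        if index != misrouted_at:
            # Made-up transition rule: next window depends on current window and input.
            window = (window * 3 + ACTIONS.index(action) + 1) % WINDOW_COUNT
        path.append(window)
    return path


# Same seed, same inputs: this part of a Hopper run is reproducible.
first = generate_inputs(1234, 200)
second = generate_inputs(1234, 200)
print("inputs identical:", first == second)

# Same inputs, but one of them is misrouted during the second run.
clean_path = replay(first)
disturbed_path = replay(second, misrouted_at=20)
print("paths identical:", clean_path == disturbed_path)
```

    In this toy model a replay stays on the original path only while every input has the effect it had the first time; the seed controls what is sent, not what the UI does with it.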

     

    ii) MS testing runs for 2 hrs. Given it's a fairly random test, does anyone have guidelines on how long we should run it so that a 2 hr session is likely to pass? We typically Hopper test for about 4 hrs, but one of our apps failed the MS testers' Hopper test.
    [Mike Francis] It’s hard to put a number on that, but in general the longer your application runs under Hopper, the more confidence you can have that it will pass the certification test.

    iii) Does anyone know how to stop the Hopper test app 'normally'? You seem to have to crash the emulator/device to stop it.
    [Mike Francis] Again the HopperRx blog to the rescue. See here: http://blogs.msdn.com/hopperx/archive/2006/11/30/oh-please-make-it-stop.aspx

    iv) Hopper is also testing the rest of the system, even if you remove as much as possible (e.g. all Start items). What happens if there is a problem in any other part of the WM emulator/device and that dies? Does that count as a Hopper failure in MS testing?
    [Mike Francis] This is a possibility, which is why FocusApp is important: it constantly brings your application to the foreground. Typically you can tell whether the error is caused by your application or by the OS.

    Thanks,
    Mike

All Replies

  • Wednesday, October 21, 2009 8:47 PM jaybo_nomad
     
    iii)  Here's a way to stop hopper:

    http://blogs.msdn.com/hopperx/archive/2006/11/30/oh-please-make-it-stop.aspx

    Hopper is perhaps my favorite example of painfully obtuse development tools.  Maybe Scott Hanselman can hook up BabySmash to Hopper to improve usability.
  • Wednesday, October 21, 2009 10:50 PM Gousekhan-MSFT (MSFT, Moderator)
     

    Hi Bbjbbj,

    i)
    See: http://blogs.msdn.com/hopperx/about.aspx

    Hopper stresses the entire device and will execute anything accessible through the UI many, many times. It has no knowledge of where it is at any time and has limited ability to detect poor system health. Hopper executes randomly, thus different bugs might be encountered each time the tool runs.

    ii)
    The application must complete two hours of Microsoft’s Hopper test without exhibiting unpredictable behavior, hanging or crashing.

    See Steve's post, quoted below: http://social.msdn.microsoft.com/Forums/en-US/mktplace/thread/04fedfaf-a749-4f95-9db9-054d29dfd8e6

    Note that the correct Hopper version number is 2.0.24.4074 and this supersedes what is listed in the Application Submission Criteria. All Windows Marketplace applications are required to pass two hours of Hopper testing.

    Developers are highly encouraged to test their applications in-house, with Hopper and Application Verifier, prior to submitting applications to the Windows Marketplace. It is also advised that developers understand the Windows Marketplace Application Submission criteria, available here, and are comfortable that their applications will pass testing prior to starting the application submission process.

    See also Steve's and Mike's posts: http://social.msdn.microsoft.com/Forums/en-US/mktplace/thread/571231d7-2b98-4cf3-9ed5-c4275cf905bf

    Hope this helps you.


    Thanks,
    Gouse

  • Thursday, October 22, 2009 7:46 AM bbjbbj
     
    With respect, Gouse,

    We know the above; that should be obvious if you read our question.

    To repeat the problem:
    We really need to REPRODUCE the same Hopper run once a bug has been found and possibly fixed, to see whether the fix works. Sometimes this can be done manually; sometimes it's not very easy unless you reproduce the same sequence of key presses, hence the requirement for reproducibility.

    Because of the random nature of Hopper testing, an app passing 2 hrs in MS testing does not mean the test has found any or all of the bugs; it just means it didn't happen to hit any. With this style of testing you invariably need to run for longer than the test house does (in this case more than 2 hrs) before submitting, to have any level of confidence that the app will pass the 2 hrs. The question was whether anyone has a feel for how long that should be. Our rule of thumb is twice the test house's test time, but we don't know the details of Hopper particularly well.
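
    A rough way to put numbers on this, assuming purely for illustration that Hopper-triggered failures arrive as a Poisson process with a constant rate λ per hour. Real failures need not behave this way, and λ is a hypothetical quantity that Hopper does not report. Under that assumption the chance of surviving a run of T hours is:

$$
P(\text{pass } T \text{ hours}) = e^{-\lambda T}
$$

    A single failure-free in-house run of t hours only bounds the rate. At 95% confidence, λ cannot be so large that a clean run of t hours would have had less than a 5% chance:

$$
e^{-\lambda t} \ge 0.05 \;\Longrightarrow\; \lambda \le \frac{\ln 20}{t} \approx \frac{3}{t}
$$

    Substituting that bound into the first formula with T = 2 gives the guaranteed pass probability for a 2 hr certification run:

$$
P(\text{pass 2 hours}) \ge e^{-2 \ln 20 / t} = 0.05^{\,2/t}
$$

    For t = 4 this lower bound is about 0.22, and for t = 40 it is about 0.86. So under this simple model, one clean run of twice the certification time says fairly little on its own; confidence grows with total failure-free hours.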

  • Thursday, October 22, 2009 8:17 AM bbjbbj
     
    Hi Mike

    Thanks for the info. However, the blog also says:

    /////////////////////////////////////////

    Lastly, what you can do about it: There are several ways you can deal with this issue, but my best advice is: don’t. The best strategy is to evaluate breaks, hangs and crashes as they stand and not necessarily worry about how they happened and focus on the fact that they happened. Microsoft has a dedicated team solving stability bugs found by Hopper and we don’t use it. Instead we focus on other strategies and clues left by the system (in fact that is the focus of this blog).

    Challenge yourself to understand the need of the repro case – is it absolutely required to solve this problem? Are you absolutely certain that getting a repro case will shed light on the problem? It is often easy to think that a repro case will help, but my guess is that it will simply take a lot of resources to get there and the additional information you have gathered won’t be worth the effort.

    But, if you must pursue the repro case, it might be possible to repro using the same seed (-sxx). Hopper has another feature that puts it under lazy mode and allows you to slow down (-lxx) inputs so that timing problems are less likely to occur. Unfortunately this often has the side-effect of not finding the original bug and can lengthen the Hopper runs considerably.

    /////////////////////////////////////////

    which basically says you are ******** trying to find bugs with Hopper, as
    i) you can't reproduce the tests,
    ii) you have little way of knowing how or where the bug exists should one be hit, and
    iii) the text suggests that you basically think a bit about the problem and decide whether it's important to fix, depending on how long it might take, what the consequences are if it's not fixed, etc.

    The last point here strongly suggests a waiver is the way to go forwards. Conversely, submitting with a waiver seems to defeat the entire point of the testing.

    In our case:
    i) we tested with Hopper as usual before submitting to Marketplace: no problem for 2 hrs.
    ii) the app failed MS Hopper testing, so the submission failed. Fair enough.
    iii) we re-tried it here and it failed this time.
    iv) we think we have found an obscure problem and fixed it.
    v) we have no idea whether this is the bug MS Hopper testing found, as we have no way of replaying the testing to see if the issue is fixed.
    vi) the only information we can provide is that the app ran for about 6 hrs under Hopper here; the trouble is that this has little relevance to any other Hopper test run.

    Simply re-submitting the app for MS testing on the off chance it may pass this time seems pretty unscientific.

    So how do we proceed?


  • Thursday, October 22, 2009 7:48 PM M Francis (MSFT)
     
    Hi bbjbbj -

    I understand what you are saying: you found a problem using Hopper, but you are not sure whether it is the same problem found by NSTL. If the errors are the same, in most cases the Hopper logs should look similar at the point of failure. It is also possible that there are two (or more) distinct bugs in your code, one found by NSTL and the other found by you.
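
    One way to check whether two logs "look similar at the point of failure" is to compare only their last few lines. The sketch below makes no assumption about the real Hopper log format: it treats a log as plain text lines, the function name and the sample lines are invented for illustration, and any timestamps or counters in a real log would need stripping first or they would lower the score.

```python
import difflib


def tail_similarity(log_a_lines, log_b_lines, tail=50):
    """Return a 0.0 to 1.0 similarity ratio for the last `tail` lines of two logs."""
    a = log_a_lines[-tail:]
    b = log_b_lines[-tail:]
    # SequenceMatcher compares the two lists line by line; 1.0 means identical tails.
    return difflib.SequenceMatcher(None, a, b).ratio()


# Invented sample lines standing in for the end of two Hopper logs.
our_log = ["launch app", "key enter", "open options dialog", "app stopped responding"]
lab_log = ["launch app", "tap 120,80", "open options dialog", "app stopped responding"]

print("similarity of log tails:", tail_similarity(our_log, lab_log, tail=3))
```

    A high ratio only suggests the two runs ended in the same place; it does not prove they hit the same bug.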

    This is why it is a best practice to run Hopper as part of your development process, where you are regularly running Hopper for an extended period of time - overnight works well - to help flush out any lurking issues.

    About all you can do in this case is to run Hopper enough times that you feel confident it will pass when going through certification.

    Thanks,
    Mike
  • Friday, October 23, 2009 8:28 AM bbjbbj
     
    Hi Mike

    Thanks for the info.

    We do Hopper test regularly; the bug we fixed was a side effect of a fairly late change in code. We would never try to claim we don't have any bugs (only that we are currently unable to locate any obvious ones!), which is why we would really like to know whether the bug we have found and fixed is the same as the NSTL one. Some kind of reproducible testing regime would at least give us confidence that the NSTL bug has been squashed.

    Can I suggest that MS and/or NSTL at least include the Hopper log file(s) as part of the test report when a Hopper test fails? (No, our failure report did not contain any attached file or inline cut/paste from any Hopper log.)

    Thanks
    John
  • Friday, October 23, 2009 6:26 PM M Francis (MSFT)
     
    bbjbbj -

    I will follow-up with your suggestion. That makes a lot of sense.

    Thanks,
    Mike
  • Tuesday, November 03, 2009 7:47 AM P. Mendes
     
    I know this is off-topic, but as nobody can give me a straight answer in my own topic I thought I could ask here.
    There's no information about the time interval of the FocusApp. By default it's set to 10 seconds, so should I assume this is the interval used to certify the apps?

    Thanks,
    Pedro
  • Tuesday, November 03, 2009 9:02 AM bbjbbj
     
    Hi Pedro

    Since the Hopper test is non-reproducible, it's not at all clear that the test is particularly valid at the best of times, let alone what params the MS testers choose to use.

    At a guess, they are using the default 10 secs. We use the default 10 secs and our apps generally pass MS Certification.

    The Hopper docs strongly suggest you remove the 'exit' functionality of your app during testing before any release, which makes the focus app time more or less irrelevant as far as we understand it.

    I don't know exactly what you are observing at a rate other than 10 secs, but another test criterion is that apps must not be unresponsive for any significant period of time without displaying the busy cursor.