Win CE Nand boot failure

  • Question

  • Hi everybody,

    I'm new to this forum and a newbie in embedded systems. This is my problem:

    I have a few platforms using WinCE on an Atmel AT91SAM9263.

    Boot starts from a Spansion NOR flash (S29GL128P90TFI) and then continues with the eboot from a Micron NAND flash (MT29F2G08ABAEA).

    In some cases, after a power-down followed by a power-up, loading WinCE from the NAND fails, leaving a permanently blank screen on my device.
    In some cases, I found several incorrect blocks in the NAND that were not marked as bad blocks.

    My hypothesis is that the NAND is prone to frequent failures and my system is not robust enough against them.

    So far, the solution has been to reflash the NAND. But this practice is unsustainable for me, because it means the company has to remove the system from its site (sometimes in the mountains, on rivers, or in other hard-to-reach locations), reprogram it at our warehouse, and then reinstall it at its original site.

    How can I make my system more reliable, assuming that I can't avoid NAND flash read failures during boot?

    Thursday, March 20, 2014 9:31 AM

Answers

  • I'm assuming that your OS binary is stored in NAND flash and that your bootloader is what programs it there. If so, you can implement bad block management in the bootloader, so that bad blocks are skipped and replaced with good blocks during programming.
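
    The bad block handling described above could be sketched roughly as follows. This is only an illustration: the `nand_*` helpers are hypothetical RAM stand-ins for a real NAND driver, the block size is tiny on purpose, and the "skip and retry on the next block" policy is an assumption, not something taken from a particular BSP.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BLOCK_COUNT 8u
#define BLOCK_SIZE  16u   /* tiny on purpose; real NAND blocks are far larger */

/* RAM stand-ins for the real NAND driver (hypothetical names). */
static uint8_t sim_flash[BLOCK_COUNT][BLOCK_SIZE];
static bool    sim_bad[BLOCK_COUNT];    /* bad-block marks */
static bool    sim_worn[BLOCK_COUNT];   /* blocks whose program step fails */

static bool nand_is_bad(uint32_t b)   { return sim_bad[b]; }
static void nand_mark_bad(uint32_t b) { sim_bad[b] = true; }
static bool nand_erase(uint32_t b)
{
    memset(sim_flash[b], 0xFF, BLOCK_SIZE);
    return true;
}
static bool nand_program(uint32_t b, const uint8_t *src, uint32_t len)
{
    if (sim_worn[b])
        return false;                   /* simulate a program failure */
    memcpy(sim_flash[b], src, len);
    return true;
}

/*
 * Write `len` bytes of `image` into blocks first_block..last_block.
 * Blocks already marked bad are skipped. A block that fails erase or
 * program is marked bad and the same chunk is retried on the next block.
 * Returns the number of blocks consumed (written + skipped), or -1 if
 * the region runs out of good blocks.
 */
int nand_write_image(uint32_t first_block, uint32_t last_block,
                     const uint8_t *image, size_t len)
{
    uint32_t block = first_block;
    size_t done = 0;

    while (done < len) {
        if (block > last_block || block >= BLOCK_COUNT)
            return -1;                  /* no spare blocks left */

        if (nand_is_bad(block)) {       /* skip a known-bad block */
            block++;
            continue;
        }

        size_t remaining = len - done;
        uint32_t chunk = remaining < BLOCK_SIZE ? (uint32_t)remaining
                                                : BLOCK_SIZE;

        if (!nand_erase(block) || !nand_program(block, image + done, chunk)) {
            nand_mark_bad(block);       /* retire the block, retry the chunk */
            block++;
            continue;
        }

        done += chunk;
        block++;
    }
    return (int)(block - first_block);
}
```

    With this scheme the image region has to be larger than the image itself, so that spare blocks are available, and the code that reads the image at boot has to skip bad blocks in exactly the same way.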

    You can also have your bootloader manage two OS images: a primary OS and a fail-safe OS. Before programming an OS binary to NAND flash, compute its CRC or checksum and save it in a safer place (for example, block 0 of a NAND flash is normally guaranteed good for about 1000 erase/write cycles, so you can keep the CRC/checksum information in this block). During boot, the bootloader copies the image from NAND flash to RAM and has to verify the CRC before jumping to the OS. If the CRC check fails, boot the fail-safe OS instead and indicate (for example, with a message on the LCD) that the primary OS is corrupted and that a field firmware upgrade is needed.

    If the fail-safe OS is also corrupted (CRC failed), the bootloader itself has to indicate the failure and switch to firmware upgrade mode.
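
    As a rough sketch of the boot decision described above (primary, then fail-safe, then upgrade mode): the type and function names below are hypothetical, the images are assumed to be already copied into RAM, and CRC-32 is just one possible choice of checksum.

```c
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-32 (reflected polynomial 0xEDB88320, as used by Ethernet/zlib). */
uint32_t crc32_calc(const uint8_t *data, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < len; i++) {
        crc ^= data[i];
        for (int bit = 0; bit < 8; bit++)
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return ~crc;
}

/* Hypothetical descriptor of an OS image already copied from NAND to RAM. */
typedef struct {
    const uint8_t *data;
    size_t         len;
    uint32_t       stored_crc;   /* value saved when the image was programmed */
} os_image_t;

typedef enum { BOOT_PRIMARY, BOOT_FAILSAFE, BOOT_UPGRADE_MODE } boot_choice_t;

static int image_ok(const os_image_t *img)
{
    return img->len > 0 && crc32_calc(img->data, img->len) == img->stored_crc;
}

/* Pick the first image whose CRC matches; otherwise fall back to upgrade mode. */
boot_choice_t select_boot_image(const os_image_t *primary,
                                const os_image_t *failsafe)
{
    if (image_ok(primary))
        return BOOT_PRIMARY;
    if (image_ok(failsafe))
        return BOOT_FAILSAFE;    /* caller should also report the corruption */
    return BOOT_UPGRADE_MODE;
}
```

    A bitwise CRC like this is slow on a large image; a table-driven version is the usual choice when boot time matters.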

    Firmware upgrade is possible over USB RNDIS or Ethernet from a TFTP server, or through a custom firmware upgrade mechanism such as an implementation of the USB Device Firmware Upgrade (DFU) class.


    Please mark as answer, if it is correct.
    Please vote,if it is helpful post.
    Vinoth.R

    http://vinoth-vinothblog.blogspot.com
    http://www.e-consystems.com/windowsce.asp

    • Marked as answer by Patriano Thursday, March 20, 2014 3:04 PM
    Thursday, March 20, 2014 10:25 AM

All replies

  • Thank you very much. I had considered both ideas, having the bootloader manage multiple operating system images and verifying a checksum, but I was not sure they were actually feasible.

    I'll check how best to implement your suggestions.


    Is there any way to investigate the real cause of the NAND flash failures? Or do I just have to assume that it isn't very reliable?

    Thanks again

    Thursday, March 20, 2014 3:04 PM
  • NAND flash, by its very nature, will have bits flip during use. Each NAND flash part specifies its ECC requirements, that is, how many bit flips you should expect for a given amount of data and must be able to correct in place.

    As Vinoth has mentioned, you can store two images, but you can also rewrite a bad OS image when you detect that its checksum has failed.

    You mention that you found several incorrect blocks. What is incorrect about them? When you read them back, do they have bits flipped? If so, did you apply any ECC to them, or were too many bits flipped, so that the ECC reports an uncorrectable error?

    I do not know what the ECC limits for your particular NAND are, but over time I have seen the error correction requirements for NAND go up.
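
    One way to answer the bit-flip question is to compare a raw dump of a suspect block against the known-good image and count flipped bits per ECC codeword. The sketch below assumes the dump was read with ECC correction disabled; the function names are hypothetical, and the codeword size and correctable-bit limit are parameters that have to come from the NAND datasheet.

```c
#include <stddef.h>
#include <stdint.h>

/* Count bits that differ between a known-good reference and a NAND read-back. */
unsigned count_bit_flips(const uint8_t *ref, const uint8_t *rd, size_t len)
{
    unsigned flips = 0;
    for (size_t i = 0; i < len; i++) {
        uint8_t diff = (uint8_t)(ref[i] ^ rd[i]);
        while (diff) {
            diff &= (uint8_t)(diff - 1);   /* clear the lowest set bit */
            flips++;
        }
    }
    return flips;
}

/*
 * Walk both buffers one ECC codeword at a time.
 * Returns the worst number of flipped bits seen in any single codeword and
 * stores in *uncorrectable how many codewords exceed `ecc_bits`, i.e. how
 * many the ECC could not have repaired.
 */
unsigned worst_codeword_flips(const uint8_t *ref, const uint8_t *rd, size_t len,
                              size_t codeword_size, unsigned ecc_bits,
                              size_t *uncorrectable)
{
    unsigned worst = 0;
    *uncorrectable = 0;
    if (codeword_size == 0)
        return 0;                          /* nothing sensible to scan */

    for (size_t off = 0; off < len; off += codeword_size) {
        size_t n = (len - off < codeword_size) ? len - off : codeword_size;
        unsigned flips = count_bit_flips(ref + off, rd + off, n);
        if (flips > worst)
            worst = flips;
        if (flips > ecc_bits)
            (*uncorrectable)++;
    }
    return worst;
}
```

    If the worst case stays within the limit, the data should have been recoverable and the ECC path itself deserves a closer look; if it regularly exceeds the limit, timing, power, or the ECC strength are more likely suspects.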

    Also, are you using the built-in ECC of the Atmel AT91SAM9263 processor? If so, double-check your NAND's requirements, as I have never seen a NAND chip whose correctable error requirement is low enough to be met by the AT91SAM9263's error correction hardware.

    In terms of investigating the real issue you will need to look at it from both a hardware and software perspective.  Things to check are:

    • ECC Correction Requirements (and that you meet them.)
    • Bus Timing
    • Power Timings
    • Noise on Power Rails

    The above is not an exhaustive list, just some common things to validate.

    Good Luck,

    Brad

    Thursday, March 20, 2014 6:01 PM