
MontaVista


Vile defect most evil!

May 19th, 2008

[ This is the first in a series on some of the most unusual, challenging, or just plain odd support engagements I’ve observed over the years. Names and companies have been changed to protect the innocent. ]

CASE #1-M6P7/1771

LOCATION: New Jersey

(dun-DUH)

The customer reported a periodic hang when booting their system. The only direct symptom was that the last kernel message displayed was: “Starting kswapd v1.8”. The behavior had the feel of a heisenbug: the hang occurred only about once every 10 boots. Some units hung more often, some less. At times, hitting the physical reset button recovered the system. One particular unit would hard-lock, resisting all recovery methods short of a hard power cycle by pulling the AC plug.

Uh oh… that’s not good. Was the software mucking up the hardware so badly that a cold power cycle was the only thing that would fix it?

The customer jumped into a range of experiments, adding and removing various hardware components and altering the amount of “power-off rest time” between test cycles. The results were conclusive: no single change made the situation better. Whether they tweaked the GigE interface, SMP, the RT scheduler, or the boot device, nothing mattered.

On the MontaVista side, we had been discussing this issue in depth with the customer all along and suggesting various tests to try. We also confirmed that we couldn’t reproduce the issue on the hardware we had in house. The issue was confined to the specific brand of x86 motherboard the customer had purchased. That concerned us: we sometimes see odd hardware behavior on custom boards, but when a commercial product is used, incidents of hardware weirdness are far less frequent.

Most of our customers use hardware that MontaVista doesn’t have in our lab. That’s the nature of the embedded software industry. This imposes a natural constraint: we can’t kill defects that we can’t reproduce or observe reliably. Sometimes the only thing that helps is getting the customer’s hardware in-house.

FedEx to the rescue. What happened next was quite a surprise to me.

(dun-DUH)

TO BE CONTINUED.

2 Responses to “Vile defect most evil!”

  1. Joe Says:

    Can’t wait to hear the rest of the story. I bet you it was a bug (was the area infested with mosquitoes at the time?)

  2. Brad Says:

    I’ll pull this to the top of the queue. When you hear the conclusion, you’ll wonder how the support engineer figured it out.



©2010 MontaVista Software, LLC. All Rights Reserved