Troubleshooting happens to be one of my favorite tasks, and I thoroughly enjoy the process. Routinely, I’m asked to help others already engaged in some sort of problem-solving event. In the last few months, however, I have seen more than a few cases where troubleshooting activities served only to make the original problem much worse. In every case, the problem was further obscured by invalid troubleshooting methods, or by misinterpreted data gleaned from earlier attempts to repair the failure.
There could be countless reasons why attempts to solve the original problems failed, but I would argue that lack of technical prowess is nowhere near the top of the list. While I know that in today’s world the desire is for the system to tell us what’s wrong, I also know that in reality the system can often only point us in the right direction. The system only knows what it knows, and the report of a failure is usually based on information that was programmed by the controls system designers.
Additionally, in the OEM space, many of the machines are unique enough that we don’t enjoy the troubleshooting benefit of pattern failures. Each problem seems to be unique, and the same problem on two different machines presents itself completely differently. Further exacerbating the complexity, the now-commonplace network-enabled devices offer more information, so we have more to sort through.
The reduced level of support in end-user facilities has created extreme time pressures on the few who champion service efforts. Given the technical and administrative pressures, I can easily understand why some folks seem to dread performing troubleshooting tasks; it’s no wonder that a system’s report of a failure is sometimes misinterpreted by those charged with acting on it.
But sometimes we’re our own worst enemies, especially when we’re faced with problems on systems with which we’re unfamiliar. When troubleshooting, we tend to focus on the areas we know the least about, rather than on the things that are easy to eliminate as possible candidates. It is always the case that the simplest solution is the best solution; unfortunately, the simplest explanation isn’t always the correct explanation.
I believe a troubleshooter’s best friend is an insatiable curiosity: the best troubleshooters always have more questions than answers. The best troubleshooters are also sometimes wrong. Worse yet, the process of troubleshooting is extremely difficult to explain to some management types, especially while it’s in process. Nevertheless, the best troubleshooters already know this, and are totally OK with all of it.
Therefore, here are my “Top Seven” lists:
Top Seven Traits of Exceptional Troubleshooters:
- They possess a fundamental curiosity toward how things work
- They prepare by doing homework on their own
- They quickly design tools or tests to prove or disprove a theory
- They possess exceptional organizational skills, providing for quick “on-the-fly” categorization
- They are never afraid to be wrong
- They know when to say when, and they do
- They exhibit a “detective-like” approach toward things that seem unlikely
Top Seven Rules of Troubleshooting:
- Never believe everything you read, hear or see
- You cannot troubleshoot a system unless you understand how it works
- Simply making it work without proving the root cause is risky, at best
- Never assume anything; develop a proposition, then prove or disprove it
- Digital pictures are your best friend before disassembly
- A log book is infinitely superior to a great memory
- Share what you’ve learned with others; you cannot grow unless those around you grow too
Troubleshooting is simply another form of problem solving, and no single individual ever has all of the answers. Fortunately, we tend to learn the most when we’re wrong.
Finally, a quote commonly attributed to Albert Einstein: “Everything should be made as simple as possible, but not simpler.”