Quick Summary: What came first? The chicken or the egg?
Is Root Cause Failure Analysis Reactive Or Proactive?
Root Cause is both reactive and proactive, but I believe that we need to be reactive first in order to be proactive. Root Cause Failure Analysis can only be performed once a problem or failure has actually taken place. A failure must first occur before anyone can analyze it.
Without a failure, how in the heck can we perform a Root Cause Failure Analysis? Likewise, the failure must be fresh: it is highly unlikely that we can perform a meaningful Root Cause Failure Analysis on a failure that happened six months or a year ago. Why? Because to perform a Root Cause Failure Analysis, the evidence must be frozen, that is, preserved as it was at the time of the failure.
One of the most difficult parts of doing Root Cause Failure Analysis is that when something fails on the equipment, there are actually two teams that respond to the failure. The first is the restoration team, or simply the fast team. These people need to restore the equipment as fast as they can, since productivity will always be the utmost priority. The longer the restoration takes, the more certain it is that operations people will be standing behind you, asking how much more time is needed to fix the equipment.
On the other hand, we have the slow team. These are the people who investigate how and why the failure occurred on the equipment in the first place. The problem is that by the time the slow team arrives, everything has been restored, the part that failed has been thrown away for good, and every bit of evidence has been washed away and destroyed by the fast team. The question is, how in the world can you perform a Root Cause Failure Analysis in this situation? You simply can’t.
When Is Root Cause Reactive?
Root Cause is always reactive at the start, since a failure needs to have happened before anyone can perform an investigation. For example, a ball bearing failed and caused a piece of equipment to stop. There was no stock of the part in the stockroom, and hence the downtime was enormous. Management called for an analysis of why the bearing had failed, so an investigation team composed of a Principal Investigator and an Evidence Gathering Team was formed to investigate the failure. We can only perform an analysis when something has failed or a problem erupts that causes the operation to halt. What the team is interested in is understanding the underlying causes of the problem so that the people involved can learn from the things that go wrong.
I recall one time I was called to a plant to give a brief presentation to their management team about Root Cause Failure Analysis, since one of their leading customers had become very irritated with them. A piece of equipment had failed and caused a delay in their shipment. When the customer asked the reason for the delay, they said that a bearing had failed in the equipment processing the units, which stopped the equipment. When asked what they did about it, they told their customer that they had replaced the bearing with a new one. The customer almost pulled its business from this plant because the plant had no answer as to why the bearing failed in the first place. As I was presenting, someone interrupted me: what they wanted was a system or strategy with which they could proactively capture “ALL” failures that could occur on this equipment. When I told them that Root Cause was not designed to handle all failures that can occur on the equipment, I lost that opportunity.
In my mind, these people did not know what Root Cause Failure Analysis is all about. It is not something that will address “ALL” the problems of the equipment; rather, it is used to understand the cause of a failure that actually occurred on the equipment. You see, the people above wanted to address every probable cause of failure that could occur on the equipment. I think what they needed was some sort of FMEA and not RCFA. The difference between the two is that FMEA is designed to capture the probable causes of failure and prioritize them according to severity, while RCFA is designed to address the true cause of a failure based on the evidence uncovered, so that we can learn from the things that go wrong.
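To make the FMEA side of that contrast concrete, here is a minimal sketch of how a worksheet might rank probable failure modes. It assumes the common Risk Priority Number convention (severity times occurrence times detection), which goes a step beyond ranking by severity alone. The class, the function, and every failure mode and score below are hypothetical and made up for illustration; they do not come from any real plant.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FailureMode:
    """One hypothetical row of an FMEA worksheet (illustrative values only)."""
    description: str
    severity: int    # 1 (negligible) to 10 (catastrophic)
    occurrence: int  # 1 (rare) to 10 (frequent)
    detection: int   # 1 (almost certain to be caught) to 10 (undetectable)

    @property
    def rpn(self) -> int:
        # Risk Priority Number: an assumed, commonly used ranking convention.
        return self.severity * self.occurrence * self.detection


def prioritize(modes):
    """Sort failure modes from highest to lowest RPN, breaking ties by severity."""
    return sorted(modes, key=lambda m: (m.rpn, m.severity), reverse=True)


# Made-up scores for three probable failure modes.
modes = [
    FailureMode("Gasket leak", 4, 7, 2),
    FailureMode("Bearing lubricant contaminated", 8, 6, 5),
    FailureMode("Shaft misalignment", 7, 4, 4),
]

for mode in prioritize(modes):
    print(f"{mode.rpn:4d}  {mode.description}")
```

Notice that nothing in this list needs a failure to have happened. That is exactly why it is FMEA and not RCFA: RCFA starts from the evidence of one actual failure.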
When is Root Cause Proactive?
First, Root Cause Failure Analysis can only be proactive if we finally learn and understand that big problems are just an accumulation of small problems that were neglected in the first place. Most Root Cause initiatives and efforts seem to address only big, catastrophic failures. Why don’t we analyze problems while they are still small? Failsafe Network, one of the leading Root Cause providers in the US and one with which I am affiliated, stresses at great length the importance of small problems. One of the most memorable things I’ve learned from Failsafe training is that big things go wrong because we do not act on the small things, and that waiting to do a Root Cause Failure Analysis on big problems will just assure us a continuance of big problems. Hence, Failsafe recommends performing Root Cause not only on MAXI, or big, events but also on MINI and MIDI, or smaller, events. Their training, “The Latent Cause Analysis Experience,” holds that our focus should be on the small problems, since big problems can only be taken care of if we take care of the small ones. Doesn’t TPM have the same thing in mind, that when we take care of the basic equipment condition, the life of the equipment can be prolonged?
Second, most people say that performing a Root Cause Failure Analysis will help us prevent a recurrence of the problem. Let me rephrase this: I think it is much more likely that performing a Root Cause Failure Analysis will help us prevent the recurrence of the cause of the problem. Even when we have taken care of one or a couple of causes, the failure can still occur again in the future due to a different cause. Root Cause is not designed to address every single possible cause of the problem; rather, we are only interested in the cause of the problem based on the evidence uncovered.
Consider a failed bearing whose analysis showed traces of silicon contaminants and metal wear on the raceway, which had caused the bearing to suffer from fatigue and spalling. When the investigating team conducted an oil analysis, they found the oil heavily contaminated, with a particle count whose ISO range was far outside the standard. Good filtration and contamination control practices were adopted, not only on this equipment but on the rest as well. How lubricants were stored was addressed, and so on. But all these efforts came as a result of finding the cause of the failed bearing. After adopting these practices, the plant experienced smooth operations as a result of its proactive efforts: it addressed not only the single piece of equipment that failed but also other equipment that could have failed from a similar cause.
Third, Root Cause Failure Analysis can only be proactive if we learn from the failure itself. This is easy to say, but in the real world it is much more difficult. Behind the physical and human causes there is always a deeper underlying cause, which is the latent cause of the failure. Many industries stop once they find the physical cause of the problem, only to realize that the failure comes back to them at the time they least expect it. As Leo Tolstoy put it, everyone thinks of changing the world, but no one thinks of changing himself. The late “King of Pop”, Michael Jackson, in his song “Man in the Mirror” sang “If you want to make the world a better place, take a look at yourself and make that change.” Latencies are hidden causes that need to be exposed. They are not only about system causes, rules, and procedures that we ought to follow, but also about the way I am and the way we are that contributed to the problem.
Let me give you some examples of these latencies. To understand them, we need to answer the following questions:
What is it about the way I am that contributed to the problem? (About you)
What is it about the way we are that contributed to the problem? (Organization)
Space Shuttle Challenger exploded because an O-ring on the right solid rocket booster leaked, and the escaping hot gas came in contact with the external tank. The Rogers Commission concluded that the explosion was caused by the faulty design of the solid rocket booster joints. Morton Thiokol engineers already knew about this problem and had informed their management. However, the Morton Thiokol management team also knew that their billion-dollar contract with NASA was nearing its end and that they needed to stay in business. Richard Feynman, a Nobel Prize winner, physicist, and member of the Rogers Commission that investigated the Challenger disaster, said that engineers and managers were not communicating effectively. NASA was not communicating well with its suppliers.
An electrician rewound a motor backwards. The motor was installed in a critical location in the plant, and the error eventually wrecked the turbine. Management decided to discipline the electrician and suspend him without pay for a month. Later, on a deeper probe, the investigation team found out that the electrician had already been working for 24 hours straight because the person who should have relieved him was on sick leave. Was it really the fault of the electrician?
The team probing the failure of a pump found out that misalignment had caused the pump to fail. Likewise, they disciplined the person who performed the alignment. When the investigative team probed into the deeper cause of the problem, they found out that this person was using only his eyesight to perform the alignment. No instruments were being used, and no training had ever been provided to him. In fact, they learned that an alignment instrument had been requisitioned five times and disapproved by higher management five times due to cost reduction measures. Was it really the fault of the person who performed the alignment, or was there a much deeper cause?
A plant manager complained to maintenance about a leak on the floor. Maintenance told him that it was due to a gasket, showed him records that they had replaced this gasket five times that month, and said they had already informed the purchasing department about the situation. When the plant manager went to talk to the purchasing manager, the purchasing manager said that this was the direction given by the VP of Finance. When the plant manager went to talk to the VP of Finance about the gasket, the VP of Finance said, in effect, that it was your direction to be as cost conscious as possible, and going to the lowest bidder saves us a lot of money. The plant manager was horrified to learn that he himself was the reason there was an oil leak on the floor.
I can give more examples, but let me stop here. Latent causes exist, and they must be exposed. Change can only take place if we have the guts and courage to look at ourselves in the mirror and ask: what is it about the way I am that contributed to this problem, and what is it about the way we are that contributed to the problem?
Therefore, to answer the question of whether Root Cause Failure Analysis is reactive or proactive: it is both. But Root Cause will always start on a reactive footing, and we can only be proactive if we address the small things as well as expose the latencies of the problem. And I think that is all I have to say about that.