How to conduct an effective electronics failure analysis investigation

Last night I went out with friends and bumped into someone I haven’t seen since high school. It was lovely to catch up, but two things struck me. Firstly, there’s nothing like meeting a friend you last saw at 16 to make you realise that time is passing, fast! And, secondly, I squirm awkwardly, mumble something about problem solving, then change the subject when someone asks me what I do for a living.

Since there is nothing I can do about the first (other than exploring the scientifically proven benefits of collagen supplements and investigating the safety of botox – DM me if you know), I decided to sit down and try to clearly articulate what I do. Explain it like I’m 5 style!

With so many niche professions out there, I’m sure I can’t be alone in this dilemma. I ran over and over different ways to explain my job and came up with lots of “that’ll do” statements but nothing that I felt really hit the mark.

I solve problems for people. Too vague.

I’m a forensic scientist for the electronics industry. Too abstract.

I use materials science to find the root-cause of electronics failures. Accurate, if a tad dull. But this inevitably invites the “oh, I once dropped my iPhone down the toilet” conversation.

So, I did what any scientist worth their salt would do in this situation – I totally over-thought it and before I knew it, I’d written my first blog! They say everything is a process so maybe taking 2000 words to explain to like-minded engineers will lead me to a simple one sentence description for a 5-year-old. 😊

Don’t worry, this is the end of my self-indulgence. This blog explains how to carry out effective electronics failure investigation.

What do we mean by electronics failure analysis?

Failure analysis (FA) refers to techniques used to understand how and why electronics fail or do not behave as expected. However, FA means many different things depending on your background and experience. For some, the term failure analysis triggers thoughts of failure mode and effects analysis (FMEA). FMEA is important but more closely aligned with design and product development due diligence than with problem solving. Carrying out FMEA is essential but unlikely to get you to a manufacturing line-stop solution or a decision as to whether an aircraft is safe to fly. For others, failure analysis is synonymous with electrical testing. Whilst electrical test can narrow down the problem area and provide failure symptoms, changes in electrical response won’t tell you why the problem happened in the first place. To solve the problem, we need to know why it occurred. In fact, we need to keep asking ‘why’ until there is nowhere else to go. The 5 Whys is a common approach used in many strategic plans, and it works pretty well here too.

When talking about FA, there are terms that people use interchangeably to mean the same thing, even though the true definitions are distinct. Personally, I’m not too fussed about logical semantics. But, for the record, we should clarify:

  • Failure mode: The symptom of failure. “It won’t power up” or “it stops working 5 minutes after I switch it on”

  • Failure mechanism: The physical condition leading to failure. What’s gone wrong or how has it broken? For example, cracks, corrosion, delamination etc.

Failure mode and mechanism give us the ‘what’ and ‘how’. To find an appropriate solution, we need the ‘why’. And that’s where failure analysis investigations come in. Before we get into detail, here are my top tips for carrying out an effective electronics failure investigation:

  1. Find out what the team know about the failure - “Interview” them and collect systematic information.

  2. Preserve the evidence - don’t do anything that will change or mask the problem.

  3. Try to locate the problem area - test electrically and carry out surface inspection.

  4. Think it through - plan the investigative approach using the above information.

  5. Test what you can without changing the sample - conduct non-destructive testing to find the target(s) for root-cause analysis.

  6. Get to hidden information - carry out destructive sample preparation, recording condition at every step.

  7. Find the diagnostic clues in the detail - conduct micro-section analysis, materials testing and other analysis techniques.

  8. Bring it all together - interpret the information, be led by the data and consider context as well as technical detail.

  9. If you don’t see anything or something just doesn’t fit, look again. The answer is always there to find!

The “Interview”

Having investigated countless failures over my career to date, I fully recognise the stress people are under when dealing with failures, whether they occur during manufacturing, test or in-field. It is important to understand that time, cost and risk to reputation mean that best practice is not always followed. It’s also worth remembering that it is human nature to protect ourselves and our teams. An investigator’s first job is to listen to the people who are in the midst of the crisis. The aim is to find out what happened and when. The history of the affected product is also important. For example:

  • Is this a newly developed product or has it been in production for years?

  • What is the normal failure rate?

  • How are products manufactured and tested?

  • Have there been any supply chain, materials, process, test or use environment changes?

  • Are failures associated with a particular batch and are sub-assembly, component and PCB batches traceable?

  • Is this a hard failure or is it recoverable?

It is usually best to run over these questions a few times and with different people where possible. Listen and repeat what you’ve heard. Ensure understanding and look for uncertainty, bias, frustration or differences in account or opinion. We are not trying to catch anyone out, but we are trying to establish what information we can rely on and what may be based on assumption or gut feeling. After all, you know what they say about assumptions…
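If it helps to keep track of who said what, the comparison of accounts can be sketched in a few lines of Python. This is only an illustration: the roles, questions and answers below are made up, and `find_disagreements` is a hypothetical helper, not part of any established tool.

```python
from collections import defaultdict

# Hypothetical interview notes: (person, question, answer). All values are illustrative.
accounts = [
    ("production engineer", "Have there been any process changes?", "no"),
    ("test technician", "Have there been any process changes?", "yes, new reflow profile"),
    ("production engineer", "Is this a hard failure or is it recoverable?", "hard"),
    ("test technician", "Is this a hard failure or is it recoverable?", "hard"),
]


def find_disagreements(accounts):
    """Return the questions for which interviewees gave different answers."""
    answers = defaultdict(dict)
    for person, question, answer in accounts:
        # Normalise case and whitespace so trivial differences are not flagged.
        answers[question][person] = answer.strip().lower()
    return {
        question: by_person
        for question, by_person in answers.items()
        if len(set(by_person.values())) > 1
    }


disagreements = find_disagreements(accounts)
for question, by_person in disagreements.items():
    print(question)
    for person, answer in by_person.items():
        print(f"  {person}: {answer}")
```

A flagged question is not an accusation. It simply marks information that needs confirming before the investigation relies on it.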

Next is to find out exactly what samples are available and what has happened to them. If possible, and appropriate to the specific case, quarantine samples from the same batch that have not yet failed. These samples may show the failure mechanism at an earlier stage, which can support root-cause diagnosis. Find samples from previous batches that behaved as expected. These samples will be used as reference non-failed samples. If non-failed samples are not available, unused samples can be helpful as a reference. Ideally (but very rarely) the failed sample has been left well alone. If you are lucky enough to be in this position, the first thing to do is record the condition, just as it is. If the product fails after further assembly or in use, take photos in-situ from all angles and use different lighting if you can. There can never be too many photos. This is your record of the “crime scene”.

Preserving the Evidence

There are lots of similarities between a failure analysis investigation and a forensic investigation. You should apply the same effort to protect the evidence as a forensic scientist does when gathering evidence from a crime scene. To get to the root-cause, you should approach the analysis intelligently, choosing the right techniques in the right order to maximise the data obtained from the samples available. The same is true when a pathologist is tasked with identifying cause of death. Samples are limited and every test conducted must liberate useful information.

Forensic analysis begins with the preservation of evidence. To ensure the integrity of the investigation, it is essential to safeguard the failed component and surrounding materials from contamination or damage. Proper handling techniques, such as wearing gloves and using static-safe packaging, help prevent the introduction of foreign particles and preserve the evidence for thorough examination. Special care should be taken to document each step of the investigation, from sample collection to analysis and interpretation of results. "Chain of custody" protocols ensure that evidence remains secure and uncontaminated, providing a clear audit trail of its handling and storage.
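One way to picture a chain-of-custody audit trail is as a log in which each entry records who handled the sample, what they did, and a hash that links back to the previous entry. The sketch below is a minimal illustration using Python's standard library. The field names, the sample ID and the hash-linking scheme are my assumptions for the example, not a description of any formal protocol.

```python
import hashlib
import json
from datetime import datetime, timezone


def _digest(entry):
    """Hash every field except the stored hash itself."""
    payload = {key: value for key, value in entry.items() if key != "hash"}
    return hashlib.sha256(json.dumps(payload, sort_keys=True).encode()).hexdigest()


def add_entry(log, sample_id, handler, action):
    """Append a custody entry that is linked to the previous entry's hash."""
    entry = {
        "sample_id": sample_id,
        "handler": handler,
        "action": action,
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "prev_hash": log[-1]["hash"] if log else "",
    }
    entry["hash"] = _digest(entry)
    log.append(entry)
    return entry


def verify(log):
    """Return True if no entry has been altered, removed or reordered."""
    prev_hash = ""
    for entry in log:
        if entry["prev_hash"] != prev_hash or entry["hash"] != _digest(entry):
            return False
        prev_hash = entry["hash"]
    return True


# Hypothetical sample ID and handler, for illustration only.
log = []
add_entry(log, "PCB-0042", "investigator", "received in static-safe bag and photographed")
add_entry(log, "PCB-0042", "investigator", "visual inspection, no physical contact")
print(verify(log))
```

Editing an earlier entry after the fact changes its hash, so `verify` then fails. A paper log signed at every handover does the same job. What matters is that every step is recorded at the time it happens.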

Visual Inspection and Electrical Test

Before any physical contact is made to test the failed circuit board, the board should be inspected visually and its condition recorded, so that any change made during electrical test is known and can be kept in mind at later stages. Failed and non-failed samples should be subject to a suite of comparative electrical tests to pinpoint abnormalities in the failed sample’s performance. Voltage measurements, continuity tests, and impedance analysis are among the techniques available to identify the failed component or circuitry.
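The comparative part of electrical test boils down to measuring the same points on failed and reference samples and flagging where they differ. A minimal sketch is shown here. The measurement names, values and the 10% tolerance are all assumptions for illustration. Real limits come from the design and the test specification.

```python
# Hypothetical measurements taken at the same test points on both samples.
reference = {"3V3_rail_V": 3.30, "5V_rail_V": 5.02, "R_net_A_ohm": 100.0}
failed = {"3V3_rail_V": 3.28, "5V_rail_V": 0.41, "R_net_A_ohm": 101.0}


def flag_deviations(reference, failed, tolerance=0.10):
    """Return measurements whose relative deviation from the reference exceeds tolerance.

    Assumes every reference value is non-zero.
    """
    flagged = {}
    for name, ref_value in reference.items():
        if name not in failed:
            continue  # nothing to compare against
        deviation = abs(failed[name] - ref_value) / abs(ref_value)
        if deviation > tolerance:
            flagged[name] = deviation
    return flagged


flagged = flag_deviations(reference, failed)
for name, deviation in flagged.items():
    print(f"{name}: {deviation:.0%} away from reference")
```

With these made-up numbers, only the 5 V rail is flagged, which narrows the region to inspect next.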

When complete, the failed sample(s) should be fully screened and inspected in detail using light microscopy in the suspected region of interest. Again, it is helpful to compare to non-failed or new, unused samples.

Plan Meticulously

As Benjamin Franklin said, “By failing to prepare, you are preparing to fail”.

At this point of the investigation, it is rare that an experienced investigator would not have an idea of what may have gone wrong. However, it is essential that all the information and data gathered is reviewed and considered, and that a plan is created to make sure that all likely eventualities are covered. For example, whilst a BGA solder joint failure may be suspected, the condition of the PCB trace and vias that connect to the suspect BGA ball should be confirmed before grinding through them. The investigation should be ordered in a way that liberates the most data, making sure that samples are excised in a way that preserves condition for the next analysis stage.
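The ordering rule can be written down as a simple check: no destructive step should be scheduled while a non-destructive one is still outstanding. The sketch below applies it to the BGA example. The step names and the `destructive` flags are my illustrative assumptions, and a real plan has many more considerations than a single flag.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    name: str
    destructive: bool  # assumption: a step either alters the sample or it does not


def first_ordering_problem(plan):
    """Return (destructive_step, later_step) if a destructive step comes too early, else None."""
    first_destructive = None
    for step in plan:
        if step.destructive and first_destructive is None:
            first_destructive = step
        elif not step.destructive and first_destructive is not None:
            return (first_destructive, step)
    return None


# Hypothetical draft plan in which the micro-section is scheduled too early.
draft_plan = [
    Step("photograph and visual inspection", False),
    Step("comparative electrical test", False),
    Step("micro-section through suspect BGA joint", True),
    Step("confirm PCB trace and via condition", False),
]

problem = first_ordering_problem(draft_plan)
if problem:
    print(f"'{problem[0].name}' is planned before '{problem[1].name}'")

# sorted() is stable, so non-destructive steps keep their relative order.
reordered_plan = sorted(draft_plan, key=lambda step: step.destructive)
print([step.name for step in reordered_plan])
```

Sorting on one flag is deliberately crude. It makes the point that the micro-section goes last, but choosing between two destructive steps still needs an investigator's judgement.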

Analysis Techniques

With electrical anomalies detected, x-ray inspection techniques come into play. X-ray imaging, computed tomography (CT), and/or X-ray Microscopy (XRM) allow investigators to peer beneath the surface of the electronic assembly and visualise internal structures non-destructively. By selecting the right approach for the size of the sample, area of interest and materials, it may be possible to identify faults such as solder bridges, open circuits, or component misalignments. However, sometimes, the failure is just too small to resolve with clarity.
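A quick back-of-envelope check can show whether a feature is likely to be resolved before committing instrument time. For projection X-ray and cone-beam CT, the effective pixel size at the sample is roughly the detector pixel pitch divided by the geometric magnification (source-to-detector distance over source-to-object distance). The sketch below uses that relationship. The distances, the pixel pitch and the "at least three pixels across" rule of thumb are assumptions for illustration, and the estimate ignores focal spot blur, so real resolution will be somewhat worse. XRM systems that add optical magnification are not covered by this simple model.

```python
def geometric_magnification(source_to_detector_mm, source_to_object_mm):
    """Magnification of a point-source projection geometry."""
    return source_to_detector_mm / source_to_object_mm


def effective_pixel_size_um(detector_pixel_pitch_um, magnification):
    """Detector pixel size projected back onto the sample."""
    return detector_pixel_pitch_um / magnification


def likely_resolvable(feature_size_um, pixel_size_um, min_pixels_across=3):
    """Rule-of-thumb check (assumption): the feature should span a few pixels."""
    return feature_size_um >= min_pixels_across * pixel_size_um


# Hypothetical set-up: 800 mm source to detector, 40 mm source to sample, 100 um pixels.
magnification = geometric_magnification(800.0, 40.0)
pixel_um = effective_pixel_size_um(100.0, magnification)

print(f"magnification: {magnification:.0f}x, effective pixel: {pixel_um:.1f} um")
print("50 um void:", likely_resolvable(50.0, pixel_um))
print("2 um crack:", likely_resolvable(2.0, pixel_um))
```

Large boards limit how close the sample can get to the source, which caps the magnification. That is one reason why small cracks so often need a cross-section.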

To delve deeper into the root cause of failure, destructive cross-section analysis should be employed with a clear and precise plan in place to progressively intersect potential locations of failure. This meticulous process involves carefully excising the region of interest containing the failed component or circuitry and encapsulating it in resin. With the sample physically supported, it can be cut and subjected to grinding and polishing using progressively finer abrasive media.

Under light microscopy, the cross-section of the failed component is magnified to reveal its internal structure and any defects present. Investigators should examine the sample using a variety of lighting conditions, looking for evidence of manufacturing anomalies, damage or other abnormalities that may have contributed to the failure.

In some cases, conventional micro-sectioning and light microscopy are sufficient. However, to reveal microstructural information from materials, it is often necessary to further prepare micro-sections using "non-contact" material removal techniques. These include broad ion beam (BIB) approaches or, in the case of semiconductor failures, focussed ion beam (FIB) to create a trench cut in a precise position or a thin section for transmission electron microscopy (TEM).

With microstructure revealed, Scanning Electron Microscopy (SEM) takes the analysis a step further by providing high-resolution and high magnification images of the area of interest. Investigators should explore the microstructure in detail to determine how failures have occurred. For example, looking closely at the location of a "crack" in a solder joint may show that the crack originates as a series of voids that coalesce. Energy Dispersive X-Ray (EDX) analysis should be used to determine the elemental composition of the sample and any foreign contaminants present. By analysing the X-ray emissions generated during the SEM analysis, investigators can identify the elemental composition of the materials they are visualising. Going back to our example, EDX would be used to identify which material and which phase of material the voids are present in.
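At its simplest, EDX identification means matching the energy of each peak in the spectrum against known characteristic X-ray lines. The sketch below shows the idea with a handful of lines relevant to solder joints. The energies are approximate, rounded values, the measured peaks are invented, and the 0.05 keV tolerance is an assumption. In practice the instrument software handles this, using full line families and dealing with overlapping peaks.

```python
# Approximate characteristic X-ray line energies in keV (rounded, illustrative subset).
LINES_KEV = {
    "C K-alpha": 0.28,
    "O K-alpha": 0.52,
    "Pb M-alpha": 2.35,
    "Ag L-alpha": 2.98,
    "Sn L-alpha": 3.44,
    "Ni K-alpha": 7.48,
    "Cu K-alpha": 8.05,
}


def match_peak(energy_kev, tolerance_kev=0.05):
    """Return the closest tabulated line within tolerance, or None if nothing matches."""
    name, line_energy = min(
        LINES_KEV.items(), key=lambda item: abs(item[1] - energy_kev)
    )
    return name if abs(line_energy - energy_kev) <= tolerance_kev else None


# Hypothetical peak positions read from a spectrum.
measured_peaks_kev = [2.99, 3.45, 8.04, 5.00]
matches = {energy: match_peak(energy) for energy in measured_peaks_kev}

for energy, line in matches.items():
    print(f"{energy:.2f} keV -> {line or 'no match in this table'}")
```

An unmatched peak is a prompt to look again, because it may be an element missing from the table or an artefact of the detector.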

With all the evidence gathered, it is over to the investigator to use their expertise in forensic analysis and electronics failure modes to get to the root-cause. To prove the root-cause, it is often necessary to have reference non-failed samples or early-stage failures to compare to catastrophic failures. Using our example, the root-cause is the reason behind the formation of voids (more on this in a later blog) rather than any stress or strain causing the compromised solder joint to subsequently crack. The solution is to prevent the voids forming in the first place.

Summary

Failure analysis focuses on preserving evidence and systematically generating valuable data. Electronics failure investigations are most successful when experts use their understanding of electronics systems and materials science to interpret analytical results and gather information from "witnesses". This broader context is essential for identifying the root cause of the failure. Once the cause is understood, the solution becomes apparent.

And there it is, easy! That is exactly what I do. When electronics fail, I take them apart, find clues trapped within the materials and tell people how to make them better next time. And rather more simply for those polite pub conversations where the music is too loud for anyone to really care – CSI for nerds.

Most electronics manufacturers and users do not have the requirement to carry out forensic investigations regularly. At Forensic Eyes, it’s our day job. If you would like support through services, consultancy or training, get in touch. To book a free consultation with me, click here: https://calendly.com/suzanne-be0a/meeting-with-suzanne-costello

For more information, contact Suzanne Costello on enquiries@forensic-eyes.com

Follow Forensic Eyes on LinkedIn and sign up to receive a copy of future blogs: linkedin.com/company/forensic-eyes-ltd
