What Mean Time Between Failures Really Means
Have you seen this figure on computer and electronics packaging? Chances are, it doesn't mean what you think it does.
Many tech-heads are familiar with the pretty unreadable acronym MTBF. It stands for mean time between failures, and quite a few people (and companies) throw this term around when talking about the durability and reliability of products. But among many consumers, there’s a serious disconnect between what MTBF is believed to mean and what it actually tells us about a product's reliability.
You may see MTBF numbers stamped on the packaging of IT products like hard drives or other hardware, where longevity is an important part of shoppers' purchasing decisions. Sometimes, online stores even let customers browse products by MTBF. But generally, there’s no big, upfront explanation of how the company came by this figure. Often, MTBF is expressed in hours. That seems straightforward, but in reality, a number of hours doesn't tell the whole story.
Are you totally and utterly confused? So were we. Here we dig into MTBF and what it means for consumers.
What Does MTBF Mean?
Because mean time between failures is usually expressed in hours, shoppers who haven’t read up on the idea can do the math and come away with some incorrect ideas about a particular product's durability. Let’s say, for example, that you see a product rated for 43,000 hours. If you simply string out that number of hours into continuous operational time, you’ll come up with just under five years (43,000 hours divided by 8,760 hours per year is about 4.9 years). That leads some to think that the device has been tested and that researchers have found it is likely to run for five years before breaking.
Not so. The reality is that in most cases, testers didn’t run a single unit for anywhere close to 43,000 hours. Instead, it’s more likely that the test involved running a larger number of units, say, 1,000 of them, for 43 hours each (though real tests typically run for longer than this). The manufacturer’s researchers then divide the total operating hours across all units by the number of failures to calculate the MTBF. With 1,000 units running for 43 hours each, that comes to 43,000 unit-hours, so if just one drive fails during the test run, the MTBF number becomes 43,000 hours.
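To make the arithmetic concrete, here is a minimal Python sketch of the calculation described above. It assumes the simplest form of the estimate (total unit-hours divided by observed failures) and uses the article's example figures; the function name and parameters are our own, and a real manufacturer's test procedure may be considerably more involved.

```python
HOURS_PER_YEAR = 24 * 365  # 8,760 hours of continuous operation


def mtbf_hours(units, hours_per_unit, failures):
    """Estimate MTBF as total unit-hours divided by observed failures.

    Hypothetical helper for illustration only; it is not taken from
    any manufacturer's documented method.
    """
    if failures <= 0:
        raise ValueError("at least one failure is needed for this estimate")
    return units * hours_per_unit / failures


# The article's example: 1,000 units, 43 hours each, one failure.
mtbf = mtbf_hours(units=1000, hours_per_unit=43, failures=1)
print(mtbf)                              # 43000.0
print(round(mtbf / HOURS_PER_YEAR, 2))   # 4.91 (the misleading "years" reading)
```

Note that no single unit in this example ran for more than 43 hours, even though the headline figure works out to nearly five years.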
To be fair, some manufacturers do use what may be called stress testing to try to replicate the wear of a longer time frame by subjecting devices to higher temperatures and other stresses, but again, this doesn’t always show up in the specs shown to consumers. (To read more about MTBF in the IT world, check out 5 Warning Signs of a Critical Equipment Failure.)
MTBF and Consumer Electronics
One reason individual customers tend to assign a more literal meaning to MTBF is its use in various physical industries, where the term often does refer to the average error-free run time for a single system. That’s why it’s so crucial for electronics makers to make clear that when they advertise MTBF, they are not saying that a device will run error-free for that length of time.
Consumer advocates often have specific criticisms of the use of MTBF in consumer electronics. Zac Carman is the CEO of Consumer Affairs, a company that provides a wide range of product and corporate reviews, consumer complaints, and more. In comments to Techopedia, Carman called MTBF and related metrics "scientifically sound" but also "esoteric," even, in his view, to many of those who are relatively tech-savvy.
"The root of the problem with evaluating a purchase of electronics devices is that actual failure rate data is opaque to consumers," said Carman. "It would be great if there was transparency at the corporate level where businesses talked about their products' failure rates and the subcomponent failure rates openly." (Sometimes equipment that fails can still be fixed. For more, see 5 Tips for Fixing a Hard Drive Problem.)
Carman added that part of what Consumer Affairs does is to offer shoppers more of this concrete data through online product reviews.
What Should Manufacturers and Product Vendors Do?
As an example of how electronics makers could be more transparent about their products, Carman suggests giving consumers access to total annual failure rates for a product. Other consumer advocates have also been calling for these kinds of annual failure rates. For example, according to Robin Harris in a post on StorageMojo, different metrics like annual failure rate (AFR) and annual return rate (ARR) can be more accurate descriptions of product failure probabilities. He also suggests that many companies are already moving toward these kinds of measurements.
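For readers curious how an MTBF figure might translate into an annual failure rate, here is a rough Python sketch. It relies on an assumption that is ours, not the article's or any manufacturer's: that failures occur at a constant rate over time (an exponential model) and that the device runs around the clock. Real products often fail more in early life or as they wear out, so treat the result as an illustration rather than a prediction.

```python
import math

HOURS_PER_YEAR = 24 * 365  # 8,760 hours, assuming 24/7 operation


def annual_failure_rate(mtbf_hours, hours_per_year=HOURS_PER_YEAR):
    """Approximate AFR from MTBF under a constant-failure-rate assumption.

    Hypothetical helper: AFR = 1 - exp(-hours_per_year / MTBF).
    """
    if mtbf_hours <= 0:
        raise ValueError("MTBF must be positive")
    return 1.0 - math.exp(-hours_per_year / mtbf_hours)


# The article's example rating of 43,000 hours.
afr = annual_failure_rate(43_000)
print(f"{afr:.1%}")  # 18.4%
```

Under these assumptions, a 43,000-hour MTBF corresponds to roughly an 18 percent chance that a given unit fails within a year of continuous use, which is a very different message from "runs for five years."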
Breaking down test result numbers into different formats can help, but in the end, it will be useful to many electronics shoppers to simply associate MTBF with the idea of a large-scale testing environment. To be sure, part of the challenge with these sorts of terms is in sorting out the alphabet soup of acronyms that’s so endemic to IT.
MTBF: A Problem of Perception
But there’s also another element to changing people's minds about MTBF in the IT world. In that respect, it's a challenge that’s not really technical, but one that calls for a more investigative approach, not all that different from what modern journalists have brought, for example, to the dark world of derivatives trading. Despite common perceptions about MTBF, figuring out what’s behind a product maker’s numbers doesn’t require an understanding of high-tech engineering; it just requires an answer to a very simple question: Exactly how did you come up with this?