Biggest Tech Outages 2024: Change Management Is to Blame

High-profile technology outages have rattled the UK’s retail landscape. In less than a week, Greggs became the fourth major retailer to report IT outages across its UK stores.

However, many of these incidents, affecting giants such as McDonald’s, Sainsbury’s, and Argos, were not the result of cybersecurity breaches or sinister conspiracies around state-sponsored cyber attacks.

Many of these recent tech outages resulted from software updates that went wrong.

A convergence of tech failures across some of the UK’s most frequented retail outlets is a stark reminder of the vulnerabilities inherent in today’s digital infrastructure. It also highlights the critical need for robust IT change management practices.

Key Takeaways

March 15: McDonald’s announces global payment processing problems due to a configuration change.
March 16: Sainsbury’s and Argos cannot take cashless payments because of a software update.
Retailers increasingly rely on complex payment processing with multiple intermediaries to verify a single transaction.
Technical debt rises when new solutions are stacked on top of legacy technology and collide with poor change management.

McDonald’s Global Shutdown: A ‘Configuration Change’ Catastrophe

McDonald’s recent global technology outage spanned continents from the UK to Asia Pacific, including countries like Australia, New Zealand, and Japan. The incident began at midnight CDT on Friday, March 15, and quickly rendered many McDonald’s outlets unable to process orders through in-store kiosks, counters, or even the mobile app.

McDonald's: Global outage was caused by "configuration change" – @serghei https://t.co/72ys2zHSYw

— BleepingComputer (@BleepinComputer) March 15, 2024

The event was traced back to a “configuration change” by a third-party provider, underlining retail IT networks’ intricate dependencies and potential vulnerabilities.

No Sale: The Software Update That Cost Sainsbury’s and Argos Dearly

Over a tumultuous weekend, Sainsbury’s and its subsidiary Argos also faced significant operational disruptions due to a failed IT update, causing widespread inconvenience and estimated losses of up to £9 million in orders, according to the Telegraph.

Shoppers at Sainsbury’s first encountered issues with contactless payments, and many also saw their online grocery deliveries canceled. Sainsbury’s-owned Argos was also impacted, with customers reporting difficulties in ordering and collecting items.

Due to an error with an overnight software update, we are experiencing issues with contactless payments and will not be able to deliver the vast majority of today's Groceries Online orders. Our stores are open as usual, accepting chip and pin and cash payments.

— Sainsbury's (@sainsburys) March 16, 2024

At the heart of these disruptions was an “error with an overnight software update,” a stark reminder of the critical importance of robust IT change management.

Although it was a day of chaos for shoppers, Sainsbury’s eventually restored service functionality, including resuming contactless payments and online grocery ordering.

However, questions still need to be answered about why the retail giant approved a change control request on a Friday evening before heading into a busy Saturday with limited support.

Tesco and Greggs Encounter Technical Difficulties

In a week marked by technological turbulence, Tesco and Greggs were among the UK’s high-profile retail brands that faced payment processing hurdles, spotlighting the fragile nature of modern retail operations.

These incidents disrupted customer transactions and, in some instances, forced temporary store closures.

However, some quick-thinking stores have found a way to avoid payment processing errors by directing customers to Uber Eats.

@O2 I got a Greggs voucher but my local store is closed pic.twitter.com/mtS7okzCXy

— Harwinder (@hkambo1) March 20, 2024

The IT troubles at Greggs and Tesco were vaguely attributed to “technical issues” without further elaboration on their causes. Greggs, for instance, acknowledged challenges with payment systems across several outlets and rectified the situation within a few hours.

Despite the quick resolution, the lack of transparency regarding the root cause of these outages leaves a cloud of speculation.

Neither Tesco nor Greggs framed their difficulties as cybersecurity breaches. Yet the absence of more detailed explanations raises questions about the vulnerability of retail payment infrastructures to technical glitches and the broader implications for consumer confidence and operational reliability.

How Payment Processing Complexity Magnifies Update Risks

The intricate web of technology underpinning payment processing highlights the complexity of modern retail transactions. With each credit card swipe, a chain reaction involving many intermediaries unfolds, bridging the gap between consumer action and merchant compensation.

This multi-layered system, essential for handling card and other cashless payments, places merchants in a position where reliance on multiple third-party services is inescapable.
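To make the dependency risk concrete, here is a minimal sketch of how availability compounds across a chain of intermediaries. The number of hops and the 99.9% figure are hypothetical, and the calculation assumes each intermediary fails independently, which real payment networks may not.

```python
from math import prod

def chain_availability(availabilities):
    """Availability of a serial chain: every hop must be up for a payment to succeed."""
    return prod(availabilities)

# Hypothetical figures: five intermediaries, each available 99.9% of the time.
hops = [0.999] * 5
combined = chain_availability(hops)

print(f"Combined availability: {combined:.2%}")
print(f"Expected downtime per year: {(1 - combined) * 365 * 24:.0f} hours")
```

Under these assumptions, five services that are each 99.9% available combine to roughly 99.5%, or about 44 hours of potential disruption a year, before any change is made to any of them.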

Stacking new technological solutions atop outdated legacy systems, coupled with less-than-ideal change management practices, has precipitated a surge in technical debt, presenting a formidable challenge to maintaining seamless operations.

Every software update introduces a maze of added complexity. Any change can ripple through the labyrinth of intermediaries, challenging the robustness of change management practices.

This underscores the need for meticulous coordination among all parties involved in every technical update or change to the system.

These recent episodes underscore the fragility of growing dependence on digital infrastructure.

This emerging narrative is one of caution and concern, as businesses grapple with the dual challenges of keeping pace with technological advancements and ensuring system stability and reliability.

The Crucial Role of Change Management

IT change management is designed to offer a structured approach to managing system updates and mitigating risks. Many recent incidents are textbook examples of why rigorous testing protocols, release planning, and rollback plans are essential in preventing such widespread disruptions.

The complexity and scale of the technology involved mean that thorough testing is not just advisable but essential. With so many interconnected parts via application programming interfaces (APIs), even minor errors can lead to unpredictable and far-reaching consequences.

One of the critical lapses in the recent high-profile cases was the timing of updates, with some scheduled on peak trading days, thereby magnifying the impact of any resultant issues.

This decision-making flaw highlights a broader problem within IT change management practices: the need for strategic planning that considers the timing of a change and its potential impact on operations and customers.

Moving Forward: Lessons & Strategies

The recent outages serve as a warning for retailers to reevaluate their approach to IT change management.

Strategies such as rolling out updates gradually can mitigate risks by keeping potential downtime localized.
Adopting rigorous testing protocols and developing comprehensive rollback plans are crucial to ensuring business continuity and safeguarding customer trust.
The path forward involves addressing legacy issues within operating and payment systems and embracing a culture of continuous improvement and risk management.
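A gradual rollout with a rollback path could be sketched as follows. This is an illustration only: the `deploy`, `healthy`, and `rollback` callables and the stage fractions are hypothetical placeholders for whatever tooling a retailer actually uses.

```python
from math import ceil

def staged_rollout(stores, stages, deploy, healthy, rollback):
    """Deploy to progressively larger fractions of stores, reverting on the first unhealthy stage.

    stages: increasing fractions of the estate, e.g. [0.1, 0.5, 1.0]
    deploy / healthy / rollback: hypothetical callables supplied by the caller.
    Returns the stores left on the new version (empty if the rollout was reverted).
    """
    updated = []
    for fraction in stages:
        target = min(len(stores), ceil(len(stores) * fraction))
        batch = stores[len(updated):target]
        for store in batch:
            deploy(store)
        updated.extend(batch)
        # Check every store updated so far before widening the rollout.
        if not all(healthy(store) for store in updated):
            for store in reversed(updated):
                rollback(store)
            return []
    return updated
```

The point of the design is that a faulty update is caught while it affects a handful of stores rather than the whole estate, and that the revert path is exercised by the same routine that performs the rollout.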

As retailers navigate the complexities of the digital age, the lessons learned from these outages must inform future practices.

A sound IT change management process can mean the difference between keeping services resilient, reliable, and responsive to customers’ needs and hitting the headlines for all the wrong reasons.

Navigating the Waters of IT Change Management

Ultimately, change management offers a much-needed disciplined approach to introducing new technologies or updating existing systems. This process ensures that changes are implemented smoothly, efficiently, and with minimal service disruption.

By widely cited industry estimates, 80% of unplanned outages are due to ill-planned changes by administrators (“operations staff”) or developers, and 70% of all change projects fail to achieve their goals.

The recent outages are a stark reminder of what can go wrong when IT changes are not carefully managed.

From the moment a change is proposed to its implementation, a series of strategic steps must be followed to safeguard against potential fallout.

Figure: A checklist detailing the steps of an IT change management plan.

This begins with a request for change (RFC) outlining the plan for updates or modifications. Such a plan should detail the purpose of the change, its implementation schedule, and the anticipated impacts on operations.
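As a rough illustration, an RFC carrying those three elements could be modeled like this. The class and field names are assumptions made for the example, not a standard schema.

```python
from dataclasses import dataclass, field
from datetime import datetime

@dataclass
class ChangeRequest:
    """A minimal request for change (RFC); field names are illustrative only."""
    purpose: str
    scheduled_start: datetime
    scheduled_end: datetime
    anticipated_impacts: list[str] = field(default_factory=list)

    def missing_fields(self) -> list[str]:
        """List what a reviewer would still need before assessing the request."""
        missing = []
        if not self.purpose.strip():
            missing.append("purpose")
        if self.scheduled_end <= self.scheduled_start:
            missing.append("valid schedule")
        if not self.anticipated_impacts:
            missing.append("anticipated impacts")
        return missing
```

A request that returns a non-empty list from `missing_fields()` would be sent back to its author rather than forwarded for assessment.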

The Critical Role of Assessment & Planning

Following the RFC, a change advisory board (CAB) assesses the potential risks and implications of the proposed changes. This phase is crucial for identifying potential emerging issues, allowing for preventive measures to be implemented before they escalate into full-blown crises.

At this part of the process, the value of a proposed change is weighed against its risks to make an informed decision on whether to proceed.

There also needs to be assurance that, should anything go wrong, the technical team can roll everything back to a working state, and that it has demonstrated this ability in advance.
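The weighing of value against risk could be sketched as a simple scoring rule. The 1-to-5 scales and the thresholds below are hypothetical; a real board would set its own criteria and apply judgment on top of any score.

```python
def cab_decision(value, likelihood, impact, rollback_tested):
    """Return 'approve', 'defer', or 'reject' for a proposed change.

    value, likelihood, impact: scores from 1 (low) to 5 (high), as judged by the board.
    rollback_tested: whether the team has rehearsed reverting to a working state.
    Thresholds are illustrative assumptions, not an industry standard.
    """
    if not rollback_tested:
        return "reject"  # no proven way back, so the change does not proceed
    risk = likelihood * impact  # likelihood-by-impact risk score, from 1 to 25
    if risk <= value * 2:
        return "approve"
    if risk <= value * 4:
        return "defer"  # for example, ask for more testing or a smaller rollout
    return "reject"
```

For example, a high-value change with low risk, `cab_decision(5, 1, 2, True)`, is approved, while the same change without a rehearsed rollback is rejected outright.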

The Importance of Meticulous Implementation & Review

With approval secured, the focus shifts to implementing the change. For retailers, particularly those with operations spanning multiple locations, this phase must be meticulously planned during off-peak hours to minimize disruption.
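An off-peak rule of this kind can be enforced mechanically before a change is scheduled. The sketch below assumes a hypothetical policy of Monday-to-Thursday changes between 01:00 and 05:00 local time; the actual window would depend on each retailer's trading pattern and support cover.

```python
from datetime import datetime, time

# Hypothetical policy: changes only Monday to Thursday, between 01:00 and 05:00 local time.
ALLOWED_WEEKDAYS = {0, 1, 2, 3}  # Monday=0 ... Thursday=3
WINDOW_START, WINDOW_END = time(1, 0), time(5, 0)

def in_change_window(start: datetime) -> bool:
    """True if the proposed start falls on an allowed weekday inside the off-peak window."""
    return (start.weekday() in ALLOWED_WEEKDAYS
            and WINDOW_START <= start.time() < WINDOW_END)
```

Under this policy, a change proposed for a Friday evening, such as `datetime(2024, 3, 15, 23, 0)`, would be refused, while one at 02:00 on a Tuesday would pass.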

However, the job isn’t completed once the change is implemented.

Post-implementation monitoring is critical to ensure the change achieves its intended outcomes without adverse side effects. This phase allows for the swift identification and rectification of any unanticipated issues, reinforcing the system’s resilience.
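One simple form of post-implementation monitoring compares the payment failure rate after the change with the rate before it. The function below is a sketch under assumed defaults: the `tolerance` multiplier and `min_samples` threshold are hypothetical and would need tuning against real transaction data.

```python
def should_roll_back(baseline_failures, baseline_total, current_failures, current_total,
                     tolerance=2.0, min_samples=100):
    """Flag a rollback when the post-change failure rate exceeds tolerance x the baseline rate."""
    if current_total < min_samples:
        return False  # too few transactions since the change to judge yet
    baseline_rate = baseline_failures / max(baseline_total, 1)
    current_rate = current_failures / current_total
    return current_rate > baseline_rate * tolerance
```

With a baseline of 10 failures in 10,000 transactions, 50 failures in the first 1,000 transactions after a change would trigger a rollback, whereas a single failure in 1,000 would not.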

Preventing Future Outages and Ensuring Business Continuity

The lessons from recent tech outages are clear. Rigorous IT change management is not optional but necessary for retailers. Retailers can significantly mitigate the risks associated with software or system changes by thoroughly planning and testing updates in a controlled environment and keeping all stakeholders informed.

The capacity to quickly revert changes or deploy fixes is indispensable for maintaining business continuity and ensuring customer service remains unaffected.

The Bottom Line

The role of IT change management in facilitating smooth transitions and averting operational upheavals cannot be overstated. This strategic priority demands that businesses embrace meticulous planning and execution of IT changes to prevent disruptive tech outages, bolster resilience, and enrich the customer journey.

The recent IT meltdowns in retail serve as a timely reminder of the delicate balance retailers must maintain between modernizing outdated systems and mitigating risks.

They also raise pertinent questions about how retailers can refine their IT change management strategies to anticipate the future’s demands, learning from past challenges to develop more robust, adaptable, and resilient IT frameworks.