July 12, 2014
It had been less than 36 hours since we migrated a core ERP (Enterprise Resource Planning) system. This ERP kept a $5B medical device business making medical devices, and our capacity to keep doing that had a problem – we had just re-attached the core, initial tests came back, and the news was bad.
About 35 technical staff were running the migration, with 30 user-testers executing hundreds of test scenarios, all under the predictable pressure of our entire technology leadership team, which lurked in the shadows like so many pizza-eating specters. The news? Every single test was failing. This meant the business was panicking, and the technology leadership was freaking out.
We had a 72-hour window to complete the migration. Once that window closed, the business would lose $1,000,000 an hour until the downtime issues were cleaned up. The migration team needed to find the issues and root-cause them, and team management then needed to task the staff to solve them. Meanwhile, the backlog of issues grew as the army of user-testers ran their test scripts. How does recovery happen?
1. Don’t Panic
The basic human reaction to overwhelming information is usually not helpful. Don’t panic, don’t wilt, don’t freeze up. This is the time for high-order thinking and practiced responses; avoid behaviors that would be described as “childish” or “reactive” or “freaking out” in the post-action report – you’re a professional, for goodness’ sake – act like one.
Take a deep breath and approach issues as they come, logically, analytically, structurally. This unfortunate outcome (it’s just a failed test or two) means it’s time to push the button on your contingency plans – you did implement a program for managing migration issues before you migrated, correct? Whether you have a plan or are winging it, the key is to keep a clear, analytical head and avoid doing your worst WARNING! WARNING! impression of the robot from “Lost In Space”.
The best way to not panic is to:
2. Have a Plan
The key to a strong plan – Hell, the key to any plan – is to have a plan. Think ahead, consider the worst-case scenarios, prepare how you will manage them, wargame the outcomes. For example, did you create a social-biz worksite, forum or joint IM and email where issues can be collected? A ticket-tracking tool like Mantis, or a list on a whiteboard? Who’s in charge of the list? Where’s it being kept? The important part is to plan in advance to handle Mr. Murphy, because Mr. Murphy always has a plan for you.
3. Be Patient
Once the “problems” in your process begin identifying themselves, and the plan you’ve put in place is being executed (or your fly-by-night planless plan is winging it with the best of them), you have to give recovery and improvement processes time to develop.
The biggest challenge teams often face when working through a daunting list of failures is resisting the urge to jump from issue to issue without resolving any of them. This approach to troubleshooting is destined to turn out poorly: time is lost in the thrash generated by repeatedly redirecting staffers and overreacting to new information. Quantifying and solving failures means measured analysis, consistent action and sticking to the plan. The best course of action often results from a “divide and conquer” team approach – segments of the implementation group focused on resolving one issue at a time.
How do you eat an elephant? One bite at a time. You resolve daunting solution delivery problems the same way.
What did we do?
We had a plan, and we had that plan in every team member’s head (and on the wall, just in case they were stuffing their faces instead of their heads). We didn’t panic, we avoided becoming overwhelmed (with our plan’s help), and we acted with clear direction and a clear mind. We quickly stack-ranked the open issues and assigned each of our technical teams one task, with the directive to not look at another issue or ask for another assignment until they were confident the assigned issue was resolved. Remember to have fallback plans on top of your action and reaction plans, but also remember that those fallback plans are not pretty – avoid them at all costs.
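For the tooling-minded, the stack-rank-and-assign rule can be sketched in a few lines of Python. This is only an illustration, not the tracker we actually used: the `Issue` and `TriageBoard` names, the numeric severity scale (1 = worst) and the team names are all assumptions made up for the example.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Issue:
    title: str
    severity: int  # hypothetical scale: 1 is the most severe
    resolved: bool = False


class TriageBoard:
    """One issue per team; no new assignment until the current one is resolved."""

    def __init__(self, issues, teams):
        # Stack-rank once, most severe first. sorted() is stable, so ties
        # keep the order in which the issues were reported.
        self.backlog = deque(sorted(issues, key=lambda issue: issue.severity))
        self.assigned = {team: None for team in teams}

    def assign_next(self, team):
        current = self.assigned[team]
        if current is not None and not current.resolved:
            # The directive: do not look at another issue yet.
            raise RuntimeError(f"{team} is still working on: {current.title}")
        self.assigned[team] = self.backlog.popleft() if self.backlog else None
        return self.assigned[team]

    def resolve(self, team):
        current = self.assigned[team]
        if current is None:
            raise ValueError(f"{team} has nothing assigned")
        current.resolved = True


if __name__ == "__main__":
    board = TriageBoard(
        [Issue("Invoices fail to post", 2), Issue("Logins rejected", 1)],
        ["database", "application"],
    )
    for team in ("database", "application"):
        print(team, "takes", board.assign_next(team).title)
```

The design choice that matters is the refusal in `assign_next`: a team that asks for new work before closing out its current issue gets an error instead of a second ticket, which is what keeps the thrash down.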
And remember, sleep is overrated.
How do you bring order to chaos?