Disaster Recovery @ C&D Foods

On the 15th of January 2006 a major fire ripped through the C&D Foods pet food factory in Edgeworthstown, Co. Longford, causing millions of Euro in damage. At the time, I was IT Manager for C&D Foods, and on the Monday morning that followed I had to implement a disaster recovery plan of my own devising that I had hoped would never have to be used.

C&D Foods in Edgeworthstown, Co. Longford, Ireland, is a leading manufacturer of own-label pet foods for the UK and Ireland, and I was employed there between 2002 and 2006 as IT Manager. At the time, C&D employed over 500 people at the Longford site which was made up of the primary canned foods factory, a soft-can production unit (making flexible pouches of pet food), the company’s main administration offices, and a Research and Development unit that was studying animal nutrition as well as offering advice on quality and formulation to the production teams. During my tenure there, C&D began an ambitious expansion plan, purchasing a production site in the UK and building new warehousing and distribution facilities in Edgeworthstown.

C&D Foods in better times.

When I joined the company, the amount of Information Technology in use was quite low, limited to a couple of servers, printers, and the bare minimum number of desktop PCs necessary for those who absolutely needed them. Over the course of a couple of years, the IT fleet grew to encompass the factory floor through a Shop Floor Data Collection project and its associated Wi-Fi system, two remote sites (one in Ireland and one in Yorkshire, UK), numerous mobile workers and their laptops and mobile phones, and considerably more PCs, printers, and servers. The dependence on IT was growing to the point where additional staff were needed, and I was joined in the department by a new full-time Support Analyst as well as the ongoing support of contract developers and system vendors.

C&D operated in a near 24×7 fashion and it was at about 11:00pm on the evening of Sunday 15th January 2006, just as the last shift of that day was winding down, that disaster struck. A fire broke out towards the rear of the main cannery building, near the stores, and quickly spread throughout the factory. The thirty or so employees working that evening were all evacuated to safety, but the fire took hold and was soon out of control. The local fire brigade, including local volunteers, responded and a major incident response swung into action. It would turn out that the voluntary nature of the fire service would have a major impact on the IT disaster recovery that would be needed once the fire was dealt with.

The C&D Foods factory immediately after the fire.

I found out about the fire early the next morning when friends began to contact me after hearing about the disaster on the news. I reported for work and found a scene that was a strange mix of devastation and normality. From the roadside, the front of the building seemed untouched, with only smoke damage in spots to indicate there had been a fire at all. The view from the side and rear of the factory was a different story, however, as most of the cannery was destroyed.

I was able to enter the building about mid-morning to survey the damage to the offices and specifically the server room. The offices escaped with only light smoke damage, but while the server room was untouched by the fire itself, it had suffered a lot of smoke damage. Funnily enough, some of the servers were still running. The power had been cut to the building but the UPS systems had kicked in. Some of the UPSs had initiated a graceful shutdown when their batteries hit critically low levels, but one of the larger units had soldiered on, so when I entered the room I was greeted by the low-battery alarm of the main UPS. The systems that had kept running up to that point were shut down and an assessment of the physical state of things was taken.

The heavily smoke-damaged corridor outside the server room.

Everything in the server room was heavily covered in soot, as the room was on the path the smoke from the fire took as it tried to leave the building. This presented a serious and immediate problem, as the residue from smoke is corrosive and can severely damage electronics. The main offices, located at the front of the building, were almost totally unscathed, with only a very light covering of soot coupled with a bad smell, so the real IT focus was on the server room and its contents, once one important task had been handled.

The door to the production hall. This was as far as the fire got thanks to the efforts of the local Fire Brigade.

C&D Foods was the single largest employer in the Edgeworthstown area by a long shot, was easily in the top five private sector employers by size in Co. Longford, and was one of the largest private sector organisations in the four counties of the Midlands (Longford, Laois, Offaly, and Westmeath). When news of the fire broke, it was a headline item on national news that day, attracting interest from print, radio, and TV news outlets (at one point during the day someone who could speak Irish had to be found in order to give a statement to the TG4 reporters), so there was a significant need to handle inbound communications. E-mail was initially not a problem: all inbound mail would be stored on the ISP’s servers until collected by the internal mail server, and e-mail wasn’t really considered a channel for urgent communications anyway. The phone lines, however, were down, and the PBX was out of action until its status could be assessed and reliable power restored. In order to handle phone calls, I contacted our Eircom rep and requested that a temporary voicemail message be applied to the main line number detailing (in broad strokes) what was happening and who to contact for further details. All of the DDI numbers were then diverted to the main line number so that any and all calls would be captured until it was time to bring the PBX back online or to divert to a different line altogether. With this simple act complete, it was time to start the disaster recovery proper.

The scene outside my office window, Monday 16th January 2006 – note the satellite truck on the road getting ready to broadcast.

Several years prior to the disaster at C&D, I had been approached by Compaq on the subject of Business Continuity, as that company was running an initiative on the subject at the time. One of the things I took away from this initiative was the concept of a graded disaster recovery plan. The core idea of a graded DR plan is that not all disasters are the same: the response should depend on the impact of the event in question. In some businesses I worked at after C&D, an extended power cut was enough to activate a full DR, whereas for other organisations only the entire loss of the premises would be enough to put a DR plan into full swing.

C&D had a graded plan for the recovery of IT based on the scale of loss or nature of damage sustained by the core equipment. This meant that for something like smoke damage the response would be different from the response to fire damage. Back in 2006, despite C&D expanding into new facilities away from HQ, there were no co-location or backup sites for IT systems, so any recovery would depend either on backup media, which was stored off-site, or on being able to salvage equipment and put it back to work. In the short term, for smoke damage, the plan was to attempt a salvage with a view to getting and staying operational long enough to replace equipment in an orderly fashion.
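As a rough illustration, a graded plan can be thought of as a lookup from the grade of damage to a pre-agreed response. The Python sketch below is not C&D's actual plan. The names `DamageGrade`, `Response`, `GRADED_PLAN`, and `select_response` are made up for the example, the smoke entry paraphrases the salvage approach described above, and the fire entry is an assumption based on the off-site backup option.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum


class DamageGrade(Enum):
    """Hypothetical grades; a real plan would define its own."""
    SMOKE = "smoke"
    FIRE = "fire"


@dataclass(frozen=True)
class Response:
    strategy: str
    steps: tuple[str, ...]


# Illustrative mapping only. The smoke entry paraphrases the salvage plan
# described in the text; the fire entry is an assumption, not a documented plan.
GRADED_PLAN = {
    DamageGrade.SMOKE: Response(
        strategy="salvage",
        steps=(
            "shut down and assess equipment",
            "salvage equipment and put it back to work",
            "replace equipment in an orderly fashion",
        ),
    ),
    DamageGrade.FIRE: Response(
        strategy="restore from off-site backup",
        steps=(
            "source replacement equipment",
            "restore data from off-site backup media",
        ),
    ),
}


def select_response(grade: DamageGrade) -> Response:
    """Return the response defined for the given grade of damage."""
    return GRADED_PLAN[grade]
```

Calling `select_response(DamageGrade.SMOKE)` returns the salvage response. The point of the structure is that the decision about how to respond is made in advance, not in the middle of the incident.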

As with any major incident, the cause of the C&D Foods fire was thoroughly investigated, and for the first couple of days not every part of the site was accessible to staff.

A real-world disaster recovery plan should have input from the organisation’s insurers, as they will have an interest in the DR activities and their impact on getting the organisation back operational, and of course, they would like to see equipment restored as opposed to being scrapped and the cost of replacement falling to them. The C&D insurers had put me in touch with a company that specialised in recovering smoke- and water-damaged electronics like computer systems, and this was how the fantastically named ISS Damage Control came on the scene (though since then, ISS Damage Control go by the name ISS Restoration, I suppose in an effort to make them sound less like they’re involved in putting out fires on space stations).

During the course of the week, after the initial assessment on Monday 16th January, the IT systems were kept off-line in an effort to reduce the possibility of further damage occurring. It was necessary to boot up some of the servers for short intervals during the week to facilitate urgent business requirements. Most people were asking to access their email, but the pressing concern in a DR situation like this one is financial management and planning: you need to determine what will happen with the business and what the financial position is in order to make immediate decisions. The occasional system start-ups were to facilitate the finance team in these tasks. It was sadly apparent from early on that production would not be up and running again in Edgeworthstown for some time, if at all. On the afternoon of Friday 20th January a meeting was held in the Longford Arms hotel (the first of these meetings but unfortunately not the last) that informed staff of the extent of the damage to the factory and the impact on the workforce, with hundreds of people immediately out of work.

The staff meeting in the Longford Arms on Friday 20th January 2006, where everyone learned the extent of the damage.

There was a glimmer of hope for some of the employees. Parts of the factory were not affected by the fire and a plan emerged to put those areas back into production as soon as possible. One such area was the Soft-Can facility where the pouches of pet food were manufactured. This section employed about one hundred people at the time so saving it was an important aspect of keeping the whole facility in a viable state. The Soft-Can production hall would also provide the perfect location for an important, if somewhat odd, part of the IT Disaster Recovery. Towards the end of the week the team from ISS Damage Control arrived and began to prepare for their unusual work by setting up their equipment in the Soft-Can hall. ISS had come on-site to wash the servers.

The view from the Server Room door shows just how much soot had been dumped into the area by the fire.

Because soot and other smoke residues are corrosive, the only effective way to deal with the stuff once it lands on things is to wash it off, even when it has landed on servers or any other electronics. The plan for dealing with the smoke-damaged contents of the C&D server room was to take all the IT equipment apart, down to the smallest safely divisible components, and to then wash each of them in a series of chemical baths. Once washed, the boards, drives, DIMMs, and so on were placed in two ovens where they were subjected to slow drying and pressure to remove all the moisture left from the washing. Once clean and dry, the servers and networking gear were rebuilt, drives re-installed, and BIOS settings reconfigured.

Chaos in the Server Room as the recovery gets under way. Note the gaps in the racks where servers had been taken away for washing.

On Monday 23rd January 2006, all of the original servers were back up and running and 100% of the data was in place. The business was once again able to process transactions, print invoices, send email, and perform all the other tasks, big and small, that make a business tick, though of course the most important task, production, was at a standstill. It would be some weeks before a skeleton crew returned to the Soft-Can section to resume production, and it would be years before the main cannery would be completely rebuilt.

The cause of the fire was traced back to a power washing unit that had overheated during the clean-down process towards the end of the shift on Sunday 15th January 2006. No one was hurt in the fire, though the economic aftermath was felt in Longford for years to come and many of the staff employed at the time of the fire moved on from C&D as a result. This outcome, though terrible, could have been a lot worse if not for the extraordinary efforts of the Fire Brigade, who worked so hard to save as much of the factory as possible.

In other, smaller ways, lots of people put in effort well above and beyond the norm to get C&D back up and running as soon as possible. Actual Disaster Recovery situations are thankfully rare, with many in IT going their entire careers without a major incident. I’m glad that the one I’ve had to deal with went well. I hope I don’t encounter another, but I’m ready if I do.

Me in the Server Room during the Disaster Recovery at C&D Foods.