RPA that survives change: resilience and exception queues

A shared-services team that processes high volumes of routine requests for several business units came to us after a first attempt at automation had quietly failed. Someone had built a bot for a repetitive, rules-based task — copying fields between two systems and applying a few eligibility checks. It worked in the demo. Then a form changed, the bot kept going against the old layout, and for the better part of a week it wrote plausible-looking nonsense into a live system before anyone noticed. The clean-up cost more than the manual work it had replaced.

The demo is the easy ten percent. The exceptions are the job.

The challenges we had to solve

The underlying process was not standardised — three analysts did the same task three slightly different ways, so there was no single rule set to automate.
The original bot had no concept of an exception; anything it did not expect, it guessed at, silently.
There was no audit trail, so when something went wrong nobody could see what the bot had touched or why.
The team had lost trust in automation and needed a reason to believe a second attempt would be different.

How we approached it

We did not write a line of automation for the first few weeks. We sat with the analysts, documented the task as it was actually done, and agreed one standard way to do it — including what counts as a normal case and what does not. Automating a messy process only gives you a faster mess, so the standardising was the real work. Only once the rules were written down and agreed did we start building, and we built for the unhappy path first: every input the bot could not handle with confidence goes to an exception queue for a person, rather than being forced through.
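The "unhappy path first" idea can be sketched in a few lines of Python. This is an illustration only, not the team's actual bot: the field names, the checks in `reason_to_escalate`, and the in-memory lists standing in for the exception queue are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical field names; the real rule set is whatever the analysts agreed.
REQUIRED_FIELDS = ("customer_id", "amount", "region")


@dataclass
class ExceptionItem:
    item: dict
    reason: str  # recorded so the analyst sees why the bot declined the item


@dataclass
class Outcome:
    processed: list = field(default_factory=list)
    exceptions: list = field(default_factory=list)


def reason_to_escalate(item: dict) -> Optional[str]:
    """Return why the item is not a normal case, or None if it is one."""
    for name in REQUIRED_FIELDS:
        if name not in item or item[name] in ("", None):
            return f"missing field: {name}"
    try:
        amount = float(item["amount"])
    except (TypeError, ValueError):
        return "amount is not a number"
    if amount <= 0:
        return "amount is not positive"
    return None


def route(items) -> Outcome:
    """Process only the items that pass every check; queue the rest for a person."""
    outcome = Outcome()
    for item in items:
        reason = reason_to_escalate(item)
        if reason is None:
            outcome.processed.append(item)
        else:
            # Never guess: anything unexpected goes to the exception queue.
            outcome.exceptions.append(ExceptionItem(item, reason))
    return outcome
```

The design choice worth noting is that the default branch is the exception queue: an item has to earn its way into `processed` by passing every agreed check, so a case nobody anticipated ends up in front of a person.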

Every action the bot takes is logged (what it read, what it wrote, which rule fired) so the team can reconstruct any item after the fact. The bot is set to fail loudly and stop, not to improvise, the moment a screen looks different from what it expects. We measure against a target the team set themselves: the share of items that flow through untouched. That figure is watched week by week, so a rising exception rate gives early warning that something upstream has changed.
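As a rough sketch of those three safeguards, assuming a form that can be identified by its list of field labels: the names `EXPECTED_LABELS`, `LayoutChanged`, `audit_record` and `straight_through_rate` are invented for this example, and the audit record is simply a JSON line rather than any particular RPA product's log format.

```python
import json
from datetime import datetime, timezone

# Hypothetical layout: the labels the bot expects to see, in order.
EXPECTED_LABELS = ["Customer ID", "Amount", "Region"]


class LayoutChanged(Exception):
    """Raised when the screen does not match the layout the bot was built for."""


def check_layout(seen_labels) -> None:
    """Stop the run instead of improvising if the form looks different."""
    seen = list(seen_labels)
    if seen != EXPECTED_LABELS:
        raise LayoutChanged(f"expected {EXPECTED_LABELS}, saw {seen}")


def audit_record(item_id: str, read: dict, wrote: dict, rule: str) -> str:
    """Build one audit line: what was read, what was written, which rule fired."""
    return json.dumps(
        {
            "at": datetime.now(timezone.utc).isoformat(),
            "item_id": item_id,
            "read": read,
            "wrote": wrote,
            "rule": rule,
        },
        sort_keys=True,
    )


def straight_through_rate(untouched: int, total: int) -> float:
    """Share of items that passed through without a person touching them."""
    return untouched / total if total else 0.0
```

In a run, `check_layout` would be called before any field is read, so a changed form raises `LayoutChanged` and halts the batch before anything is written. Tracking `straight_through_rate` per week against the team's target is then a matter of counting processed and queued items.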

Where it stands

The routine majority now passes through without a person touching it, and the analysts spend their time on the exceptions — the cases that genuinely need judgement. When a form changed again recently, the bot stopped and flagged it instead of carrying on; the fix took an afternoon, not a week of clean-up. The team’s measure of success is no longer how much the bot does, but how quietly it does it and how fast they hear about it when something breaks.
