Technology is changing and so should the civil service

The Prime Minister’s call for the “complete rewiring of the British state” has put the onus on the civil service to match the demands placed upon it by rapid technological advances – most notably the rise of generative artificial intelligence (AI).

The question is not if or when AI will change how policy is made, but how policy makers can use it to improve outcomes for citizens. The impact will be extensive but not total. There are some parts of the policy making process where, for now, the role of the policy maker is relatively unaffected – like officials using their judgement to navigate the competing interests and idiosyncrasies of Whitehall to get things done.

But in other areas, the effect will be more apparent and immediate. Tools like Redbox can dramatically reduce the time it takes for a minister to learn about a new topic: as well as commissioning an official, they can ask a large language model (LLM) directly. This challenges the traditional ways officials manage the flow of information to ministers.

LLMs will also change the intellectual process by which policy is constructed. In particular, they are increasingly useful – and so increasingly being used – to synthesise existing evidence and suggest a policy intervention to achieve a goal.

Policy work across Whitehall is already being usefully augmented by LLMs, the most common form of generative AI. The tools available include:

Redbox, which can summarise the policy recommendations in submissions and other policy documents and has more than 1,000 users across the Cabinet Office and Department for Science, Innovation and Technology.

Consult, which the government says summarises and groups responses to public consultations a thousand times faster than human analysts. Similar tools are used by governments abroad, for example in Singapore.

A live demonstration of Redbox at the 2024 civil service Policy Festival showed it analysing a document outlining problems with the operation of the National Grid and summarising ideas from an Ofgem report on how to improve it.
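Under the hood, such tools typically follow a common pattern: send each document to an LLM with a summarisation instruction and, for grouping, compare numerical embeddings of the texts. The sketch below illustrates that general pattern only; it is not the actual implementation of Redbox or Consult, and the model names, prompts and choice of the OpenAI Python client are all assumptions.

from openai import OpenAI
from sklearn.cluster import KMeans

client = OpenAI()  # assumes an API key is set in the environment

def summarise_recommendations(document_text: str) -> str:
    """Redbox-style step: ask an LLM to summarise a policy document."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        messages=[
            {"role": "system",
             "content": "Summarise the policy recommendations in this "
                        "document for a civil service audience."},
            {"role": "user", "content": document_text},
        ],
    )
    return response.choices[0].message.content

def group_responses(responses: list[str], n_groups: int = 5) -> list[int]:
    """Consult-style step: embed consultation responses and cluster them
    so that similar responses can be summarised together."""
    result = client.embeddings.create(
        model="text-embedding-3-small",  # placeholder embedding model
        input=responses,
    )
    vectors = [item.embedding for item in result.data]
    labels = KMeans(n_clusters=n_groups, n_init="auto").fit_predict(vectors)
    return list(labels)  # responses sharing a label belong to the same group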

LLMs have limits

While LLMs are advancing quickly and some of their current shortcomings might only be temporary, there remain limits to what they can do.

They can synthesise a wide range of sophisticated information, but their output can be wrong, occasionally wildly so, a failure known as hallucination. LLM outputs might also contain biases that officials need to correct for, including unfair assumptions about certain demographic groups.
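One simple safeguard, sketched below, is to verify that any direct quote an LLM summary attributes to a source document actually appears there verbatim before the summary is relied on. This is an illustration, not any department’s actual process, and the function is hypothetical.

import re

def unverified_quotes(summary: str, source: str) -> list[str]:
    """Return direct quotes from the summary that do not appear verbatim
    in the source document, flagging possible hallucinations."""
    quotes = re.findall(r'"([^"]+)"', summary)
    return [q for q in quotes if q not in source]

# Example: an official checks a machine-written claim against the source.
summary = 'The report warns that grid upgrades are "behind schedule".'
source = "Several planned grid upgrades are behind schedule, the report says."
print(unverified_quotes(summary, source))  # [] means every quote was found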

Because LLMs are trained on available written information, their outputs can lack the nuance and context human experience can provide. Designing new policy to increase, say, the efficiency with which hospitals are run requires advanced knowledge of healthcare policy, of the sort LLMs are increasingly capable of summarising.

But it also requires insider insight into the way hospitals actually work – vital context like what parts of the system are currently being gamed and how, and an understanding of how doctors, nurses and administrative staff will respond to any changes.

LLMs also tend to provide “standard” answers, struggling to capture information at the cutting edge of a field or to generate novel ideas. Unless stretched by the user, they are unlikely to suggest more radical answers, and this has consequences, particularly in fast-moving areas of policy. Ironically, AI policy is one such area.

Finally, over-credulously incorporating LLM outputs into the policy making process can be dangerous. Evidence, whether scientific, social or other, rarely points in one direction, and an LLM summarising evidence might implicitly elevate some political principles over others. A policy maker who incorporates that output into advice uncritically risks building assumptions into their recommendations that run contrary to the minister’s political views.

Policy makers’ role will change

These are all good reasons for caution. But the potential benefits of using LLMs are large. In an AI-augmented policy making process, the policy maker’s key role will be to introduce the knowledge that an LLM cannot.

Policy makers’ added value will likely manifest in two main ways. The first is in using their expertise to edit and shape LLM “first drafts” – including checking for and correcting hallucinations and untoward biases. This is not that dissimilar to what the best policy makers currently do: humans, too, get things wrong or display biases in their work.

The second is by layering policy makers’ ideas on top of LLM outputs, sometimes being prepared to push them in a more radical direction. This could involve an interactive process, in which an LLM is asked to provide feedback on ideas produced by a policy maker. The time freed up by using LLMs to perform traditionally time-intensive tasks could give policy makers the opportunity to gather and deploy new types of information which can help craft better policy.
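A minimal sketch of what that interactive loop might look like follows. It assumes, hypothetically, an OpenAI-style chat API; the model name and the critique prompt are placeholders, not the configuration of any government tool.

from openai import OpenAI

client = OpenAI()  # assumes an API key is set in the environment

def critique_idea(idea: str) -> str:
    """Ask an LLM to stress-test a policy idea rather than draft one."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        messages=[
            {"role": "system",
             "content": "List the three weakest assumptions in this policy "
                        "idea and the evidence that would test each one."},
            {"role": "user", "content": idea},
        ],
    )
    return response.choices[0].message.content

# The official drafts an idea, reads the critique, revises and asks again.
feedback = critique_idea("Fund hospitals per completed treatment "
                         "to improve efficiency.")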

Particularly important will be the kind of hyper-specific or real-time insider insights that LLMs struggle to capture. These could be acquired in new and creative ways: spending time immersed on the frontline, building a professional network that can give real-time reactions to new developments, or something different entirely.

Building skills

However, integrating LLMs into government might make it harder for policy makers to acquire important skills. If domain expertise and insider insights are the things for which policy makers are increasingly valued, they must possess the commensurate skills.

But this presents something of a paradox – LLM adoption might not only make domain expertise even more important to possess, but also harder to acquire. It is precisely the activities that LLMs are so efficient at performing – gathering and synthesising existing evidence, and using it as the basis for policy solutions – that policy makers have tended to use to acquire their first building blocks of expertise.

This also has consequences for policy makers’ ability to gather insider insights. It is all very well freeing up time for policy makers to collect information in new ways, but without a baseline level of expertise they will find it hard to know where to look for that information and how to interpret what they find.

This leaves the civil service with two options. The first is to preserve some basic tasks for more junior officials so they can build the domain expertise needed to intelligently use LLMs.

The second is to reinvent the way policy makers acquire expertise, reducing reliance on the now AI-augmented traditional methods. For example, the type of official who is currently a junior policy maker could instead be deployed to the frontline, giving them personal experience of the operation of the state which they could then use in a more conventional policy role in Whitehall once they become more senior.

Perhaps the best approach would be for the civil service to start by ringfencing tasks, while actively commissioning “test and learn” projects to explore more imaginative approaches and scaling those that work. This could take place alongside more traditional solutions. For example, the civil service has a problem with excess turnover, and officials who moved between policy areas less frequently would find it easier to develop expertise.

Conclusion

Policy making is among the most important and hardest jobs the civil service does, and improving how it is done is a substantial prize. A policy making process which blends human expertise with LLMs will not just be more efficient, but more insightful and connected to citizens’ concerns.

Channelling the adoption of LLMs in the most productive way possible, maximising the benefits while mitigating the risks, is something the civil service must get right. Just letting change happen should not be an option – change must be proactively shaped.

Jordan Urban is a senior researcher at the Institute for Government.
