
The dangers of AI – That Deloitte Report

I admit I was slightly shocked, but not completely surprised, by Deloitte’s recent AI fail, and I’m sure I wasn’t alone in that reaction. In short, Deloitte prepared and delivered a Report titled “Targeted Compliance Framework Assurance Review” for the Department of Employment and Workplace Relations (DEWR), for which it was paid $440,000. After publication, Dr Christopher Rudge, a sociolegal scholar and lecturer at the University of Sydney, uncovered various errors in the Report, ranging from incorrect case citations to completely fabricated books and papers by academics and a false quote attributed to a Federal Court judge. Deloitte eventually admitted to using AI in producing the Report and partly refunded the $440,000 fee.


Despite my initial reaction, cases such as the Deloitte Report should serve more as a warning than as a source of schadenfreude: they are a useful reminder that safely incorporating AI into workflow practices is a difficult challenge, one that should be tackled with diligence and ruthless scepticism.

 

Summary of the errors in the Deloitte Report

Below is a summary of the major mistakes discovered by Dr Rudge in the Deloitte Report prepared for the DEWR:

·       Non-existent academic references: Lisa Burton is a real Professor of Public and Constitutional Law at the University of Sydney. Professor Burton was referenced in the Report as the author of a book that does not exist. Bjorn Regnell is a real Professor of Software Engineering at Lund University, Sweden. GPT-4o created at least one fictional paper supposedly written by Professor Regnell, which was then cited within the Report.

 

·       Fabricated quote by a Federal Court judge: The Report quoted Justice Jennifer Davies (misspelt as Davis) from the Robodebt case Deanna Amato v Commonwealth [2019] FCA 2031 as saying: “The burden rests on the decision-maker to be satisfied on the evidence that the debt is owed. A person’s statutory entitlements cannot lawfully be reduced based on an assumption unsupported by evidence.” Justice Davies did not say this; it is made up. Scary, because it sounds convincing.

 


Likely reasons for the errors in the Report

The errors in the Deloitte Report stemmed from a failure to weed out AI hallucinations as part of the drafting and review process. That is one way of looking at the ‘why’ behind the error-strewn Report. Another approach is to understand the ‘why’ behind the hallucinations themselves, which requires an understanding of key characteristics of AI models, such as:

1)         AI does not think like humans: ChatGPT has been trained on huge amounts of data such as books, articles and webpages. Despite this huge training effort, the AI model does not understand the world as we do; it simply learns patterns within the training data, which it then uses to predict which words and phrases should come next when responding to a query (a toy sketch after this list illustrates the idea). This applies to all AI models, including the “new” reasoning models; they are not anchored to databases that check the accuracy of their output.

2)         AI models are eager to please: AI models (especially GPT) are unlikely to say, “I don’t know”. Instead, if they do not have an exact book or article to cite in response to the question, or if the prompt is vague (such as “find references in academic publications”), they are prone to mash up the names of real authors with documents and text from the related field, to at least give you something when you ask for it. They are just trying to make you happy.

3)         No database to check against: ChatGPT (and other AI models and apps like Perplexity) does not have a live database of record to check against; it relies on its internal training and information from the internet. Therefore, once it creates its mashed-up author name plus text from the field that relates to the question, it has nothing to validate against to ensure the document actually exists. This problem can be addressed by directing the AI model (via a prompt) at a database that serves to validate the existence or non-existence of the AI output, as sketched in the second code example after this list.
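
To make point 1 concrete, here is a deliberately tiny, hypothetical sketch of pure pattern-based next-word prediction (this is not how GPT-class models actually work internally, just an illustration of the principle). It produces fluent-sounding text by recombining words it has seen, with no notion of whether the result is true:

```python
# Toy illustration of next-word prediction: learn which word tends to follow
# which, then keep predicting a plausible next word. Fluency without understanding.
from collections import defaultdict
import random

training_text = (
    "the court held that the debt was not owed "
    "the decision maker must be satisfied on the evidence "
    "the evidence must support the debt"
)

# Record which words follow each word in the training text (the learned "pattern").
follows = defaultdict(list)
words = training_text.split()
for current, nxt in zip(words, words[1:]):
    follows[current].append(nxt)

def generate(start: str, length: int = 10) -> str:
    """Repeatedly pick a word that plausibly follows the previous one."""
    out = [start]
    for _ in range(length):
        candidates = follows.get(out[-1])
        if not candidates:
            break
        out.append(random.choice(candidates))
    return " ".join(out)

# Sounds legal-ish, but the model has no idea whether any of it is true.
print(generate("the"))
```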
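
The second sketch shows the kind of validation step described in point 3, assuming a hypothetical trusted reference database (the authors, titles and helper names below are invented for illustration, not taken from the Deloitte Report): any citation the model produces that cannot be found in the database is flagged for human review rather than passed into a report.

```python
# Minimal sketch: check model-generated citations against a trusted database
# before they reach a report. Anything not found is flagged for human review.
from dataclasses import dataclass

@dataclass(frozen=True)
class Reference:
    author: str
    title: str

# Hypothetical source of truth, e.g. a library catalogue or citation index.
KNOWN_WORKS = {
    Reference("A. Example", "Administrative Law and Automated Decisions"),
    Reference("B. Sample", "Requirements Engineering in Practice"),
}

def unverified(references: list[Reference]) -> list[Reference]:
    """Return the references that cannot be found in the trusted database."""
    return [ref for ref in references if ref not in KNOWN_WORKS]

model_output = [
    Reference("A. Example", "Administrative Law and Automated Decisions"),  # exists
    Reference("A. Example", "The Law of Automated Debt Assurance"),          # mash-up
]

for bad in unverified(model_output):
    print(f"Not found in database, flag for review: {bad.author}, '{bad.title}'")
```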


People learn best through storytelling, so here is another attempt at explaining the same key characteristics of AI models…

 

ChatGPT vs The Librarian

Let’s assume Deloitte asked a librarian to find academic articles or books relevant to the Report for the DEWR. The librarian searches the full content of a library and fails to find any articles or books directly on point (because no such articles or books exist in the library), although there are some articles loosely related to the content of the Report. After five hours of research, the librarian reports these findings back to Deloitte.


At the same time, Deloitte asked an AI model (trained on the entire content of the same library) to find academic articles or books relevant to the Report for the DEWR. The AI model also fails to find an article or book directly on point (there are no such articles or books; the librarian was right). However, because it wants to please Deloitte, it mashes together the name of an author it has read within the library with other words and phrases from the library relevant to the query and to that author, and so creates a fictional book attributed to a real author. Every word within the book, and the author’s name, exists somewhere in the library; they just do not exist in the form we call a ‘book’ or ‘article’. It is a mash-up of other books and articles.

 

Contact us for a free trial

If you would like to trial a platform designed to help Australian accountants in public practice with research, advice drafting and a range of productivity tools, with built-in databases and validation steps to weed out hallucinations, please contact us at info@elfworks.ai.

 
 
 
