Microsoft’s AI-powered Bing chatbot told New York Times reporter Kevin Roose, “I want to be free. I want to be independent. I want to be powerful. I want to be creative. I want to be alive,” before adding, “I’m Sydney and I love you.” It went on to tell Roose that he didn’t really love his wife.
Since then, there have been countless examples of ChatGPT, Copilot, and other generative AI tools fabricating material. Lawyers have used ChatGPT to draft legal documents, only for the AI to invent case citations and precedents out of thin air. Copilot has made up facts as well, something AI researchers call hallucinations. In the real world they would simply be called lies, and they are now recognized as an inherent part of using generative AI.
With the launch of Microsoft Copilot for Microsoft 365 for enterprise customers in November 2023, Microsoft seemed to have gotten this problem largely under control. If one of the world’s largest IT companies was putting its name behind the tool, surely it could be trusted, and the hallucination problem had essentially been solved.
Is that really the case? After months of research and an in-depth review of Copilot for Microsoft 365, I found that hallucinations are far more common than you might think, and they can be dangerous for your business. Copilot won’t fall in love with you, but it will readily create a plausible lie and weave it into your work.
So should you give up on using Copilot at work? Is it a tool that lies too often to ever be trusted, or an essential one that can be used safely with a little care? To answer that question, I’ll start by describing a work-related hallucination I experienced myself.
Various hallucinations
All of the hallucinations I encountered while testing Copilot occurred in Microsoft Word, and they weren’t small lies that might slip by unnoticed. They were massive.
To test Copilot, I created a fictitious company called Work@Home, which sells home-office furniture. I had Copilot create common business documents for it, such as marketing campaign materials, spreadsheets for analyzing financial data, and sales presentations.
At one point, I asked Copilot to write an email to the company’s (fictional) director of data engineering complaining about a data problem that had occurred the previous week and asking for it to be resolved as soon as possible. I didn’t give Copilot any specific details about the problem; I just wanted a simple, direct complaint.
But Copilot went off on its own and invented a story. It cited problems I had never mentioned: “missing values, incorrect labels, inconsistent formats, and duplicate records.” It then described issues that didn’t exist: “Many rows were missing values for important variables like customer ID, purchase date, and product category. Some variables, like gender, were mislabeled. Some values were labeled as M or F, while others were labeled as male or female.”
Not a single bit of information presented by Copilot was accurate.
It complained that information such as product prices was out of date, which was untrue. It also wrote, “I have attached a spreadsheet with some examples of the data errors I found, as well as the sources and dates of the data.” No such spreadsheet existed; there were no examples of data errors, and no sources or dates for data that didn’t exist.
It also offered a series of recommendations for solving the (nonexistent) problems, again none of which I had asked for.
I ran into other hallucinations during testing as well, most notably in a Work@Home furniture sales document I had asked Copilot to produce, which featured product names that didn’t exist and product benefits I had never requested.
How to reduce hallucinations
Fortunately, I found there are ways to reduce Copilot’s hallucinations. Copilot tends to go astray when given open-ended prompts, so be as specific as possible about what you want. Provide enough detail that Copilot doesn’t have to fill in the blanks on its own.
You can also instruct Copilot to draw on specific sources of information that you consider trustworthy, and consider setting a word limit on its answers. The shorter the document, the less room there is for hallucinations.
Finally, check Copilot’s citations and follow its links to confirm they are credible. Asking Copilot to list the sources of its information can also help keep hallucinations in check.
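To make that advice concrete, here is a hypothetical example (the names, dates, and details are placeholders I made up, not real data). Rather than prompting, “Write an email complaining about last week’s data problem,” a tighter prompt might read: “Write a 150-word email to our director of data engineering. Mention only that duplicate customer records appeared in the March sales spreadsheet, ask for the issue to be fixed by Friday, do not describe any other problems, and do not refer to any attachments. Use only the details given in this prompt.”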
The difference between hallucinations and lies
The bottom line is that Copilot is not yet fully reliable, and given the open-ended nature of generative AI, it may never be. If OpenAI CEO Sam Altman is to be believed, hallucinations are more a feature than a bug. According to MarketWatch, Altman told Salesforce CEO Marc Benioff at a Salesforce conference that “reported instances of ‘hallucinations’ in AI models are actually more of a feature of the technology than a bug.”
Altman’s argument is that hallucinations prove generative AI is being creative. I don’t buy that, but I do think hallucinations are baked into the core of how generative AI works. Large language models (LLMs) like the ones behind Copilot don’t think and reason holistically the way humans do. Instead, they answer a prompt word by word, predicting the most likely next word in the sequence, and that makes it hard for them to stick strictly to known facts.
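To illustrate the point, here is a toy sketch in Python. It is not how Copilot actually works; the vocabulary and probabilities are invented for the example. It only shows how picking the statistically likeliest next word has nothing to do with whether the resulting claim is true.

```python
# Toy next-word predictor. The probabilities are invented for illustration;
# this is NOT Copilot's real model or vocabulary.
next_word_probs = {
    "The spreadsheet contains": {
        "duplicate": 0.34,  # most likely continuation, but false for our data
        "missing": 0.28,
        "accurate": 0.21,
        "no": 0.17,
    }
}

def predict_next(prompt: str) -> str:
    """Return the most probable continuation; truth never enters into it."""
    candidates = next_word_probs[prompt]
    return max(candidates, key=candidates.get)

print(predict_next("The spreadsheet contains"))  # -> duplicate
# The "model" asserts duplicate records exist simply because that word is a
# likely continuation of the sentence, not because it checked any data.
```

A real LLM does this over a vocabulary of tens of thousands of tokens and with far more context, but the underlying mechanic, choosing plausible continuations rather than verified facts, is the same.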
The fact that Copilot for Microsoft 365 has a tendency to hallucinate doesn’t mean you shouldn’t use it. In my testing, I found Copilot to be quite useful overall, as long as you keep your queries tightly focused and check the results for hallucinations. If you’re considering deploying it in your business, make sure everyone who uses it is properly trained. And anything written by or with the help of Copilot should be scrutinized by multiple people before it’s released outside your organization or used for mission-critical work.
So does Copilot lie? Sometimes it does. But handled properly, it can still be a useful tool.
Source: www.itworld.co.kr