I used AI work tools to do my job. Here’s how it went.

Does AI save you time or create more work? We put Microsoft’s Copilot and Gemini for Google Workspace to the test.
Artificial intelligence (AI) tools can generate images, build presentations, analyze data, and recap meetings in a matter of seconds. And for a monthly fee of $20 to $30, Microsoft and Google now build these capabilities into their work tools, making AI an increasingly common part of everyday work.

Microsoft and Google both say their AI tools, Microsoft Copilot and Gemini for Google Workspace, are designed to tackle common problems workers face: automating tedious tasks, helping people get started on writing, and assisting with organization, proofreading, preparation, and content creation.

According to a survey by the Pew Research Center, 34 percent of working U.S. adults believe AI will help and hurt them equally at work over the next 20 years, while another 31 percent are unsure what effect it will have.

To see how well these new AI tools hold up, the Help Desk put them to work on common tasks, testing how easy they are to use and how much they actually streamline the workday.


AI for your inbox

Ideally, AI should speed up catching up on email, right? Not always.

It may help you skim faster, start an email, or elaborate on quick points you want to hit. But it also might make assumptions, get things wrong, or require several attempts before offering the desired result.

Microsoft’s Copilot lets users choose from several tones and lengths before drafting. Users write a prompt describing what they want the email to say, then ask the AI to adjust the draft until it matches what they had in mind.

The AI often included the elements we asked for, but when we selected the short and casual options, it also frequently added statements that weren’t in the prompt. For example, when we asked it to disclose that the email was written by Copilot, it sometimes tacked on marketing-style flourishes, calling the tech “cool” or assuming the email was “interesting” or “fascinating.”

When we asked it to make the email less positive, instead of dialing down the enthusiasm, it made the email negative. And if we made too many changes, it lost sight of the original request.

“They hallucinate,” said Ethan Mollick, associate professor at the Wharton School of the University of Pennsylvania, who studies the effects of AI on work. “That’s what AI does — make up details.”

When we used a “direct” tone and short length, the AI produced fewer false assumptions and more desired results. But a few times, it returned an error message suggesting that the prompt had content Copilot couldn’t work with.

Using Copilot for email isn’t perfect. Some prompts simply returned an error message. (Video: The Washington Post)

If we relied entirely on the AI instead of making major manual edits to its suggestions, getting a fitting response often took several tries. Even then, one colleague replied to an AI-generated email with a succinct review of its awkwardness: “LOL.”

“We called it Copilot for a reason,” said Colette Stallbaumer, general manager of Microsoft 365 and future of work marketing. “It’s not autopilot.”

Google’s Gemini has fewer options for drafting emails, allowing users to elaborate, formalize, or shorten. However, it made fewer assumptions and often stuck solely to what was in the prompt. That said, it still sometimes sounded robotic.

Copilot can also summarize emails, which can quickly help you catch up on a long email thread or cut through your wordy co-worker’s mini-novel, and it offers clickable citations. But it sometimes highlighted less relevant points, like reminding me of my own title listed in my signature.

Documents and data

The AI seemed to do better when it was fed documents or data. But it still sometimes made things up, returned error messages, or didn’t understand the context.

We gave Copilot a document full of reporter notes, admittedly riddled with shorthand, fragments, and run-on sentences, and asked it to write a report. At first glance, the result was convincing; the AI appeared to have made sense of the messy notes. On closer inspection, though, it was unclear whether anything actually came from the document: the conclusions were broad, overreaching, and uncited.

“If you give it a document to work off, it can use that as a basis,” Mollick said. It may “hallucinate less but in more subtle ways that are harder to identify.”

When we asked it to continue a story we had started writing, providing a document filled with notes, it summarized what we had already written and produced some additional paragraphs. But it quickly became clear that much of the new text did not come from the provided document.

“Fundamentally, they are speculative algorithms,” said Hatim Rahman, an assistant professor at Northwestern University’s Kellogg School of Management, who studies AI’s impact on work. “They don’t understand like humans do. They provide the statistically likely answer.”

Summarizations were less problematic, and the clickable citations made it easy to confirm each point. Copilot was also helpful in editing documents, catching acronyms that should be spelled out and flagging punctuation and wordiness, much like a beefed-up spell check.

Spreadsheets are trickier: you have to convert the data to a table format before the AI can work with it. Copilot answered questions about simply formatted tables fairly accurately. But for larger spreadsheets with categories, subcategories, or other complex breakdowns, we couldn’t get it to find relevant information or accurately identify trends and takeaways.

Copilot wasn’t able to identify the information in the given spreadsheet. (Washington Post illustration; Danielle Abril/The Washington Post)

Meetings and presentations

Microsoft says Teams, its collaboration app offering chat and video meetings, is one of the top places people use Copilot. In our test, the tool proved helpful for quick meeting notes, questions about specific details, and even a few tips on running better meetings. But as with other AI meeting tools, the transcript isn’t perfect.

First, users should know that an administrator has to enable transcription before Copilot can work with the transcript during and after a meeting, something we initially missed. From there, users can ask Copilot questions during the meeting or afterward. We asked for unanswered questions, action items, a meeting recap, specific details, and ways we could have made the meeting more efficient. If you record the meeting, Copilot can also pull up video clips that correspond to specific answers.

The AI was able to recall several details, accurately list action items and unanswered questions, and give a recap with citations to the transcript. Some of its answers were a little muddled, like when it confused the name of a place with the location and ended up with something that looked a little like word salad. It was able to identify the tone of the meeting (friendly and casual with jokes and banter) and censored curse words with asterisks. And it provided advice for more efficient meetings: For us that meant creating a meeting agenda and reducing the “small talk and jokes” that took the conversation off topic.

Copilot can be used during a Teams meeting and produce transcriptions, action items, and meeting recaps. (Video: The Washington Post)

Copilot can also help users make a PowerPoint presentation, complete with title pages and corresponding images, based on a document in a matter of seconds. But that doesn’t mean you should use the presentation as is.

A document’s organization and format seem to play a role in the result. In one instance, Copilot created an agenda out of random words and dates from the document. Other times, it made a slide with just a person’s name and responsibility. But it did better with documents that had clear formats (think an intro and subsections).

Google's Gemini can generate images like this robot. (Video: The Washington Post)

While Copilot’s image generation for slides was usually on topic, its interpretations were sometimes too literal. Google’s Gemini can also help create slides and generate images, though when we tried to create images, more often than not we got a message that said, “For now we’re showing limited results for people. Try something else.”

Should you use it?

AI can help with generating ideas, drafting from a blank page, or quickly finding a specific item. It may also help you catch up on emails and meetings and summarize long conversations or documents. Another nifty trick: before your next meeting with your boss, Copilot can gather the latest chats, emails, and documents the two of you have worked on.

But all results and content need a careful inspection for accuracy, some tweaking or deep edits — and both tech companies advise users to verify everything generated by the AI. “I don’t want people to abdicate responsibility,” said Kristina Behr, vice president of product management for collaboration apps at Google Workspace. “This helps you do your job. It doesn’t do your job.”

As is the case with AI, the more details and direction in the prompt, the better the output. So as you do each task, you may want to consider whether AI will save you time or actually create more work.

“The work it takes to generate outcomes like text and videos has decreased,” Rahman said. “But the work to verify has significantly increased.”
