Google fires shots at custom GPTs with Gemini Gems

Google on Tuesday rolled out a retooled search engine that will frequently favor responses crafted by artificial intelligence over website links, a shift promising to quicken the quest for information while also potentially disrupting the flow of money-making internet traffic.

The makeover announced at Google’s annual developers conference will begin this week in the U.S. when hundreds of millions of people will start to periodically see conversational summaries generated by the company’s AI technology at the top of the search engine’s results page.

The AI overviews are supposed to crop up only when Google’s technology determines they will be the quickest and most effective way to satisfy a user’s curiosity, an outcome most likely to happen with complex subjects or when people are brainstorming or planning. People will likely still see Google’s traditional website links and ads for simple searches for things like store recommendations or weather forecasts.

Google began testing AI overviews with a small subset of selected users a year ago, but the company is now making it one of the staples in its search results in the U.S. before introducing the feature in other parts of the world. By the end of the year, Google expects the recurring AI overviews to be part of its search results for about 1 billion people.

Besides infusing more AI into its dominant search engine, Google also used the packed conference held at a Mountain View, California, amphitheater near its headquarters to showcase advances in a technology that is reshaping business and society.

The next AI steps included more sophisticated analysis powered by Gemini — a technology unveiled five months ago — and smarter assistants, or “agents,” including a still-nascent version dubbed “Astra” that will be able to understand, explain, and remember things it is shown through a smartphone’s camera lens. Google underscored its commitment to AI by bringing in Demis Hassabis, the executive who oversees the technology, to appear on stage at its marquee conference for the first time.

The injection of more AI into Google’s search engine marks one of the most dramatic changes that the company has made in its foundation since its inception in the late 1990s. It’s a move that opens the door for more growth and innovation but also threatens to trigger a sea change in web surfing habits.

“This bold and responsible approach is fundamental to delivering on our mission and making AI more helpful for everyone,” Google CEO Sundar Pichai told a group of reporters.

It also will bring new risks to an internet ecosystem that depends heavily on digital advertising as its financial lifeblood.

Google stands to suffer if the AI overviews undercut ads tied to its search engine, a business that reeled in $175 billion in revenue last year alone. Website publishers, ranging from major media outlets to entrepreneurs and startups that focus on narrower subjects, will be hurt if the AI overviews are so informative that they result in fewer clicks on the website links that will still appear lower on the results page.

Based on habits that emerged during the past year’s testing phase of Google’s AI overviews, about 25% of the traffic could be negatively affected by the de-emphasis on website links, said Marc McCollum, chief innovation officer at Raptive, which helps about 5,000 website publishers make money from their content.

A decline in traffic of that magnitude could translate into billions of dollars in lost ad revenue, a devastating blow that would be delivered by a form of AI technology that culls information plucked from many of the websites that stand to lose revenue.

“The relationship between Google and publishers has been pretty symbiotic, but enter AI, and what has essentially happened is the Big Tech companies have taken this creative content and used it to train their AI models,” McCollum said. “We are now seeing that being used for their own commercial purposes in what is effectively a transfer of wealth from small, independent businesses to Big Tech.”

But Google found the AI overviews resulted in people conducting even more searches during the technology’s testing “because they suddenly can ask questions that were too hard before,” Liz Reid, who oversees the company’s search operations, told The Associated Press during an interview. She declined to provide any specific numbers about link-clicking volume during the tests of AI overviews.

“In reality, people do want to click to the web, even when they have an AI overview,” Reid said. “They start with the AI overview and then they want to dig in deeper. We will continue to innovate on the AI overview and also on how we send the most useful traffic to the web.”

The increasing use of AI technology to summarize information in chatbots such as Google’s Gemini and OpenAI’s ChatGPT during the past 18 months already has been raising legal questions about whether the companies behind the services are illegally pulling from copyrighted material to advance their services. It’s an allegation at the heart of a high-profile lawsuit that The New York Times filed late last year against OpenAI and its biggest backer, Microsoft.

Google’s AI overviews could provoke lawsuits too, especially if they siphon away traffic and ad sales from websites that believe the company is unfairly profiting from their content. But it’s a risk that the company had to take as the technology advances and is used in rival services such as ChatGPT and upstart search engines such as Perplexity, said Jim Yu, executive chairman of BrightEdge, which helps websites rank higher in Google’s search results.

“This is definitely the next chapter in search,” Yu said. “It’s almost like they are tuning three major variables at once: the search quality, the flow of traffic in the ecosystem, and then the monetization of that traffic. There hasn’t been a moment in search that is bigger than this for a long time.”

At I/O 2024, Google’s teaser for Project Astra gave us a glimpse at where AI assistants are going in the future. It’s a multi-modal feature that combines the smarts of Gemini with the kind of image recognition abilities you get in Google Lens, as well as powerful natural language responses. However, while the promo video was slick, after getting to try it out in person, it's clear there’s a long way to go before something like Astra lands on your phone. So here are three takeaways from our first experience with Google’s next-gen AI.

Sam’s take:

Currently, most people interact with digital assistants using their voice, so right away Astra’s multi-modality (i.e. using sight and sound in addition to text/speech) to communicate with an AI is relatively novel. In theory, it allows computer-based entities to work and behave more like a real assistant or agent – which was one of Google’s big buzzwords for the show – instead of something more robotic that simply responds to spoken commands.

The first Project Astra demo we tried used a large touchscreen connected to a downward-facing camera. — Photo by Sam Rutherford/Engadget

In our demo, we had the option of asking Astra to tell a story based on some objects we placed in front of the camera, after which it told us a lovely tale about a dinosaur and its trusty baguette trying to escape an ominous red light. It was fun, the tale was cute, and the AI worked about as well as you would expect. But at the same time, it was far from the seemingly all-knowing assistant we saw in Google's teaser. And aside from maybe entertaining a child with an original bedtime story, it didn’t feel like Astra was doing as much with the info as you might want.

Then my colleague Karissa drew a bucolic scene on a touchscreen, at which point Astra correctly identified the flower and sun she painted. But the most engaging demo was when we circled back for a second go with Astra running on a Pixel 8 Pro. This allowed us to point its cameras at a collection of objects while it tracked and remembered each one’s location. It was even smart enough to recognize my clothing and where I had stashed my sunglasses even though these objects were not originally part of the demo.

In some ways, our experience highlighted the potential highs and lows of AI. Just the ability for a digital assistant to tell you where you might have left your keys or how many apples were in your fruit bowl before you left for the grocery store could help you save some real time. But after talking to some of the researchers behind Astra, there are still a lot of hurdles to overcome.

An AI-generated story about a dinosaur and a baguette created by Google's Project Astra — Photo by Sam Rutherford/Engadget

Unlike a lot of Google’s recent AI features, Astra (which is described by Google as a “research preview”) still needs help from the cloud instead of being able to run on-device. While it does support some level of object permanence, those “memories” only last for a single session, which currently only spans a few minutes. And even if Astra could remember things for longer, there are things like storage and latency to consider, because for every object Astra recalls, you risk slowing down the AI, resulting in a more stilted experience. So while it’s clear Astra has a lot of potential, my excitement was weighed down with the knowledge that it will be some time before we can get more full-feature functionality.

Karissa’s take:

Of all the generative AI advancements, multimodal AI has been the one I’m most intrigued by. As powerful as the latest models are, I have a hard time getting excited for iterative updates to text-based chatbots. But the idea of AI that can recognize and respond to queries about your surroundings in real time feels like something out of a sci-fi movie. It also gives a much clearer sense of how the latest wave of AI advancements will find their way into new devices like smart glasses.

Google offered a hint of that with Project Astra, which may one day have a glasses component, but for now is mostly experimental (the glasses shown in the demo video during the I/O keynote were apparently a “research prototype”). In person, though, Project Astra didn’t exactly feel like something out of a sci-fi flick.

During a demo at Google I/O, Project Astra was able to remember the position of objects seen by a phone's camera. — Photo by Sam Rutherford/Engadget

It was able to accurately recognize objects that had been placed around the room and respond to nuanced questions about them, like “Which of these toys should a 2-year-old play with?” It could recognize what was in my doodle and make up stories about different toys we showed it.

But most of Astra’s capabilities seemed on par with what Meta has already made available with its smart glasses. Meta’s multimodal AI can also recognize your surroundings and do a bit of creative writing on your behalf. And while Meta also bills the features as experimental, they are at least broadly available.

The Astra feature that may set Google’s approach apart is the fact that it has a built-in “memory.” After scanning a bunch of objects, it could still “remember” where specific items were placed. For now, it seems Astra’s memory is limited to a relatively short window of time, but members of the research team told us that it could theoretically be expanded. That would obviously open up even more possibilities for the tech, making Astra seem more like an actual assistant. I don’t need to know where I left my glasses 30 seconds ago, but if it could remember where I left them last night, that would actually feel like sci-fi come to life.

But, like so much of generative AI, the most exciting possibilities are the ones that haven’t quite happened yet. Astra might get there eventually, but right now it feels like Google still has a lot of work to do.

When Google first showcased its Duplex voice assistant technology at its developer conference in 2018, it was both impressive and concerning. Today, at I/O 2024, the company may be bringing up those same reactions again, this time by showing off another application of its AI smarts with something called Project Astra.
The company couldn't even wait until its keynote today to tease Project Astra, posting a video of a camera-based AI app to its social media yesterday. At the keynote, though, Google DeepMind CEO Demis Hassabis shared that his team has "always wanted to develop universal AI agents that can be helpful in everyday life." Project Astra is the result of progress on that front.
What is Project Astra?
According to a video that Google showed during a media briefing yesterday, Project Astra appeared to be an app that has a viewfinder as its main interface. A person holding up a phone pointed its camera at various parts of an office and said "Tell me when you see something that makes sound." When a speaker next to a monitor came into view, Gemini responded "I see a speaker, which makes sound."
The person behind the phone stopped and drew an onscreen arrow to the top circle on the speaker and said, "What is that part of the speaker called?" Gemini promptly responded "That is the tweeter. It produces high-frequency sounds."
Then, in the video that Google said was recorded in a single take, the tester moved over to a cup of crayons further down the table and asked "Give me a creative alliteration about these," to which Gemini said "Creative crayons color cheerfully. They certainly craft colorful creations."
Wait, were those Project Astra glasses? Is Google Glass back?
The rest of the video goes on to show Gemini in Project Astra identifying and explaining parts of code on a monitor and telling the user what neighborhood they were in based on the view out the window. Most impressively, Astra was able to answer "Do you remember where you saw my glasses?" even though said glasses were completely out of frame and were not previously pointed out. "Yes, I do," Gemini said, adding "Your glasses were on a desk near a red apple."
After Astra located those glasses, the tester put them on and the video shifted to the perspective of what you'd see on the wearable. Using a camera onboard, the glasses scanned the wearer's surroundings to see things like a diagram on a whiteboard. The person in the video then asked "What can I add here to make this system faster?" As they spoke, an onscreen waveform moved to indicate it was listening, and as it responded, text captions appeared in tandem. Astra said, "Adding a cache between the server and database could improve speed."
The tester then looked over to a pair of cats doodled on the board and asked "What does this remind you of?" Astra said "Schrodinger's cat." Finally, they picked up a plush tiger toy, put it next to a cute golden retriever, and asked for "a band name for this duo." Astra dutifully replied "Golden stripes."
How does Project Astra work?
This means that not only was Astra processing visual data in real time, but it was also remembering what it saw and working with an impressive backlog of stored information. This was achieved, according to Hassabis, because these "agents" were "designed to process information faster by continuously encoding video frames, combining the video and speech input into a timeline of events, and caching this information for efficient recall."
It is also worth noting that, at least in the video, Astra was responding quickly. Hassabis noted in a blog post that "While we’ve made incredible progress developing AI systems that can understand multimodal information, getting response time down to something conversational is a difficult engineering challenge."
Google has also been working on giving its AI more range of vocal expression, using its speech models to "enhance how they sound, giving the agents a wider range of intonations." This sort of mimicry of human expressiveness in responses is reminiscent of Duplex's pauses and utterances that led people to think Google's AI might be a candidate for the Turing test.
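To make the timeline-and-cache idea Hassabis describes a little more concrete, here is a minimal Python sketch. It is not Google's implementation, which has not been published. Every name in it (`Event`, `Timeline`, `recall`, `max_events`) is hypothetical, and the "encoding" step is stubbed out with plain text labels rather than real video or speech embeddings. It merges video and speech observations into one bounded timeline and caches the last place each object was seen, so a question like "where are my glasses?" does not require rescanning every frame.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    timestamp: float    # seconds since the session started
    source: str         # "video" or "speech"
    label: str          # stand-in for an encoded frame or a transcript
    location: str = ""  # where the object was seen (video events only)


class Timeline:
    """Toy session memory: a bounded event timeline plus a recall cache."""

    def __init__(self, max_events: int = 1000):
        if max_events < 1:
            raise ValueError("max_events must be at least 1")
        # Bounded deque: the oldest event drops off once the window is full,
        # loosely mimicking a memory that only spans a short session.
        self.events = deque(maxlen=max_events)
        # Cache of label -> most recent video event, so recall is a dict lookup.
        self.last_seen = {}

    def add(self, event: Event) -> None:
        if len(self.events) == self.events.maxlen:
            evicted = self.events[0]
            # Forget the cached sighting only if it is the event being evicted.
            if self.last_seen.get(evicted.label) is evicted:
                del self.last_seen[evicted.label]
        self.events.append(event)
        if event.source == "video":
            self.last_seen[event.label] = event

    def recall(self, label: str):
        """Return the last known location of an object, or None if forgotten."""
        event = self.last_seen.get(label)
        return event.location if event else None


timeline = Timeline(max_events=3)
timeline.add(Event(0.0, "video", "glasses", "desk near a red apple"))
timeline.add(Event(1.5, "speech", "what is that part of the speaker called?"))
timeline.add(Event(2.0, "video", "speaker", "next to the monitor"))
print(timeline.recall("glasses"))  # prints "desk near a red apple"
```

The bounded window is a deliberate simplification. It makes the trade-off explicit: a larger `max_events` remembers more but costs more storage, while a smaller one forgets sooner.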
When will Project Astra be available?
While Astra remains an early feature with no discernible plans for launch, Hassabis wrote that in the future, these assistants could be available "through your phone or glasses." No word yet on whether those glasses are actually a product or the successor to Google Glass, but Hassabis did write that "some of these capabilities are coming to Google products, like the Gemini app, later this year."