ChatGPT Can Now Chat Aloud With You (And Yes, It Sounds Pretty Much Human)
Plus, you can now upload photos, and the chatbot can interpret them.


OpenAI is expanding its ChatGPT chatbot with multimodal input, adding support for images and spoken instructions. This lets the chatbot draw on more than text to understand the world around it. OpenAI has taken precautions to ensure that the new features do not undermine the company's safety policies.

The updated, multimodal version of ChatGPT will roll out to ChatGPT Plus and Enterprise customers in the coming weeks. Users will be able to include images in their queries, either by taking a new picture or by uploading one from their photo library. They will also be able to speak their prompts and hear the chatbot's responses by pressing a headphone button.

While the current implementation of voice and image support may not be particularly groundbreaking in the context of the ChatGPT app and website, it holds potential for integration into other systems, such as smart speakers or car systems.

Testing focused on scenarios where adding an image could improve a query. In one case, a friend posted a photo on Facebook asking whether her storage boxes would fit in a particular car. When that photo was included in a query, ChatGPT estimated the size of the items and judged whether they would fit in the car.

Another test asked ChatGPT to suggest recipes based on a photo of a packed refrigerator. The chatbot could offer only generic advice, however, since it works from a single image and could not make out everything inside the fridge.

ChatGPT also handled math problems well and suggested strategies for tackling complex questions. However, while it can help with math, it may simply give the answer without walking through the steps needed to reach it.

OpenAI also plans to let ChatGPT generate images through an integration with DALL-E 3. In addition, the company is working with select partners such as Spotify on its text-to-speech technology, which will let podcasters translate their shows into other languages in their own voices.

As the new features reach millions of users, OpenAI faces the challenge of preventing misuse, such as generating explicit content or using images to manipulate the chatbot's responses.

In summary, the addition of voice and image support in ChatGPT represents a significant step towards more capable AI systems that can understand both abstract concepts and the immediate environment. OpenAI's focus on multimodal support paves the way for a future where AI tools can comprehend and interact with the world around them.  

Getty Images is partnering with Nvidia to launch Generative AI by Getty Images, a new tool that lets people create images using Getty’s library of licensed photos. 

Generative AI by Getty Images (yes, it’s an unwieldy name) is trained only on the vast Getty Images library, including premium content, and gives users full copyright indemnification. This means anyone who uses the tool and publishes the resulting image commercially will be legally protected, Getty promises. Getty worked with Nvidia to use Nvidia’s Edify model, which is available through Picasso, Nvidia’s library of generative AI models.

Getty CEO Craig Peters will be at this year’s Code conference on September 26th and 27th.

I got a hands-on look at Generative AI by Getty Images and spent some time playing with the tool. I mainly wanted to test how it generates photos, rather than illustrations, to see how close it could get to an actual Getty-watermarked picture. The photos looked better than I expected. Stock photos already have an artificial, soulless quality to them, so I was not surprised that some of the first images the tool generated also felt... devoid of feeling. This isn’t exclusive to Getty’s tool; the photos generated by OpenAI’s upcoming DALL-E 3 left me with the same impression.

Getty’s tool did well at rendering realistic human figures. I prompted it to create a photo of a ballerina in an arabesque position (standing on one leg with the other lifted behind) on a stage with a slightly blurred background. The results felt more human than what I got from the same prompt in Stable Diffusion, and the Getty image fooled my friends when I texted it to them. Getty’s model clearly trained not just on illustrated art but on actual photos. The tool’s illustration mode, on the other hand, gave me only 2D, clip-arty renderings of the same prompt.

Screenshot of the Generative AI by Getty Images tool with ballet dancers

The company said any photos created with the tool will not be added to the Getty Images and iStock content libraries. Getty will pay creators if it uses their images to train current and future versions of the model, and it will share revenue generated by the tool with them, “allocating both a pro-rata share in respect of every file and a share based on traditional licensing revenue.”

“We’ve listened to customers about the swift growth of generative AI — and have heard both excitement and hesitation — and tried to be intentional about how we developed our own tool,” says Getty Images chief product officer Grant Farhall in a statement.

The Getty tool does limit what types of images users can generate. It wouldn’t let me create a photo of Joe Biden in front of the White House or a cat in the style of Andy Warhol or Jeff Koons. Any prompt containing the name of an actual person was prohibited. Asking for an image of the president of the United States yielded pictures of both men and women, some of them people of color, in front of the US flag. The company told The Verge the model “doesn’t know who Andy Warhol, Joe Biden, or any other real-world person is” because Getty doesn’t want it used to manipulate or recreate real-life events.

Customers can access Generative AI by Getty Images through the Getty Images website. The company said the tool will be priced separately from a standard Getty Images subscription, and pricing is based on prompt volume. It would not specify prices, however.

Getty says users will get perpetual, worldwide, and unlimited rights to the images they create. (That said, the legal copyright status of AI-generated images is still unsettled.) Getty said the arrangement is similar to licensing content from its library: the company owns the file but licenses it out for use. Users can either write their own prompts or use the prompt builder for guidance, and they can integrate the tool into their own workflows through an API. True to form, Getty watermarks pictures created with the tool, identifying them as AI-generated.

It’s no surprise Getty is getting into the AI image game; after all, it has one of the largest image libraries out there. But the company has also battled other text-to-image developers: it sued Stability AI for copyright infringement, alleging that Stability’s image generator, Stable Diffusion, used Getty photos without permission.

By building its own generative AI image platform, Getty can undercut other companies that want to use its image library to train models. Getty is far from the only company building AI image platforms on its own licensed data. Adobe has released its Firefly model, trained on its stable of licensed images, across its Creative Cloud apps.

The use of copyrighted material to train large language models and text-to-image systems has been a major concern for many in the creative community. Three artists previously sued Stability AI, Midjourney, and the art website DeviantArt for using their art without permission to train their models.

Getty said customers will eventually be able to add their own data to train the model and generate images in their brand’s style. This feature and other services will be available later this year.
