
The film ‘Her’ becomes reality… AI that sees, hears, and speaks like a human has emerged

OpenAI unveils voice assistant ‘GPT-4o’… Shown the text ‘I love you,’ it replies, “You are so sweet.”
Response time is 0.32 seconds, similar to that of a human.
Able to distinguish between the voices of multiple people and respond… From laughter and singing to expressing emotions and translation
Google to unveil voice-video recognition AI

Mira Murati, OpenAI’s Chief Technology Officer (CTO), introduces the main features of the new voice assistant ‘GPT-4o’ during a live broadcast on the 13th (local time). Image source: OpenAI YouTube

On the 13th (local time), at OpenAI’s new product launch event in San Francisco, California, USA, a demonstrator wrote the phrase ‘I love you’ on a piece of paper and held it up to the camera. A voice from the smartphone replied, “You are so sweet,” as if embarrassed.

This scene, reminiscent of the 2014 movie ‘Her,’ which depicts love between a human and an artificial intelligence (AI), was a conversation between OpenAI’s new chatbot ‘GPT-4o’ and a human. A science fiction movie from 10 years ago has become reality. After the event, OpenAI CEO Sam Altman posted the word ‘Her’ on his X (formerly Twitter) account.

On the 13th, OpenAI unveiled GPT-4o, a voice assistant that sees, hears, and speaks like a human. The ‘o’ in the new model’s name stands for ‘omni’, meaning ‘everything’. Google, for its part, showed off the advanced voice and video recognition capabilities of ‘Gemini’ about 40 minutes before OpenAI’s announcement, as if mounting a counterattack.

The race to develop AI has moved into a battle over ‘voice assistants’ that can understand speech, recognize images, and respond like a human. The move is seen as a sign that major developers, which since last year have released ‘multimodal’ AI that processes images and audio simultaneously, are entering full-fledged service competition with the technology this year.

● Empathizes like a human, but sometimes gets ‘annoyed’

Voice assistants that recognize human voices and provide answers are not a recent technology. A representative example is Apple’s ‘Siri’, released in 2011. However, existing voice assistants showed limitations, such as offering only Internet search results in response to people’s questions or failing to give accurate answers. What sets apart a voice assistant equipped with multimodal AI, which can process text, images, and voice simultaneously, is that it reacts, acts, and even shares emotions like a human.

This human-like quality was also highlighted at OpenAI’s online event where GPT-4o was unveiled that day. The most notable feature is its response speed. OpenAI said that GPT-4o responds in 0.32 seconds on average, similar to a human. Its predecessor, GPT-4, took 5.4 seconds on average. Another distinguishing feature is that it can tell apart the voices of multiple speakers and respond, laugh, sing, or express emotions.

Whereas the existing model communicated mainly through text, GPT-4o allows voice conversation with users. It can see objects through the camera and hear sounds through the speaker.

At the event, when the demonstrator asked it to “tell a story for a friend who has trouble sleeping,” it told a story that began with “Once upon a time…” In response to the further request, “Please add more emotion and drama,” it continued in a dramatic and emotional voice, like a storyteller in a play. A function that translates Italian into English or English into Italian in real time was also introduced.

● Google also unveiled advanced voice recognition AI capabilities


Other big tech companies at home and abroad are also expected to join the voice assistant competition. For a start, Google unveiled the advanced voice and video recognition capabilities of its AI ‘Gemini’ about 40 minutes before OpenAI’s announcement. The 50-second video posted by Google on X showed the Google I/O stage and audience in the midst of preparations for the event. A person pointed a camera at the stage and asked, “What do you think will happen here?” and the AI answered, “I think a presentation or conference will be held.”

According to foreign media such as the New York Times, Apple, which is regarded as lagging behind in the AI competition, is also expected to bring generative AI such as ChatGPT to its voice assistant ‘Siri’ at the Worldwide Developers Conference (WWDC) to be held next month.

Domestic companies are also planning to join the AI voice assistant competition. Jong-hee Han, Vice Chairman and head of the Device eXperience (DX) Division at Samsung Electronics, announced that the company will introduce generative AI based on a large language model (LLM) into its voice assistant ‘Bixby’ starting in July. Naver, which unveiled its own language model ‘HyperCLOVA X’ last year, is also preparing a service equipped with voice and video recognition capabilities.


Reporter Jeon Nam-hyuk ahead@donga.com
