Business

After ChatGPT, Microsoft working on AI model that takes images as cues

Fri, Mar 03 2023 07:15:46 PM

New Delhi, Mar 3 (IANS): As the war over artificial intelligence (AI) chatbots heats up, Microsoft has unveiled Kosmos-1, a new AI model that can respond to visual cues, or images, in addition to text prompts.

The multimodal large language model (MLLM) can handle a range of new tasks, including image captioning and visual question answering.

Kosmos-1 could pave the way for the next stage beyond ChatGPT's text prompts.

"A big convergence of language, multimodal perception, action, and world modeling is a key step toward artificial general intelligence. In this work, we introduce Kosmos-1, a Multimodal Large Language Model (MLLM) that can perceive general modalities, learn in context and follow instructions," said Microsoft's AI researchers in a paper.

The paper suggests that multimodal perception, or knowledge acquisition and "grounding" in the real world, is needed to move beyond ChatGPT-like capabilities to artificial general intelligence (AGI), reports ZDNet.

"More importantly, unlocking multimodal input greatly widens the applications of language models to more high-value areas, such as multimodal machine learning, document intelligence, and robotics," the paper read.

The goal is to align perception with LLMs, so that the models are able to see and talk.

Experimental results showed that Kosmos-1 achieves impressive performance on language understanding and generation, and even on tasks where it is fed document images directly, without OCR.

It also showed good results in perception-language tasks, including multimodal dialogue, image captioning, visual question answering, and vision tasks, such as image recognition with descriptions (specifying classification via text instructions).

"We also show that MLLMs can benefit from cross-modal transfer, i.e., transfer knowledge from language to multimodal, and from multimodal to language. In addition, we introduce a dataset of Raven IQ test, which diagnoses the nonverbal reasoning capability of MLLMs," said the team.
