
Fei-Fei Li’s new work: Thinking in Space


Humans memorize and reason about spaces through continuous visual observation, a capability known as “visuospatial intelligence” or “spatial awareness.” But can multimodal large language models (MLLMs), trained on million-scale video datasets, also perform “spatial reasoning” from videos?


On December 23rd, a research team led by Saining Xie, Assistant Professor of Computer Science at New York University, in collaboration with “AI pioneer” Fei-Fei Li, Stanford University’s first Sequoia Chair Professor, and Rilyn Han, a Yale University undergraduate in Computer Science and Economics, published a study titled “Thinking in Space.” The research explores how MLLMs perceive, memorize, and recall spatial information.


They discovered that even though spatial reasoning remains the primary bottleneck keeping MLLMs from higher benchmark performance, localized world models and spatial awareness have indeed emerged within these models.


The published paper is available at https://arxiv.org/pdf/2412.14171. It investigates whether MLLMs, trained on massive video datasets, can perform “spatial reasoning” akin to human visuospatial intelligence, and it introduces a novel benchmark, VSI-Bench, designed to evaluate the spatial reasoning capabilities of MLLMs across a range of tasks.
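To make the benchmarking idea concrete, here is a minimal sketch of how VSI-Bench-style scoring could work. The paper describes exact-match scoring for multiple-choice questions and a mean relative accuracy over a range of confidence thresholds for numerical-answer questions; the function names and the exact threshold values below are illustrative assumptions, not the authors' released code.

```python
# Hedged sketch of VSI-Bench-style scoring. Function names and the
# threshold grid are assumptions for illustration, not the official
# evaluation code.

def mcq_accuracy(preds, golds):
    """Exact-match accuracy for multiple-choice answers (e.g. 'A'..'D')."""
    assert len(preds) == len(golds) and golds
    return sum(p == g for p, g in zip(preds, golds)) / len(golds)

def mean_relative_accuracy(pred, gold, thresholds=None):
    """Score a numerical answer: at threshold t, the prediction counts as
    correct if its relative error is below 1 - t; return the mean over
    all thresholds (assumed here to run 0.50..0.95 in steps of 0.05)."""
    if thresholds is None:
        thresholds = [0.50 + 0.05 * i for i in range(10)]
    rel_err = abs(pred - gold) / abs(gold)
    return sum(rel_err < 1 - t for t in thresholds) / len(thresholds)
```

For example, a model that answers “A” and “B” when the gold answers are “A” and “C” scores 0.5 on the multiple-choice metric, while a numerical estimate of 12 against a ground truth of 10 (20% relative error) earns partial credit at the looser thresholds only.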


The study highlights both the limitations and the potential of MLLMs in spatial reasoning. While these models exhibit emergent visuospatial capabilities, achieving human-level spatial intelligence will require further advances in cognitive mapping and reasoning techniques. VSI-Bench provides a robust foundation for future research in this domain.


Overall, Fei-Fei Li and the team behind this groundbreaking paper have opened a new horizon in artificial intelligence, showing us yet again the immense possibilities that lie ahead.
