DreamVu Publishes PRISM: A Multi-View Retail Video Dataset for Embodied AI Research

270,000-sample dataset covering spatial, physical, and embodied action reasoning reduces error rates by 66.6% on 20 capability probes; 100K open subset and fine-tuned model weights released on Hugging Face

DreamVu today released PRISM, a 270,000-sample multi-view video dataset collected across five real supermarkets for training and evaluating vision-language models on embodied AI tasks. Fine-tuning on PRISM reduces average error rates by 66.6% and cuts embodied reasoning errors by a factor of five compared to general-purpose baselines, across 20 capability probes evaluated in the accompanying paper.
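The two headline figures describe the same kind of improvement in different units. A quick sketch of the arithmetic (the relationships below follow directly from the stated reductions; no error rates beyond those in the release are assumed):

```python
# A 66.6% reduction in average error means roughly one third of the
# baseline error remains after fine-tuning:
avg_remaining = 1 - 0.666
print(round(avg_remaining, 3))   # 0.334, i.e. about 1/3 of baseline error

# "A factor of five" on embodied reasoning errors means 20% of the
# baseline error remains, equivalent to an 80% reduction:
embodied_remaining = 1 / 5
print(f"{1 - embodied_remaining:.0%} reduction")  # 80% reduction
```

So the embodied-reasoning gain (5x, or 80%) is larger than the 66.6% average reduction across all 20 probes.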

Existing training datasets typically address spatial, physical, or action-level reasoning in isolation. PRISM covers all three in a single deployment domain, captured from both egocentric (worker-worn) cameras and wide-angle 360° overhead cameras. Annotations were generated using LLM-produced chain-of-thought reasoning; the paper finds this format produces larger accuracy gains than template-based labeling, particularly on spatial and causal tasks. Fourteen of the 20 capability probes are not covered in any prior publicly available AI training corpus, making PRISM the first dataset to cover all three reasoning dimensions simultaneously in a real deployment environment.

A data-scaling analysis shows that 60% of the corpus (162,000 samples) achieves 87.7% average accuracy — within 1.2 percentage points of the full-dataset ceiling — meaning strong results are attainable without training on the full corpus. Mixing egocentric and exocentric data improves cross-view performance without degrading egocentric task accuracy; the two camera perspectives are complementary rather than competitive.
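The scaling numbers above can be checked with simple arithmetic (the implied full-dataset accuracy below is derived from the quoted gap, not stated directly in the release):

```python
# 60% of the 270,000-sample corpus (integer arithmetic avoids
# floating-point truncation):
full_corpus = 270_000
subset = full_corpus * 60 // 100
print(subset)  # 162000, matching the release

# The 60% subset reaches 87.7% average accuracy, 1.2 percentage points
# below the full-dataset ceiling, implying a ceiling of about:
ceiling = 87.7 + 1.2
print(f"{ceiling:.1f}%")  # 88.9%
```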

“The core finding is that domain-specific fine-tuning on data covering spatial, physical, and action reasoning together produces gains that general-corpus scaling does not. We’re releasing the dataset and model weights so the research community can build on it.”
— Rajat Aggarwal, Co-Founder and CEO, DreamVu

The paper is at dreamvu.ai/prism (arXiv forthcoming). The 100,000-sample open subset and fine-tuned model weights (Cosmos-Reason2-2B-Retail-Grocery-EgoExo) are available on Hugging Face at huggingface.co/datasets/DreamVu/PRISM-100K. The full 270,000-sample corpus is available under commercial license by contacting sales@dreamvu.ai.

About DreamVu: DreamVu is a physical AI data infrastructure company. Its proprietary ALIA 360° omnidirectional camera system and multi-view capture infrastructure are used to build training datasets for embodied AI systems in retail, logistics, healthcare, and industrial environments. DreamVu is headquartered in Philadelphia, PA, with R&D in Hyderabad, India, and is a member of the NVIDIA Inception program.

