Vidhi Jain

email: {first name}{last name} at cmu dot edu


I am a PhD candidate at Carnegie Mellon University, Robotics Institute (RI), where I work closely with Yonatan Bisk. Previously, I worked at Google DeepMind Robotics, Meta AI, and Microsoft Research India.

I am interested in multimodal representations for adaptable embodied AI. My long-term vision is to develop robots that can perform multiple tasks around the home and learn new skills from their users. My research focuses on the intersection of language, vision, and action to enhance real-time perception, motion control, and dialogue for robots.

Bio |  CV  |  GitHub  |  Google Scholar  |  LinkedIn  |  X


News

September 2024,

FlexCap accepted at NeurIPS'24.

FlexCap: Describe Anything in Images in Controllable Detail was accepted at the Conference on Neural Information Processing Systems (NeurIPS) 2024.


September 2024,

ANAVI accepted at CoRL'24.

Our work on Audio Noise Awareness using Visuals of Indoor environments for NAVIgation (a.k.a. ANAVI) was accepted at the Conference on Robot Learning (CoRL) 2024, Munich, Germany.


August 2024,

Visiting Stanford

I am visiting Stanford this semester to collaborate with Dorsa Sadigh.


May 2024,

Vid2Robot accepted at RSS'24.

Vid2Robot accepted at Robotics: Science and Systems 2024.
February 2024,

Two papers accepted at ICRA'24.

Two papers, led by collaborators Quan and Montse at Google DeepMind, respectively, were accepted at ICRA 2024 in Yokohama, Japan.
December 2023,

Preprint: Survey on General-Purpose Robots via Foundation Models

We shared the first preprint of our survey on foundation models in robotics.
December 2023,

HomeRobot Challenge Completed

Here are some takeaways from the HomeRobot Challenge at NeurIPS 2023.
November 2023,

Two papers and two workshop papers presented at CoRL'23

I did not attend CoRL this year, but check out some of our recent work presented by colleagues at the main conference:

1. HomeRobot: Open-Vocabulary Mobile Manipulation
2. Spatial-Language Attention Policies for Efficient Robot Learning (SLAP)

Also, check out some of the work at the LangRob and Robot Learning workshops.

3. PromptBook leverages LLMs for generating robot code! Beyond the examples used in Code-as-Policies, we explore instructions, chain-of-thought prompting, and state estimation. Led by Montserrat Gonzalez and Andy Zeng at Google DeepMind Robotics. Here is the paper on OpenReview.

4. Open X-Embodiment is a huge robotics data collection effort to enable training of robotic foundation models across multiple embodiments, tasks, and lab setups.
September 2023,

Spatial Language Attention Policies accepted at CoRL'23

How can a robot learn manipulation skills from only a few examples? SLAP is a new approach that learns to attend to spatial language to learn manipulation skills.
June 2023,

Started as Student Researcher at Google DeepMind

I am excited to start as a student researcher at Google DeepMind, Mountain View. I will be working with Debidatta Dwibedi on end-to-end video conditioned policy learning for robotics.
June 2023,

HomeRobot Challenge at NeurIPS'23

Check out HomeRobot, a large-scale sim-to-real mobile manipulation challenge at NeurIPS 2023! More details about the challenge here. You can submit to EvalAI here. Our paper (accepted at CoRL 2023) shows RL and heuristic policies for sim-to-real transfer and identifies the challenges in the domain.
September 2022,

Transformers Task Planners accepted at CoRL'22

Our work on learning preferences for a canonical dish-loading task using the Transformer Task Planner was accepted at CoRL 2022. Read more here.


June 2022,

Blogpost on AI residents at Meta

I am working with Akshara Rai and Yixin Lin on preference-based task planning for a dishwasher-loading task. Read about the work of the 2021-2022 AI residents at Meta here.




March 2020,

Heidelberg Laureate Forum (HLF)

Selected among 224 young researchers to meet laureates in mathematics and computer science (postponed to Sep 2021); participated in Virtual HLF 2020.
August 2019,

Research Fellow at Microsoft Research

I explored deep learning theory for generative models at Microsoft Research India, advised by Amit Deshpande and Navin Goyal.
June 2019,

J. N. Tata Endowment Scholarship for Higher Studies, 2019

Awarded an interest-free loan and a travel scholarship for higher studies.
May 2019,

K. C. Mahindra Scholarships for Post-Graduate Studies Abroad, 2019

62 students were awarded interest-free loans for higher studies.
July 2018,

Bachelor Thesis at Mila, Université de Montréal

I worked on my bachelor thesis Investigating the viability of Generative Models for Novelty Detection, with Aaron Courville.
July 2017,

Mitacs Globalink Research Internship

I was a Mitacs Globalink Research Intern at Simon Fraser University, Burnaby, Canada. I worked with Prof. Oliver Schulte on Bayesian optimization algorithms for machine learning. Find our code here.
March 2017,

Citi Women Leader Award (CWLA) Scholarship

Awarded a one-year study scholarship (top 3 among 1200 candidates selected nationwide).



Research

project image

ANAVI: Audio Noise Awareness using Visuals of Indoors for NAVIgation



Vidhi Jain, Rishi Veerapaneni, Yonatan Bisk.
8th Annual Conference on Robot Learning (CoRL) 2024.

webpage | arXiv | video | code | reviews | poster | Show BibTeX

project image

Vid2Robot: End-to-end Video-conditioned Policy Learning with Cross-Attention Transformers



Vidhi Jain, Maria Attarian, Nikhil J Joshi, Ayzaan Wahid, Danny Driess, Quan Vuong, Pannag R Sanketi, Pierre Sermanet, Stefan Welker, Christine Chan, Igor Gilitschenski, Yonatan Bisk, Debidatta Dwibedi.
20th edition of the Robotics: Science and Systems (RSS) conference, 2024.

webpage | arXiv | video | Show BibTeX

project image

Towards General-Purpose Robots via Foundation Models: A Survey and Meta-Analysis



Yafei Hu*, Quanting Xie*, Vidhi Jain*, Jonathan Francis, Jay Patrikar, Nikhil Keetha, Seungchan Kim, Yaqi Xie, Tianyi Zhang, Shibo Zhao, Yu Quan Chong, Chen Wang, Katia Sycara, Matthew Johnson-Roberson, Dhruv Batra, Xiaolong Wang, Sebastian Scherer, Zsolt Kira, Fei Xia, Yonatan Bisk.
Preprint 2024.

webpage | arXiv | code | Show BibTeX

project image

FlexCap: Generating Rich, Localized, and Flexible Captions in Images



Debidatta Dwibedi, Vidhi Jain, Jonathan Tompson, Andrew Zisserman, Yusuf Aytar.
38th Annual Conference on Neural Information Processing Systems (NeurIPS) 2024.

webpage | arXiv | Show BibTeX

project image

How to Prompt Your Robot: A PromptBook for Manipulation Skills with Code as Policies



Montserrat Gonzalez Arenas, Ted Xiao, Sumeet Singh, Vidhi Jain, Allen Z. Ren, Quan Vuong, Jacob Varley, Alexander Herzog, Isabel Leal, Sean Kirmani, Mario Prats, Dorsa Sadigh, Vikas Sindhwani, Kanishka Rao, Jacky Liang, Andy Zeng.
IEEE International Conference on Robotics and Automation (ICRA) 2024.

arXiv | Show BibTeX

project image

Open X-Embodiment: Robotic Learning Datasets and RT-X Models



Open X-Embodiment Collaboration, Abhishek Padalkar, Acorn Pooley, Ajinkya Jain, Alex Bewley, Alex Herzog, Alex Irpan, Alexander Khazatsky, Anant Rai, Anikait Singh, Anthony Brohan, Antonin Raffin, Ayzaan Wahid, Ben Burgess-Limerick, Beomjoon Kim, Bernhard Schölkopf, Brian Ichter, Cewu Lu, Charles Xu, Chelsea Finn, Chenfeng Xu, Cheng Chi, Chenguang Huang, Christine Chan, Chuer Pan, Chuyuan Fu, Coline Devin, Danny Driess, Deepak Pathak, Dhruv Shah, Dieter Büchler, Dmitry Kalashnikov, Dorsa Sadigh, Edward Johns, Federico Ceola, Fei Xia, Freek Stulp, Gaoyue Zhou, Gaurav S. Sukhatme, Gautam Salhotra, Ge Yan, Giulio Schiavi, Hao Su, Hao-Shu Fang, Haochen Shi, Heni Ben Amor, Henrik I Christensen, Hiroki Furuta, Homer Walke, Hongjie Fang, Igor Mordatch, Ilija Radosavovic, Isabel Leal, Jacky Liang, Jaehyung Kim, Jan Schneider, Jasmine Hsu, Jeannette Bohg, Jeffrey Bingham, Jiajun Wu, Jialin Wu, Jianlan Luo, Jiayuan Gu, Jie Tan, Jihoon Oh, Jitendra Malik, Jonathan Tompson, Jonathan Yang, Joseph J. Lim, João Silvério, Junhyek Han, Kanishka Rao, Karl Pertsch, Karol Hausman, Keegan Go, Keerthana Gopalakrishnan, Ken Goldberg, Kendra Byrne, Kenneth Oslund, Kento Kawaharazuka, Kevin Zhang, Keyvan Majd, Krishan Rana, Krishnan Srinivasan, Lawrence Yunliang Chen, Lerrel Pinto, Liam Tan, Lionel Ott, Lisa Lee, Masayoshi Tomizuka, Maximilian Du, Michael Ahn, Mingtong Zhang, Mingyu Ding, Mohan Kumar Srirama, Mohit Sharma, Moo Jin Kim, Naoaki Kanazawa, Nicklas Hansen, Nicolas Heess, Nikhil J Joshi, Niko Suenderhauf, Norman Di Palo, Nur Muhammad Mahi Shafiullah, Oier Mees, Oliver Kroemer, Pannag R Sanketi, Paul Wohlhart, Peng Xu, Pierre Sermanet, Priya Sundaresan, Quan Vuong, Rafael Rafailov, Ran Tian, Ria Doshi, Roberto Martín-Martín, Russell Mendonca, Rutav Shah, Ryan Hoque, Ryan Julian, Samuel Bustamante, Sean Kirmani, Sergey Levine, Sherry Moore, Shikhar Bahl, Shivin Dass, Shuran Song, Sichun Xu, Siddhant Haldar, Simeon Adebola, Simon Guist, Soroush Nasiriany, Stefan Schaal, Stefan Welker, Stephen Tian, Sudeep Dasari, Suneel Belkhale, Takayuki Osa, Tatsuya Harada, Tatsuya Matsushima, Ted Xiao, Tianhe Yu, Tianli Ding, Todor Davchev, Tony Z. Zhao, Travis Armstrong, Trevor Darrell, Vidhi Jain, Vincent Vanhoucke, Wei Zhan, Wenxuan Zhou, Wolfram Burgard, Xi Chen, Xiaolong Wang, Xinghao Zhu, Xuanlin Li, Yao Lu, Yevgen Chebotar, Yifan Zhou, Yifeng Zhu, Ying Xu, Yixuan Wang, Yonatan Bisk, Yoonyoung Cho, Youngwoon Lee, Yuchen Cui, Yueh-hua Wu, Yujin Tang, Yuke Zhu, Yunzhu Li, Yusuke Iwasawa, Yutaka Matsuo, Zhuo Xu, Zichen Jeff Cui.
IEEE International Conference on Robotics and Automation (ICRA) 2024.

webpage | arXiv | code | Show BibTeX

project image

Spatial Language Attention Policies for Efficient Robot Learning



Priyam Parashar, Vidhi Jain, Xiaohan Zhang, Jay Vakil, Sam Powers, Yonatan Bisk and Chris Paxton.
7th Annual Conference on Robot Learning (CoRL) 2023.

webpage | arXiv | code | reviews | Show BibTeX

project image

HomeRobot: Open-Vocabulary Mobile Manipulation



Sriram Yenamandra, Arun Ramachandran, Karmesh Yadav, Austin S Wang, Mukul Khanna, Theophile Gervet, Tsung-Yen Yang, Vidhi Jain, Alexander Clegg, John M Turner, Zsolt Kira, Manolis Savva, Angel X Chang, Devendra Singh Chaplot, Dhruv Batra, Roozbeh Mottaghi, Yonatan Bisk, Chris Paxton.
7th Annual Conference on Robot Learning (CoRL) 2023.

webpage | arXiv | code | reviews | Show BibTeX

project image

Transformers are Adaptable Task Planners



Vidhi Jain, Yixin Lin, Eric Undersander, Yonatan Bisk and Akshara Rai.
6th Annual Conference on Robot Learning (CoRL) 2022.

webpage | arXiv | video | code | reviews | Show BibTeX

project image

MAEA: Multimodal Attribution in Embodied AI



Vidhi Jain, Jayant Sravan Tamarapalli, Sahiti Yerramilli, and Yonatan Bisk.
NeurIPS Workshop on Trustworthy Embodied AI 2022.

webpage | arXiv | video | reviews | Show BibTeX

project image

Towards Explainable Embodied AI



Vidhi Jain
Master's thesis, 2021.

pdf | Show BibTeX

project image

Learning to capture spatial semantic priors for indoor navigation



Vidhi Jain, Shishir Patil, Prakhar Agarwal and Katia Sycara.
NeurIPS Object Representations for Learning and Reasoning (ORLR) 2020.

pdf | webpage | arXiv | video | code | Show BibTeX

project image

Predicting strategies in simulated search and rescue tasks



Vidhi Jain, Rohit Jena, Huao Li, Tejus Gupta, Dana Hughes, Michael Lewis and Katia Sycara.
NeurIPS AI for Humanitarian Assistance and Disaster Response (AIADR) 2020.

arXiv | video | slides | Show BibTeX

project image

Learning to navigate in unseen cluttered environments



Vidhi Jain, Ganesh Iyer and Katia Sycara.
NeurIPS Women in Machine Learning workshop (WiML) 2020.

pdf | poster | Show BibTeX

project image

Coping with sample inefficiency in deep reinforcement learning



Vidhi Jain, Simin Liu, and Ganesh Iyer.
ICML Women in Machine Learning Un-Workshop (WiML) 2020.

pdf | slides | Show BibTeX

project image

Investigating the viability of Generative Models for Novelty Detection



Vidhi Jain
Bachelor's thesis, 2018.

pdf | Show BibTeX

project image

Symptomatic Diagnosis and Prognosis of Psychiatric Disorders through Personal Gadgets



Vidhi Jain, Prakhar Agarwal.
ACM CHI Extended Abstracts (CHI EA'17) 2017.

pdf | webpage | slides | poster | Show BibTeX

project image

Model Selection Scores for Multi-Relational Bayesian Networks



Sajjad Gholami, Oliver Schulte, Vidhi Jain, Qiang Zhao.
IJCAI Declarative Learning Based Programming (DeLBP) 2017.

pdf | code | Show BibTeX

project image

Empowering API Consumer Community: Collaborative Annotation of Web API Documentation for Semantically Structured Format



Vidhi Jain and Matthias Frank
Grace Hopper Conference India (GHCI) 2016.

pdf | poster




Talks

July 2024,

Vid2Robot at Robotics: Science and Systems 2024

Presented the paper on Vid2Robot: End-to-end Video-conditioned Policy Learning with Cross-Attention Transformers at Robotics: Science and Systems (RSS) 2024 in Delft, Netherlands.
March 2024,

How to think of multimodality in Embodied AI and Robotic Manipulation?

Guest lecture in Multimodal Machine Learning course for graduate students at Carnegie Mellon University, Pittsburgh.
April 2023,

Training vision models in PyTorch.

Hands-on lecture in Computer Vision course for graduate students at Carnegie Mellon University, Pittsburgh.
December 2022,

MAEA: Multimodal Attribution in Embodied AI

Presented our work on Multimodal Attribution in Embodied AI at the Trustworthy Embodied AI workshop at NeurIPS 2022. Watch the talk here.
March 2021,

AI Symposium organized by SAiDL & APPCAIR

Invited as an Early Career Speaker to discuss AI research and how to get started in it. Video

December 2020,

Learning Embeddings that Capture Spatial Semantics for Indoor Navigation.

Presented our work on learning to capture spatial semantic priors for indoor navigation at the NeurIPS 2020 Object Representations for Learning and Reasoning (ORLR) workshop. Watch the talk here.
December 2020,

Predicting Human Strategies in Simulated Search and Rescue

Presented our work on Predicting Human Strategies in Simulated Search and Rescue at the NeurIPS 2020 AI+HADR workshop. Watch the talk here.
April 2019,

Pyladies Bangalore

Gave a tutorial on Deep Learning with PyTorch. Resources & Event Poster
April 2019,

Speaker at IIIT-Bangalore ACM Student Chapter

Invited for an information session on internships and research. Video
April 2019,

One in Asankhya Project

Invited for a discussion for the One in Asankhya project. Blogpost & Video
July 2017,

NDTV telecast on Innovation by Young India

Invited speaker for a panel discussion on the national news channel NDTV India to present the project Automated Psychiatrist. Watch here (at 5:07).



Education

logo image

Master of Science in Robotics (MSR)

2019-2021
Advisor: Katia Sycara (social machine intelligence)  
Thesis: Towards Explainable Embodied AI
logo image

Bachelor of Engineering (Honors) in Computer Science

2014-2018
Advisor: Aaron Courville, Mila (off-campus thesis)  
Thesis: Investigating viability of generative models for out-of-distribution detection.


Design and source code from Leonid Keselman's Jekyll fork and Jon Barron's website