Vidhi Jain

email: {first name}{last name} at cmu dot edu

I am a PhD candidate at Carnegie Mellon University, Robotics Institute (RI), where I work closely with Yonatan Bisk. Previously, I have worked at Google DeepMind Robotics, Meta AI and Microsoft Research India.

I am interested in multimodal representations for adaptable embodied AI. My long-term vision is to develop robots that can perform multiple tasks around the home and learn new skills from their users. My research focuses on the intersection of language, vision and actions to enhance real-time perception, motion control and dialogue for robots.

Bio | CV | GitHub | Google Scholar | LinkedIn | X

News

September 2024,

FlexCap accepted at NeurIPS'24.

FlexCap: Describe Anything in Images in Controllable Detail was accepted at the Conference on Neural Information Processing Systems (NeurIPS) 2024.

Can we train a model to describe different parts of images in varying levels of detail?

Introducing FlexCap, a VLM designed to output localized captions in N words where we can control N with special length tokens. https://t.co/tDsyHF1AVI pic.twitter.com/6ObqJ98ZJX
— Debidatta Dwibedi (@debidatta) March 19, 2024

September 2024,

ANAVI accepted at CoRL'24.

Our work on Audio Noise Awareness using Visuals of Indoor environments for NAVIgation (a.k.a. ANAVI) was accepted at the Conference on Robot Learning (CoRL) 2024, Munich, Germany.

🧵1/8 So annoying when my 🤖 vacuum cleaner buzzes loudly during my Zoom meeting! Can we teach robots to be aware of their noise levels at home? Introducing ANAVI—a framework that uses indoor visuals to predict sound propagation! 🎶🏠 pic.twitter.com/gKIDqdhF3G
— Vidhi Jain (@viddivj) October 24, 2024

August 2024,

Visiting Stanford

I am visiting and collaborating with Dorsa Sadigh at Stanford for this semester.

May 2024,

Vid2Robot accepted at RSS'24.

Vid2Robot accepted at Robotics: Science and Systems 2024.

Vid2Robot is a novel end-to-end video-conditioned robot policy with cross-attention.
What if we could show a robot how to do a task?

We present Vid2Robot, which is a robot policy trained to decode human intent from visual cues and translate it into actions in its environment. 🤖

Website: https://t.co/ufFHK1Dgbg
Arxiv: https://t.co/qEUjaXovJa

🧵(1/n) pic.twitter.com/13pgW8ssEY
— Vidhi Jain (@viddivj) March 20, 2024

February 2024,

Two papers accepted at ICRA'24.

The following two papers, led by collaborators Quan and Montse respectively at Google DeepMind, were accepted at ICRA 2024 in Yokohama, Japan.

(1) RT-X introduces a massive multi-institution collaboration on a exploring robotic datasets and policies covering many robot embodiments. The open-sourced datasets enable generalist policies to control many robots across many academic labs. https://t.co/2MZ0pTvM8j
— Ted Xiao @ ICRA 2024 (@xiao_ted) January 29, 2024

(2) PromptBook extends our prior investigations on leveraging LLMs for generating robot code. Tons of important details about scaling up Code as Policies for robotics.

Check out @montseglz's talk on Tuesday!
Presentation: TuBT30-NT.8, 13:30-15:00
Poster: 30.08 16:30-18:00
— Ted Xiao @ ICRA 2024 (@xiao_ted) May 13, 2024

December 2023,

Preprint: Survey on General-Purpose Robots via Foundation Models

We shared the first preprint of our survey on foundation models in robotics.

🦾🤖📚we’ve been exploring the landscape of foundational models in robotics—unveiling insights on current trends and open challenges. A must-read for those interested in the path towards general-purpose robotics. #Robotics #FoundationModels #SurveyPaper https://t.co/VziYf3VScn
— Vidhi Jain (@viddivj) December 16, 2023

December 2023,

HomeRobot Challenge Completed

Here are some takeaways from the HomeRobot Challenge at NeurIPS 2023.

The winning team from the OVMM competition has a writeup now: https://t.co/Kk9kizAvb4 pic.twitter.com/tZDyHEaDmv
— Chris Paxton (@chris_j_paxton) December 20, 2023

One (sad?) takeaway for me: when we see planning-based and learning methods compare on even footing, in terms of time invested, we basically never see learning-based methods working better.

HomeRobot is 100% a test of generalization, as object *classes* + envs are totally unseen https://t.co/3nBBudNgns
— Chris Paxton (@chris_j_paxton) December 15, 2023

November 2023,

2 papers and 2 workshop works presented at CoRL'23

I did not attend CoRL this year, but check out some of our recent work presented by colleagues at the main conference:

1. HomeRobot: Open-Vocabulary Mobile Manipulation

The future of robot butlers starts with mobile manipulation.
We’re announcing the NeurIPS 2023 Open-Vocabulary Mobile Manipulation Challenge!
- Full robot stack ✅
- Parallel sim and real evaluation ✅
- No robot required ✅👀 https://t.co/mggAbRhrLP pic.twitter.com/Wartsmkyyl
— Chris Paxton (@chris_j_paxton) June 21, 2023

2. Spatial-Language Attention Policies for Efficient Robot Learning (SLAP)

Excited that our work on SLAP will be appearing at CoRL 2023 @corl_conf
See you there and looking forward to chatting about it!
Work with @chris_j_paxton @XiaohanZhang220 @jdvakil @ybisk @viddivj Sam Powers. https://t.co/CU30b3whgS
— Priyam Parashar (@Priyam8Parashar) September 11, 2023

Also, check out some of the work at LangRob and Robot Learning Workshops.

3. PromptBook leverages LLMs for generating robot code! Beyond the examples used in Code-as-Policies, we explore instructions, chain-of-thought prompting and state estimation. Led by Montserrat Gonzalez and Andy Zeng at Google DeepMind Robotics. Here is the paper on OpenReview.

4. Open X-Embodiment is a large robotics data collection effort to enable training of robotic foundation models across multiple embodiments, tasks, and lab setups.

RT-X: generalist AI models lead to 50% improvement over RT-1 and 3x improvement over RT-2, our previous best models. 🔥🥳🧵

Project website: https://t.co/GAlvFdqwx5 pic.twitter.com/Jzy8b2eOjf
— Quan Vuong (@QuanVng) October 3, 2023

September 2023,

Spatial Language Attention Policies accepted at CoRL'23

How can a robot learn manipulation skills from few examples? SLAP is a new approach that learns to attend to spatial language to learn manipulation skills.

Excited to share our work on using few examples to learn manipulation skills. https://t.co/zPhUMLik1a
— Vidhi Jain (@viddivj) July 19, 2023

June 2023,

Started as Student Researcher at Google DeepMind

I am excited to start as a student researcher at Google DeepMind, Mountain View. I will be working with Debidatta Dwibedi on end-to-end video-conditioned policy learning for robotics.

Excited to start as a student researcher at Google Deepmind Robotics :) pic.twitter.com/XW2lRqf0qH
— Vidhi Jain (@viddivj) June 6, 2023

June 2023,

HomeRobot Challenge at NeurIPS'23

Check out HomeRobot, a large-scale sim-to-real mobile manipulation challenge at @NeurIPSConf 2023! More details about the challenge here. You can submit to EvalAI here. Our paper (accepted at CoRL 2023) presents RL and heuristic policies for sim-to-real transfer and identifies the challenges in the domain.

Check out Home Robot Challenge @NeurIPSConf 2023! Let’s build robot policies for rearranging homes :) https://t.co/TA6WwceyQc
— Vidhi Jain (@viddivj) June 21, 2023

September 2022,

Transformers Task Planners accepted at CoRL'22

Our work on learning preferences for a canonical dish-loading task using Transformers Task Planner was accepted at CoRL 2022. Read more here.

(1/5) Every home is different, and every person likes things done in their particular way. Therefore, home robots of the future need to both reason about the sequential nature of day-to-day tasks and generalize to user's preferences.
— Vidhi Jain (@viddivj) December 14, 2022

June 2022,

Blogpost on AI residents at Meta

I am working with Akshara Rai and Yixin Lin on preference-based task planning for a dishwasher-loading task. Read about the work of 2021-2022 AI residents at Meta here.

March 2020,

Heidelberg Laureate Forum (HLF)

Selected among 224 young researchers to meet laureates in mathematics and computer science (postponed to Sep 2021); participated in Virtual HLF 2020.

August 2019,

Research Fellow at Microsoft Research

I explored deep learning theory for generative models at Microsoft Research India, advised by Amit Deshpande and Navin Goyal.

June 2019,

J. N. Tata Endowment Scholarship for Higher Studies, 2019

Awarded an interest-free loan and travel scholarship for higher studies.

May 2019,

K. C. Mahindra Scholarships for Post-Graduate Studies Abroad, 2019

62 students were awarded interest-free loans for higher studies.

July 2018,

Bachelor Thesis at Mila, Université de Montréal

I worked on my bachelor thesis, Investigating the viability of Generative Models for Novelty Detection, with Aaron Courville.

July 2017,

Mitacs Globalink Research Internship

I was a Mitacs Globalink Research Intern at Simon Fraser University, Burnaby, Canada. I worked with Prof. Oliver Schulte on Bayesian optimization algorithms for machine learning. Find our code here.

March 2017,

Citi Women Leader Award (CWLA) Scholarship

Awarded one year of study scholarship (Top 3 among 1200 candidates selected nationwide).

Research

	ANAVI: Audio Noise Awareness using Visuals of Indoors for NAVIgation. Vidhi Jain, Rishi Veerapaneni, Yonatan Bisk. 8th Annual Conference on Robot Learning (CoRL) 2024. webpage | arXiv | video | code | reviews | poster | BibTeX: `@INPROCEEDINGS{Jain-CORL-24, AUTHOR = {Vidhi Jain and Rishi Veerapaneni and Yonatan Bisk}, TITLE = {ANAVI: Audio Noise Awareness using Visuals of Indoors for NAVIgation}, BOOKTITLE = {8th Annual Conference on Robot Learning}, YEAR = {2024}, ADDRESS = {Munich, Germany}, MONTH = {November}, url = {https://openreview.net/forum?id=IsZb0wT3Kw} }`
	Vid2Robot: End-to-end Video-conditioned Policy Learning with Cross-Attention Transformers. Vidhi Jain, Maria Attarian, Nikhil J Joshi, Ayzaan Wahid, Danny Driess, Quan Vuong, Pannag R Sanketi, Pierre Sermanet, Stefan Welker, Christine Chan, Igor Gilitschenski, Yonatan Bisk, Debidatta Dwibedi. 20th Edition of Robotics Science and Systems (RSS) Conference 2024. webpage | arXiv | video | BibTeX: `@INPROCEEDINGS{Jain-RSS-24, AUTHOR = {Vidhi Jain AND Maria Attarian AND Nikhil J Joshi AND Ayzaan Wahid AND Danny Driess AND Quan Vuong AND Pannag R Sanketi AND Pierre Sermanet AND Stefan Welker AND Christine Chan AND Igor Gilitschenski AND Yonatan Bisk AND Debidatta Dwibedi}, TITLE = {Vid2Robot: End-to-end Video-conditioned Policy Learning with Cross-Attention Transformers}, BOOKTITLE = {Proceedings of Robotics: Science and Systems}, YEAR = {2024}, ADDRESS = {Delft, Netherlands}, MONTH = {July}, DOI = {10.15607/RSS.2024.XX.052} }`
	Towards General-Purpose Robots via Foundation Models: A Survey and Meta-Analysis. Yafei Hu, Quanting Xie, Vidhi Jain, Jonathan Francis, Jay Patrikar, Nikhil Keetha, Seungchan Kim, Yaqi Xie, Tianyi Zhang, Shibo Zhao, Yu Quan Chong, Chen Wang, Katia Sycara, Matthew Johnson-Roberson, Dhruv Batra, Xiaolong Wang, Sebastian Scherer, Zsolt Kira, Fei Xia, Yonatan Bisk. Preprint* 2024. webpage | arXiv | code | BibTeX: @article{hu2023Toward, author = {Yafei Hu and Quanting Xie and Vidhi Jain and Jonathan Francis and Jay Patrikar and Nikhil Keetha and Seungchan Kim and Yaqi Xie and Tianyi Zhang and Shibo Zhao and Yu-Quan Chong and Chen Wang and Katia Sycara and Matthew Johnson-Roberson and Dhruv Batra and Xiaolong Wang and Sebastian Scherer and Zsolt Kira and Fei Xia and Yonatan Bisk}, title = {Toward General-Purpose Robots via Foundation Models: A Survey and Meta-Analysis}, booktitle = {arXiv preprint: arXiv:2312.08782 }, year = {2023}, }
	FlexCap: Generating Rich, Localized, and Flexible Captions in Images. Debidatta Dwibedi, Vidhi Jain, Jonathan Tompson, Andrew Zisserman, Yusuf Aytar. 38th Annual Conference on Neural Information Processing Systems (NeurIPS) 2024. webpage | arXiv | BibTeX: `@inproceedings{dwibedi2024flexcap, title={FlexCap: Generating Rich, Localized, and Flexible Captions in Images}, author={Debidatta Dwibedi and Vidhi Jain and Jonathan Tompson and Andrew Zisserman and Yusuf Aytar}, year={2024}, booktitle={Conference of Neural Information Processing Systems (NeurIPS)}, url={https://openreview.net/forum?id=P5dEZeECGu}, MONTH = {December}, }`
	How to Prompt Your Robot: A PromptBook for Manipulation Skills with Code as Policies. Montserrat Gonzalez Arenas, Ted Xiao, Sumeet Singh, Vidhi Jain, Allen Z. Ren, Quan Vuong, Jacob Varley, Alexander Herzog, Isabel Leal, Sean Kirmani, Mario Prats, Dorsa Sadigh, Vikas Sindhwani, Kanishka Rao, Jacky Liang, Andy Zeng. 40th IEEE International Conference on Robotics and Automation (ICRA) 2023. arXiv | BibTeX: @inproceedings{ arenas2023how, title={How to Prompt Your Robot: A PromptBook for Manipulation Skills with Code as Policies}, author={Montserrat Gonzalez Arenas and Ted Xiao and Sumeet Singh and Vidhi Jain and Allen Z. Ren and Quan Vuong and Jake Varley and Alexander Herzog and Isabel Leal and Sean Kirmani and Dorsa Sadigh and Vikas Sindhwani and Kanishka Rao and Jacky Liang and Andy Zeng}, booktitle={2nd Workshop on Language and Robot Learning: Language as Grounding}, year={2023}, url={https://openreview.net/forum?id=T8AiZj1QdN} }
	Open X-Embodiment: Robotic Learning Datasets and RT-X Models Open X-Embodiment Collaboration, Abhishek Padalkar, Acorn Pooley, Ajinkya Jain, Alex Bewley, Alex Herzog, Alex Irpan, Alexander Khazatsky, Anant Rai, Anikait Singh, Anthony Brohan, Antonin Raffin, Ayzaan Wahid, Ben Burgess-Limerick, Beomjoon Kim, Bernhard Schölkopf, Brian Ichter, Cewu Lu, Charles Xu, Chelsea Finn, Chenfeng Xu, Cheng Chi, Chenguang Huang, Christine Chan, Chuer Pan, Chuyuan Fu, Coline Devin, Danny Driess, Deepak Pathak, Dhruv Shah, Dieter Büchler, Dmitry Kalashnikov, Dorsa Sadigh, Edward Johns, Federico Ceola, Fei Xia, Freek Stulp, Gaoyue Zhou, Gaurav S. Sukhatme, Gautam Salhotra, Ge Yan, Giulio Schiavi, Hao Su, Hao-Shu Fang, Haochen Shi, Heni Ben Amor, Henrik I Christensen, Hiroki Furuta, Homer Walke, Hongjie Fang, Igor Mordatch, Ilija Radosavovic, Isabel Leal, Jacky Liang, Jaehyung Kim, Jan Schneider, Jasmine Hsu, Jeannette Bohg, Jeffrey Bingham, Jiajun Wu, Jialin Wu, Jianlan Luo, Jiayuan Gu, Jie Tan, Jihoon Oh, Jitendra Malik, Jonathan Tompson, Jonathan Yang, Joseph J. 
Lim, João Silvério, Junhyek Han, Kanishka Rao, Karl Pertsch, Karol Hausman, Keegan Go, Keerthana Gopalakrishnan, Ken Goldberg, Kendra Byrne, Kenneth Oslund, Kento Kawaharazuka, Kevin Zhang, Keyvan Majd, Krishan Rana, Krishnan Srinivasan, Lawrence Yunliang Chen, Lerrel Pinto, Liam Tan, Lionel Ott, Lisa Lee, Masayoshi Tomizuka, Maximilian Du, Michael Ahn, Mingtong Zhang, Mingyu Ding, Mohan Kumar Srirama, Mohit Sharma, Moo Jin Kim, Naoaki Kanazawa, Nicklas Hansen, Nicolas Heess, Nikhil J Joshi, Niko Suenderhauf, Norman Di Palo, Nur Muhammad Mahi Shafiullah, Oier Mees, Oliver Kroemer, Pannag R Sanketi, Paul Wohlhart, Peng Xu, Pierre Sermanet, Priya Sundaresan, Quan Vuong, Rafael Rafailov, Ran Tian, Ria Doshi, Roberto Martín-Martín, Russell Mendonca, Rutav Shah, Ryan Hoque, Ryan Julian, Samuel Bustamante, Sean Kirmani, Sergey Levine, Sherry Moore, Shikhar Bahl, Shivin Dass, Shuran Song, Sichun Xu, Siddhant Haldar, Simeon Adebola, Simon Guist, Soroush Nasiriany, Stefan Schaal, Stefan Welker, Stephen Tian, Sudeep Dasari, Suneel Belkhale, Takayuki Osa, Tatsuya Harada, Tatsuya Matsushima, Ted Xiao, Tianhe Yu, Tianli Ding, Todor Davchev, Tony Z. Zhao, Travis Armstrong, Trevor Darrell, Vidhi Jain, Vincent Vanhoucke, Wei Zhan, Wenxuan Zhou, Wolfram Burgard, Xi Chen, Xiaolong Wang, Xinghao Zhu, Xuanlin Li, Yao Lu, Yevgen Chebotar, Yifan Zhou, Yifeng Zhu, Ying Xu, Yixuan Wang, Yonatan Bisk, Yoonyoung Cho, Youngwoon Lee, Yuchen Cui, Yueh-hua Wu, Yujin Tang, Yuke Zhu, Yunzhu Li, Yusuke Iwasawa, Yutaka Matsuo, Zhuo Xu, Zichen Jeff Cui. 40th IEEE International Conference on Robotics and Automation (ICRA) 2023. 
webpage | arXiv | code | BibTeX: @inproceedings{ArXiv:Collaboration2023, author = {Open X-Embodiment Collaboration and Abhishek Padalkar and Acorn Pooley and Ajinkya Jain and Alex Bewley and Alex Herzog and Alex Irpan and Alexander Khazatsky and Anant Rai and Anikait Singh and Anthony Brohan and Antonin Raffin and Ayzaan Wahid and Ben Burgess-Limerick and Beomjoon Kim and Bernhard Schölkopf and Brian Ichter and Cewu Lu and Charles Xu and Chelsea Finn and Chenfeng Xu and Cheng Chi and Chenguang Huang and Christine Chan and Chuer Pan and Chuyuan Fu and Coline Devin and Danny Driess and Deepak Pathak and Dhruv Shah and Dieter Büchler and Dmitry Kalashnikov and Dorsa Sadigh and Edward Johns and Federico Ceola and Fei Xia and Freek Stulp and Gaoyue Zhou and Gaurav S. Sukhatme and Gautam Salhotra and Ge Yan and Giulio Schiavi and Hao Su and Hao-Shu Fang and Haochen Shi and Heni Ben Amor and Henrik I Christensen and Hiroki Furuta and Homer Walke and Hongjie Fang and Igor Mordatch and Ilija Radosavovic and Isabel Leal and Jacky Liang and Jaehyung Kim and Jan Schneider and Jasmine Hsu and Jeannette Bohg and Jeffrey Bingham and Jiajun Wu and Jialin Wu and Jianlan Luo and Jiayuan Gu and Jie Tan and Jihoon Oh and Jitendra Malik and Jonathan Tompson and Jonathan Yang and Joseph J.
Lim and João Silvério and Junhyek Han and Kanishka Rao and Karl Pertsch and Karol Hausman and Keegan Go and Keerthana Gopalakrishnan and Ken Goldberg and Kendra Byrne and Kenneth Oslund and Kento Kawaharazuka and Kevin Zhang and Keyvan Majd and Krishan Rana and Krishnan Srinivasan and Lawrence Yunliang Chen and Lerrel Pinto and Liam Tan and Lionel Ott and Lisa Lee and Masayoshi Tomizuka and Maximilian Du and Michael Ahn and Mingtong Zhang and Mingyu Ding and Mohan Kumar Srirama and Mohit Sharma and Moo Jin Kim and Naoaki Kanazawa and Nicklas Hansen and Nicolas Heess and Nikhil J Joshi and Niko Suenderhauf and Norman Di Palo and Nur Muhammad Mahi Shafiullah and Oier Mees and Oliver Kroemer and Pannag R Sanketi and Paul Wohlhart and Peng Xu and Pierre Sermanet and Priya Sundaresan and Quan Vuong and Rafael Rafailov and Ran Tian and Ria Doshi and Roberto Martín-Martín and Russell Mendonca and Rutav Shah and Ryan Hoque and Ryan Julian and Samuel Bustamante and Sean Kirmani and Sergey Levine and Sherry Moore and Shikhar Bahl and Shivin Dass and Shuran Song and Sichun Xu and Siddhant Haldar and Simeon Adebola and Simon Guist and Soroush Nasiriany and Stefan Schaal and Stefan Welker and Stephen Tian and Sudeep Dasari and Suneel Belkhale and Takayuki Osa and Tatsuya Harada and Tatsuya Matsushima and Ted Xiao and Tianhe Yu and Tianli Ding and Todor Davchev and Tony Z. 
Zhao and Travis Armstrong and Trevor Darrell and Vidhi Jain and Vincent Vanhoucke and Wei Zhan and Wenxuan Zhou and Wolfram Burgard and Xi Chen and Xiaolong Wang and Xinghao Zhu and Xuanlin Li and Yao Lu and Yevgen Chebotar and Yifan Zhou and Yifeng Zhu and Ying Xu and Yixuan Wang and Yonatan Bisk and Yoonyoung Cho and Youngwoon Lee and Yuchen Cui and Yueh-hua Wu and Yujin Tang and Yuke Zhu and Yunzhu Li and Yusuke Iwasawa and Yutaka Matsuo and Zhuo Xu and Zichen Jeff Cui}, title = {Open X-Embodiment: Robotic Learning Datasets and RT-X Models}, booktitle = {International Conference on Robotics and Automation (ICRA)}, year = {2023}, url = {https://robotics-transformer-x.github.io}, }
	Spatial Language Attention Policies for Efficient Robot Learning. Priyam Parashar, Vidhi Jain, Xiaohan Zhang, Jay Vakil, Sam Powers, Yonatan Bisk and Chris Paxton. 7th Annual Conference on Robot Learning (CoRL) 2023. webpage | arXiv | code | reviews | BibTeX: `@inproceedings{ parashar2023slap, title={SLAP: Spatial-Language Attention Policies}, author={Priyam Parashar and Vidhi Jain and Xiaohan Zhang and Jay Vakil and Sam Powers and Yonatan Bisk and Chris Paxton}, booktitle={7th Annual Conference on Robot Learning}, year={2023}, url={https://openreview.net/forum?id=7Pkzm2FgUmq} }`
	HomeRobot: Open-Vocabulary Mobile Manipulation. Sriram Yenamandra, Arun Ramachandran, Karmesh Yadav, Austin S Wang, Mukul Khanna, Theophile Gervet, Tsung-Yen Yang, Vidhi Jain, Alexander Clegg, John M Turner, Zsolt Kira, Manolis Savva, Angel X Chang, Devendra Singh Chaplot, Dhruv Batra, Roozbeh Mottaghi, Yonatan Bisk, Chris Paxton. 7th Annual Conference on Robot Learning (CoRL) 2023. webpage | arXiv | code | reviews | BibTeX: @inproceedings{ yenamandra2023homerobot, title={HomeRobot: Open-Vocabulary Mobile Manipulation}, author={Sriram Yenamandra and Arun Ramachandran and Karmesh Yadav and Austin S Wang and Mukul Khanna and Theophile Gervet and Tsung-Yen Yang and Vidhi Jain and Alexander Clegg and John M Turner and Zsolt Kira and Manolis Savva and Angel X Chang and Devendra Singh Chaplot and Dhruv Batra and Roozbeh Mottaghi and Yonatan Bisk and Chris Paxton}, booktitle={7th Annual Conference on Robot Learning}, year={2023}, url={https://openreview.net/forum?id=b-cto-fetlz} }
	Transformers are Adaptable Task Planners. Vidhi Jain, Yixin Lin, Eric Undersander, Yonatan Bisk and Akshara Rai. 6th Annual Conference on Robot Learning (CoRL) 2022. webpage | arXiv | video | code | reviews | BibTeX: `@inproceedings{ jain2022transformers, title={Transformers Are Adaptable Task Planners}, author={Vidhi Jain and Yixin Lin and Eric Undersander and Yonatan Bisk and Akshara Rai}, booktitle={6th Annual Conference on Robot Learning}, year={2022}, url={https://openreview.net/forum?id=Eal_lL08v_l} }`
	MAEA: Multimodal Attribution in Embodied AI. Vidhi Jain, Jayant Sravan Tamarapalli, Sahiti Yerramilli, and Yonatan Bisk. NeurIPS Workshop on Trustworthy Embodied AI 2022. webpage | arXiv | video | reviews | BibTeX: `@inproceedings{ jain2022maea, title={MAEA: Multimodal Attribution for Embodied AI }, author={Vidhi Jain and Sahiti Yerramilli and Jayant Sravan Tamarapalli and Yonatan Bisk }, booktitle={NeurIPS 2022 workshop TEA}, year={2022}, url={https://openreview.net/forum?id=OUs2us5Xhx} }`
	Towards Explainable Embodied AI. Vidhi Jain. Masters thesis 2021. pdf | BibTeX: `@mastersthesis{Jain-2021-129116, author = {Vidhi Jain}, title = {Towards Explainable Embodied AI}, year = {2021}, month = {July}, school = {Carnegie Mellon University}, address = {Pittsburgh, PA}, number = {CMU-RI-TR-21-31}, keywords = {explainability, Deep RL, navigation}, }`
	Learning to capture spatial semantic priors for indoor navigation. Vidhi Jain, Shishir Patil, Prakhar Agarwal and Katia Sycara. NeurIPS Object Representations for Learning and Reasoning (ORLR) 2020. pdf | webpage | arXiv | video | code | BibTeX: `@article{Jain2021LearningET, title={Learning Embeddings that Capture Spatial Semantics for Indoor Navigation}, author={Vidhi Jain and Prakhar Agarwal and Shishir G. Patil and Katia P. Sycara}, journal={ArXiv}, year={2021}, volume={abs/2108.00159} }`
	Predicting strategies in simulated search and rescue tasks. Vidhi Jain, Rohit Jena, Huao Li, Tejus Gupta, Dana Hughes, Michael Lewis and Katia Sycara. NeurIPS AI for Humanitarian Assistance and Disaster Response (AIADR) 2020. arXiv | video | slides
	Learning to navigate in unseen cluttered environments. Vidhi Jain, Ganesh Iyer and Katia Sycara. NeurIPS Women in Machine Learning workshop (WiML) 2020. pdf | poster
	Coping with sample inefficiency in deep reinforcement learning. Vidhi Jain, Simin Liu, and Ganesh Iyer. ICML Women in Machine Learning Un-Workshop (WiML) 2020. pdf | slides
	Investigating the viability of Generative Models for Novelty Detection. Vidhi Jain. Bachelors thesis 2018. pdf
	Symptomatic Diagnosis and Prognosis of Psychiatric Disorders through Personal Gadgets. Vidhi Jain, Prakhar Agarwal. ACM CHI Extended Abstracts (CHI EA'17) 2017. pdf | webpage | slides | poster | BibTeX: @inproceedings{10.1145/3027063.3048417, author = {Jain, Vidhi and Agarwal, Prakhar}, title = {Symptomatic Diagnosis and Prognosis of Psychiatric Disorders through Personal Gadgets}, year = {2017}, isbn = {9781450346566}, publisher = {Association for Computing Machinery}, address = {New York, NY, USA}, url = {https://doi.org/10.1145/3027063.3048417}, doi = {10.1145/3027063.3048417}, abstract = {Mental disorder has been shrouded as a stigma and disregarded as a secondary issue to physical health. It has become a major contributor to morbidity, disability and at times, fatality. Through our research, we show that the data generated through our daily interaction with technology has consistent patterns to identify symptoms in prodromal phase of degrading mental health. We propose a methodological data driven system that will help to raise an early alarm on the onset of symptoms of potential psychiatric disorders. The system collects the user's data from different human-computer interfaces to create a fine-grain electronic health portfolio, which can assist doctors in differential diagnosis as well as prognosis.}, booktitle = {Proceedings of the 2017 CHI Conference Extended Abstracts on Human Factors in Computing Systems}, pages = {118–123}, numpages = {6}, keywords = {mental health symptoms, design technique, data collection and processing}, location = {Denver, Colorado, USA}, series = {CHI EA '17} }
	Model Selection Scores for Multi-Relational Bayesian Networks. Sajjad Gholami, Oliver Schulte, Vidhi Jain, Qiang Zhao. IJCAI Declarative Learning Based Programming (DeLBP) 2017. pdf | code | BibTeX: `@inproceedings{gholami2017model, author = {Sajjad Gholami and Oliver Schulte and Vidhi Jain and Qiang Zhao}, title = {Model Selection Scores for Multi-Relational Bayesian Networks}, booktitle = {Extended Abstract for DeLBP Workshop at IJCAI 2017}, year = {2017}, }`
	Empowering API Consumer Community: Collaborative Annotation of Web API Documentation for Semantically Structured Format. Vidhi Jain and Matthias Frank. Grace Hopper Conference India (GHCI) 2016. pdf | poster

Talks

July 2024,

December 2022,

Education

	Master of Science in Robotics (MSR), 2019-2021. Advisor: Katia Sycara (social machine intelligence). Thesis: Towards Explainable Embodied AI.
	Bachelor of Engineering (Honors) in Computer Science, 2014-2018. Advisor: Aaron Courville, Mila (off-campus thesis). Thesis: Investigating viability of generative models for out-of-distribution detection.

Design and source code from Leonid Keselman's Jekyll fork and Jon Barron's website
