
All Publications

Towards Zero-Shot Scale-Aware Monocular Depth Estimation
Robotics | June 29, 2023

Monocular depth estimation is scale-ambiguous, and thus requires scale supervision to produce metric predictions. Even so, the resulting models are geometry-specific, with learned scales that cannot be directly transferred across domains. Because of this, recent works focus instead on relative depth, eschewing scale in favor of improved up-to-scale zero-shot transfer. In this work we introduce ZeroDepth, a novel monocular depth estimation framework capable of predicting metric scale for arbitrary test images from different domains and camera parameters. This is achieved by (i) the use of input-level geometric embeddings that enable the network to learn a scale prior over objects; and (ii) decoupling the encoder and decoder stages via a variational latent representation that is conditioned on single-frame information. We evaluated ZeroDepth on both outdoor (KITTI, DDAD, nuScenes) and indoor (NYUv2) benchmarks, and achieved a new state of the art in both settings using the same pre-trained model, outperforming methods that train on in-domain data and require test-time scaling to produce metric estimates.
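
A rough sketch of what such input-level geometric embeddings could look like, under the assumption that they resemble per-pixel camera-ray directions computed from the intrinsics (the function, shapes, and example intrinsics below are illustrative, not the paper's exact embedding):

# Hypothetical sketch: per-pixel unit viewing rays derived from the camera
# intrinsics, which can be concatenated to the RGB input as a geometric cue.
import numpy as np

def ray_embeddings(K: np.ndarray, height: int, width: int) -> np.ndarray:
    """Return a (height, width, 3) grid of unit viewing rays for intrinsics K."""
    u, v = np.meshgrid(np.arange(width), np.arange(height))        # pixel coordinates
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).astype(np.float64)
    rays = pix @ np.linalg.inv(K).T                                # back-project pixels
    return rays / np.linalg.norm(rays, axis=-1, keepdims=True)     # normalize to unit length

# Example: a 640x480 camera with a 500-pixel focal length (values are made up).
K = np.array([[500.0, 0.0, 320.0],
              [0.0, 500.0, 240.0],
              [0.0, 0.0, 1.0]])
emb = ray_embeddings(K, height=480, width=640)
print(emb.shape)   # (480, 640, 3)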

The materials experiment knowledge graph
Energy & Materials | June 28, 2023

Materials knowledge is inherently hierarchical. While high-level descriptors such as composition and structure are valuable for contextualizing materials data, the data must ultimately be considered in the context of its low-level acquisition details. Graph databases offer an opportunity to represent hierarchical relationships among data, organizing semantic relationships into a knowledge graph. Herein, we establish a knowledge graph of materials experiments whose construction encodes the complete provenance of each material sample and its associated experimental data and metadata. Additional relationships among materials and experiments further encode knowledge and facilitate data exploration. We illustrate the Materials Experiment Knowledge Graph (MekG) using several use cases, demonstrating the value of modern graph databases for the enterprise of data-driven materials science.
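
As a hedged illustration of the kind of provenance structure such a graph can encode (the node kinds, relation names, and sample data below are hypothetical, not the MekG schema), a few lines with a general-purpose graph library:

# Illustrative sketch only: a material sample, the experiment that produced it,
# and a resulting measurement, linked by provenance edges.
import networkx as nx

g = nx.MultiDiGraph()
g.add_node("sample:A12", kind="material_sample", composition="Bi-V-O")
g.add_node("expt:anneal_007", kind="experiment", temperature_C=450)
g.add_node("meas:xrd_0031", kind="measurement", technique="XRD")

g.add_edge("expt:anneal_007", "sample:A12", relation="PRODUCED")
g.add_edge("meas:xrd_0031", "sample:A12", relation="CHARACTERIZES")

# Data exploration: walk a measurement back to the sample it characterizes.
for _, sample, data in g.out_edges("meas:xrd_0031", data=True):
    if data["relation"] == "CHARACTERIZES":
        print(sample, g.nodes[sample]["composition"])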

Early Prediction of Ion Transport Properties in Solid Polymer Electrolytes Using Machine Learning and System Behavior-Based Descriptors of Molecular Dynamics Simulations
Energy & Materials | June 23, 2023

Molecular dynamics simulations are useful tools to screen solid polymer electrolytes with suitable properties applicable to Li-ion batteries. However, due to the vast design space of polymers, it is highly desirable to accelerate the screening by reducing the computational time of ion transport properties from simulations. In this study, we show that with a judicious choice of descriptors we can predict the equilibrium ion transport properties in LiTFSI–homopolymer systems within the first 0.5 ns of the production run of simulations. Specifically, we find that descriptors that include information about the behavior of the system, such as ion clustering and the time evolution of ion transport properties, have several advantages over polymer structure-based descriptors, as they encode system (polymer and salt) behavior rather than just the class of polymers and can be computed at any time point during the simulations. These characteristics increase the applicability of our descriptors to a wide range of polymer systems (e.g., copolymers, blends of polymers, salt concentrations, and temperatures) and can be impactful in significantly shortening the discovery pipeline for solid polymer electrolytes.
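
A minimal, hypothetical example of a system-behavior descriptor in this spirit: the early-time mean squared displacement slope of the Li ions, computed from only the first 0.5 ns of a trajectory (the array shapes, time step, and synthetic random-walk data are assumptions for the sketch, not the paper's descriptor set):

# Hedged sketch of a behavior-based descriptor: early-time MSD slope of Li ions.
import numpy as np

def early_msd_slope(positions: np.ndarray, dt_ns: float, window_ns: float = 0.5) -> float:
    """positions: (n_frames, n_ions, 3) unwrapped coordinates in Angstrom."""
    n_frames = min(positions.shape[0], int(window_ns / dt_ns))
    disp = positions[:n_frames] - positions[0]           # displacement from t = 0
    msd = (disp ** 2).sum(axis=-1).mean(axis=-1)         # average over ions
    t = np.arange(n_frames) * dt_ns
    slope, _ = np.polyfit(t, msd, 1)                     # A^2 per ns
    return slope

# Synthetic random-walk trajectory: 1000 frames at 1 ps spacing, 64 Li ions.
rng = np.random.default_rng(0)
traj = np.cumsum(rng.normal(scale=0.1, size=(1000, 64, 3)), axis=0)
print(early_msd_slope(traj, dt_ns=0.001))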

More human than human: LLM-generated narratives outperform human-LLM interleaved narratives
Human-Centered AI | June 19, 2023

Narrative story generation has attracted growing interest in the field of large language models. The present paper compares stories generated by an LLM alone (non-interleaved) with those generated by interleaving human-written and LLM-generated text (interleaved). The study's hypothesis was that interleaved stories would perform better than non-interleaved stories. To test this hypothesis, we conducted two experiments with roughly 500 participants each. Participants were asked to rate stories of each type on an overall preference score and on four facets: logical soundness, plausibility, understandability, and novelty. Our findings indicate that interleaved stories were in fact less preferred than non-interleaved stories. This result has implications for the design and implementation of our story generators. The study contributes new insights into the potential uses and limitations of interleaved and non-interleaved systems for generating narrative stories, which may help improve the performance of such story generators.

Drag-guided diffusion models for vehicle image generation
Human Interactive Driving | June 16, 2023

Denoising diffusion models trained at web-scale have revolutionized image generation. The application of these tools to engineering design is an intriguing possibility, but is currently limited by their inability to parse and enforce concrete engineering constraints. In this paper, we take a step towards this goal by proposing physics-based guidance, which enables optimization of a performance metric (as predicted by a surrogate model) during the generation process. As a proof-of-concept, we add drag guidance to Stable Diffusion, which allows this tool to generate images of novel vehicles while simultaneously minimizing their predicted drag coefficients.
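
A hedged sketch of the guidance idea as described: at each denoising step, nudge the sample along the negative gradient of a surrogate drag predictor. The toy surrogate and the denoise_step placeholder below are stand-ins, not the models or sampler used in the paper:

# Illustrative only: steer a diffusion sample toward lower predicted drag.
import torch

surrogate = torch.nn.Sequential(                      # toy drag-coefficient predictor
    torch.nn.Flatten(), torch.nn.Linear(3 * 64 * 64, 1)
)

def drag_guided_step(x: torch.Tensor, denoise_step, guidance_scale: float = 0.1):
    x = x.detach().requires_grad_(True)
    drag = surrogate(x).sum()                         # predicted drag for the batch
    grad = torch.autograd.grad(drag, x)[0]            # d(drag)/d(image)
    x_denoised = denoise_step(x.detach())             # ordinary diffusion update
    return x_denoised - guidance_scale * grad         # push toward lower drag

x = torch.randn(1, 3, 64, 64)
x = drag_guided_step(x, denoise_step=lambda z: z)     # identity step, for illustration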

Image: drag optimization examples
Interpreting and Improving Diffusion Models from an Optimization Perspective
Human Interactive Driving | June 8, 2023

Denoising is intuitively related to projection. Indeed, under the manifold hypothesis, adding random noise is approximately equivalent to orthogonal perturbation. Hence, learning to denoise is approximately learning to project. In this paper, we use this observation to interpret denoising diffusion models as approximate gradient descent applied to the Euclidean distance function. We then provide a straightforward convergence analysis of the DDIM sampler under simple assumptions on the projection error of the denoiser. Finally, we propose a new gradient-estimation sampler, generalizing DDIM using insights from our theoretical results. In as few as 5-10 function evaluations, our sampler achieves state-of-the-art FID scores on pretrained CIFAR-10 and CelebA models and can generate high-quality samples on latent diffusion models.
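
Written out, the interpretation reads roughly as follows (the notation is a hedged paraphrase, not necessarily the paper's exact statement). For a data set $K$ with nearest-point projection $\operatorname{proj}_K$,

\[
  d(x, K) = \min_{y \in K} \lVert x - y \rVert,
  \qquad
  \tfrac{1}{2}\nabla_x\, d^2(x, K) = x - \operatorname{proj}_K(x),
\]
so an ideal denoiser satisfies
\[
  \hat{x}_\theta(x_t) \approx \operatorname{proj}_K(x_t) = x_t - \tfrac{1}{2}\nabla_{x_t} d^2(x_t, K),
\]
i.e.\ each denoising step is approximately a gradient-descent step on $\tfrac{1}{2} d^2(\cdot, K)$.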

Image: Ideal denoiser well-approximates projection onto the CIFAR-10 dataset
Interpreting and Improving Diffusion Models Using the Euclidean Distance Function
Human Interactive Driving | June 8, 2023

Denoising is intuitively related to projection. Indeed, under the manifold hypothesis, adding random noise is approximately equivalent to orthogonal perturbation. Hence, learning to denoise is approximately learning to project. In this paper, we use this observation to reinterpret denoising diffusion models as approximate gradient descent applied to the Euclidean distance function. We then provide a straightforward convergence analysis of the DDIM sampler under simple assumptions on the projection error of the denoiser. Finally, we propose a new sampler based on two simple modifications to DDIM using insights from our theoretical results. In as few as 5-10 function evaluations, our sampler achieves state-of-the-art FID scores on pretrained CIFAR-10 and CelebA models and can generate high-quality samples on latent diffusion models.
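
As a hedged code sketch of the same viewpoint, a DDIM-style update can be written so that the denoiser's output plays the role of an estimated projection onto the data. The denoiser, noise schedule, and toy example below are assumptions for illustration, not the paper's proposed sampler:

# Illustrative DDIM-style loop: each step moves toward the denoiser's estimate,
# i.e. approximately along the negative gradient of the squared distance.
import numpy as np

def ddim_like_sample(denoiser, x, sigmas):
    """sigmas: decreasing noise levels, e.g. from 10.0 down to 0.0."""
    for sigma, sigma_next in zip(sigmas[:-1], sigmas[1:]):
        x0_hat = denoiser(x, sigma)          # approximate projection of x onto the data
        eps_hat = (x - x0_hat) / sigma       # implied noise direction
        x = x0_hat + sigma_next * eps_hat    # partial step toward the estimate
    return x

# Toy example: the ideal denoiser for a dataset containing only the origin
# returns 0, so sampling converges to that point.
sigmas = np.linspace(10.0, 0.0, 11)
x = 10.0 * np.random.randn(4)
print(ddim_like_sample(lambda z, s: np.zeros_like(z), x, sigmas))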

Materials cartography: A forward-looking perspective on materials representation and devising better maps
Energy & Materials | June 6, 2023

Machine learning (ML) is gaining popularity as a tool for materials scientists to accelerate computation, automate data analysis, and predict materials properties. The representation of input material features is critical to the accuracy, interpretability, and generalizability of data-driven models for scientific research. In this Perspective, we discuss a few central challenges faced by ML practitioners in developing meaningful representations, including handling the complexity of real-world industry-relevant materials, combining theory and experimental data sources, and describing scientific phenomena across timescales and length scales. We present several promising directions for future research: devising representations of varied experimental conditions and observations, integrating machine learning into laboratory practices, and building multi-scale informatics toolkits to bridge the gaps between atoms, materials, and devices.

Diffusion Policy: Visuomotor Policy Learning via Action Diffusion
Robotics | June 1, 2023

This paper introduces Diffusion Policy, a new way of generating robot behavior by representing a robot's visuomotor policy as a conditional denoising diffusion process. We benchmark Diffusion Policy across 12 different tasks from 4 different robot manipulation benchmarks and find that it consistently outperforms existing state-of-the-art robot learning methods with an average improvement of 46.9%. Diffusion Policy learns the gradient of the action-distribution score function and iteratively optimizes with respect to this gradient field during inference via a series of stochastic Langevin dynamics steps. We find that the diffusion formulation yields powerful advantages when used for robot policies, including gracefully handling multimodal action distributions, being suitable for high-dimensional action spaces, and exhibiting impressive training stability. To fully unlock the potential of diffusion models for visuomotor policy learning on physical robots, this paper presents a set of key technical contributions including the incorporation of receding horizon control, visual conditioning, and the time-series diffusion transformer. We hope this work will help motivate a new generation of policy learning techniques that are able to leverage the powerful generative modeling capabilities of diffusion models. Code, data, and training details will be publicly available.
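
A minimal, hypothetical sketch of the receding-horizon inference loop described above: denoise a short action sequence conditioned on the current observation, execute only its first few actions, then re-plan. The stand-in network, schedule, and shapes are assumptions, not the paper's architecture or settings:

# Illustrative only: iterative denoising of an action sequence conditioned on an observation.
import torch

action_dim, horizon, n_steps, obs_dim = 2, 16, 10, 4
noise_pred_net = torch.nn.Linear(horizon * action_dim + obs_dim, horizon * action_dim)

@torch.no_grad()
def plan_actions(obs: torch.Tensor) -> torch.Tensor:
    actions = torch.randn(horizon * action_dim)            # start from pure noise
    for _ in range(n_steps):
        eps = noise_pred_net(torch.cat([actions, obs]))    # conditional noise prediction
        actions = actions - (1.0 / n_steps) * eps          # crude denoising update
    return actions.view(horizon, action_dim)

obs = torch.zeros(obs_dim)          # stand-in for an encoded camera observation
plan = plan_actions(obs)            # (16, 2) action sequence
executable = plan[:8]               # receding horizon: execute the first 8 actions, then re-plan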

Surrogate Modeling of Car Drag Coefficient with Depth and Normal Renderings
Human Interactive Driving | May 26, 2023

Generative AI models have made significant progress in automating the creation of 3D shapes, which has the potential to transform car design. In engineering design and optimization, evaluating engineering metrics is crucial. To make generative models performance-aware and enable them to create high-performing designs, surrogate modeling of these metrics is necessary. However, the currently used representations of three-dimensional (3D) shapes either require extensive computational resources to learn or suffer from significant information loss, which impairs their effectiveness in surrogate modeling. To address this issue, we propose a new two-dimensional (2D) representation of 3D shapes. We develop a surrogate drag model based on this representation to verify its effectiveness in predicting 3D car drag. To train our model, we construct a diverse dataset of 9,070 high-quality 3D car meshes labeled with drag coefficients computed from computational fluid dynamics (CFD) simulations. Our experiments demonstrate that our model can accurately and efficiently evaluate drag coefficients, with an R² value above 0.84 across various car categories. Moreover, the proposed representation method can be generalized to many other product categories beyond cars. Our model is implemented using deep neural networks, making it compatible with recent AI image generation tools (such as Stable Diffusion) and a significant step towards the automatic generation of drag-optimized car designs. We have made the dataset and code publicly available.
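
A hedged sketch of the surrogate-modeling setup this describes: depth and surface-normal renderings stacked as image channels and regressed to a single drag coefficient by a small CNN. The architecture and random stand-in inputs below are illustrative, not the trained model:

# Illustrative only: drag-coefficient regression from 2D depth + normal renderings.
import torch

surrogate = torch.nn.Sequential(
    torch.nn.Conv2d(4, 16, kernel_size=3, stride=2, padding=1),   # depth (1) + normals (3)
    torch.nn.ReLU(),
    torch.nn.AdaptiveAvgPool2d(1),
    torch.nn.Flatten(),
    torch.nn.Linear(16, 1),                                        # predicted drag coefficient
)

depth = torch.rand(1, 1, 128, 128)        # placeholder depth rendering
normals = torch.rand(1, 3, 128, 128)      # placeholder normal rendering
c_d = surrogate(torch.cat([depth, normals], dim=1))
print(c_d.shape)                           # torch.Size([1, 1])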

Image: Computational Domain