All Publications

Drive Anywhere: Generalizable End-to-end Autonomous Driving with Multi-modal Foundation Models
Human Interactive Driving | October 26, 2023

As autonomous driving technology matures, end-to-end methodologies have emerged as a leading strategy, promising seamless integration from perception to control via deep learning. However, existing systems grapple with challenges such as unexpected open-set environments and the complexity of black-box models. At the same time, the evolution of deep learning has introduced larger multimodal foundation models that offer combined visual and textual understanding. In this paper, we harness these multimodal foundation models to enhance the robustness and adaptability of autonomous driving systems, enabling out-of-distribution, end-to-end, multimodal, and more explainable autonomy. Specifically, we present an approach to end-to-end open-set (any environment/scene) autonomous driving that is capable of providing driving decisions from representations queryable by image and text. To do so, we introduce a method to extract nuanced spatial (pixel/patch-aligned) features from transformers to enable the encapsulation of both spatial and semantic features. Our approach (i) demonstrates unparalleled results in diverse tests while achieving significantly greater robustness in out-of-distribution situations, and (ii) allows the incorporation of latent space simulation (via text) for improved training (data augmentation via text) and policy debugging. We encourage the reader to check our explainer video and to view the code and demos on our project webpage.

Image
drive anywhere
What is missing in autonomous discovery: open challenges for the community
Energy & Materials | October 16, 2023

Self-driving labs (SDLs) leverage combinations of artificial intelligence, automation, and advanced computing to accelerate scientific discovery. The promise of this field has given rise to a rich community of passionate scientists, engineers, and social scientists, as evidenced by the development of the Acceleration Consortium and recent Accelerate Conference. Despite its strengths, this rapidly developing field presents numerous opportunities for growth, challenges to overcome, and potential risks of which to remain aware. This community perspective builds on a discourse instantiated during the first Accelerate Conference, and looks to the future of self-driving labs with a tempered optimism. Incorporating input from academia, government, and industry, we briefly describe the current status of self-driving labs, then turn our attention to barriers, opportunities, and a vision for what is possible. Our field is delivering solutions in technology and infrastructure, artificial intelligence and knowledge generation, and education and workforce development. In the spirit of community, we intend for this work to foster discussion and drive best practices as our field grows.

Image
A triptych of stable-diffusion generated images describing a self-driving lab for autonomous scientific discovery
The Effect of Ionomer to Carbon Ratio and Relative Humidity on Cathode Catalyst Degradation in PEM Fuel Cells
Energy & Materials | October 16, 2023

The effect of ionomer to carbon (I/C) weight ratio and relative humidity (RH) on cathode catalyst degradation was investigated by comprehensive in situ characterization. Membrane electrode assemblies (MEAs) with I/C ratios of 0.5, 0.8 and 1.2 were subjected to an accelerated stress test performed at 40, 70 and 100% RH. The results show an increasing loss in electrochemically active surface area (ECSA) for both higher I/C ratios and higher RH during voltage cycling. To differentiate between ionomer-connected and water-connected ECSA, carbon monoxide stripping measurements were performed at varying RH. Before degradation, all MEAs show comparable total ECSA values, while higher I/C ratios lead to a larger fraction of ionomer-connected ECSA. After degradation, ECSA measurements of the lowest I/C ratio showed a relatively higher loss of Pt in contact with ionomer than Pt in contact with water, while the opposite trend was observed for higher I/C ratios. H₂/N₂ impedance measurements showed drastically increasing protonic catalyst layer resistances for decreasing RH, especially at low I/C ratios, which might hinder Pt²⁺ ion diffusion towards the membrane, hence decreasing the ECSA loss. Limiting current measurements show increasing molecular O₂ diffusion resistances at end of test for samples with higher I/C ratios and higher ECSA loss.

Image
Schematic of the expected impact of I/C ratio and RH on an electrode surface at beginning of test.
Open X-Embodiment: Robotic Learning Datasets and RT-X Models
Robotics | October 13, 2023

Large, high-capacity models trained on diverse datasets have shown remarkable success in efficiently tackling downstream applications. In domains from NLP to Computer Vision, this has led to a consolidation of pretrained models, with general pretrained backbones serving as a starting point for many applications. Can such a consolidation happen in robotics? Conventionally, robotic learning methods train a separate model for every application, every robot, and even every environment. Can we instead train a generalist X-robot policy that can be adapted efficiently to new robots, tasks, and environments? In this paper, we provide datasets in standardized data formats and models to make it possible to explore this possibility in the context of robotic manipulation, alongside experimental results that provide an example of effective X-robot policies. We assemble a dataset from 22 different robots collected through a collaboration between 21 institutions, demonstrating 527 skills (160,266 tasks). We show that a high-capacity model trained on this data, which we call RT-X, exhibits positive transfer and improves the capabilities of multiple robots by leveraging experience from other platforms. More details can be found on the project website.

Image
Open X-Embodiment
Zero-Shot Open-Vocabulary Tracking with Large Pre-Trained Models
Robotics | October 10, 2023

Object tracking is central to robot perception and scene understanding. Tracking-by-detection has long been a dominant paradigm for object tracking of specific object categories. Recently, large-scale pre-trained models have shown promising advances in detecting and segmenting objects and parts in 2D static images in the wild. This raises the question: can we re-purpose these large-scale pre-trained static image models for open-vocabulary video tracking? In this paper, we re-purpose an open-vocabulary detector, segmenter, and dense optical flow estimator into a model that tracks and segments objects of any category in 2D videos. Our method predicts object and part tracks with associated language descriptions in monocular videos, rebuilding the pipeline of Tracktor with modern large pre-trained models for static image detection and segmentation: we detect open-vocabulary object instances and propagate their boxes from frame to frame using a flow-based motion model, refine the propagated boxes with the box regression module of the visual detector, and prompt an open-world segmenter with the refined box to segment the objects. We decide the termination of an object track based on the objectness score of the propagated boxes, as well as forward-backward optical flow consistency. We re-identify objects across occlusions using deep feature matching. We show that our model achieves strong performance on multiple established video object segmentation and tracking benchmarks, and can produce reasonable tracks in manipulation data. In particular, our model outperforms the previous state of the art on UVO and BURST, benchmarks for open-world object tracking and segmentation, despite never being explicitly trained for tracking. We hope that our approach can serve as a simple and extensible framework for future research.
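The track-termination rule described in this abstract (objectness score plus forward-backward optical flow consistency) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the nearest-neighbour lookup, and all three thresholds below are hypothetical choices made only for illustration, and the flow fields are assumed to be dense NumPy arrays of shape (H, W, 2) holding (dx, dy) displacements in pixels.

```python
import numpy as np


def forward_backward_error(flow_fw, flow_bw):
    """Per-pixel round-trip error between forward and backward flow.

    flow_fw: (H, W, 2) flow from frame t to t+1, as (dx, dy) in pixels.
    flow_bw: (H, W, 2) flow from frame t+1 back to t.
    """
    h, w, _ = flow_fw.shape
    ys, xs = np.mgrid[0:h, 0:w]
    # Where each pixel of frame t lands in frame t+1
    # (nearest-neighbour lookup, clipped to the image bounds).
    xt = np.clip(np.rint(xs + flow_fw[..., 0]).astype(int), 0, w - 1)
    yt = np.clip(np.rint(ys + flow_fw[..., 1]).astype(int), 0, h - 1)
    # A consistent pixel returns to its start, so forward flow plus the
    # backward flow sampled at the landing point should be close to zero.
    round_trip = flow_fw + flow_bw[yt, xt]
    return np.linalg.norm(round_trip, axis=-1)


def track_is_alive(box, flow_fw, flow_bw, objectness,
                   max_fb_error=1.0, min_consistent=0.5, min_objectness=0.3):
    """Keep a track only if the box is object-like and its flow is consistent.

    box: non-empty (x0, y0, x1, y1) integer pixel box in frame t.
    All thresholds are illustrative assumptions, not values from the paper.
    """
    x0, y0, x1, y1 = box
    err = forward_backward_error(flow_fw, flow_bw)[y0:y1, x0:x1]
    consistent_fraction = float(np.mean(err < max_fb_error))
    return bool(objectness >= min_objectness
                and consistent_fraction >= min_consistent)
```

In this sketch a track is dropped either when the propagated box no longer looks like an object or when most pixels inside it fail the round-trip check, which typically happens under occlusion.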

Image
Qualitative object tracking results
“If you weren't connected to the Internet, you were not alive”: experience of using social technology during COVID-19 in adults 50+
Robotics | October 8, 2023

Loneliness and social isolation reduce physical and mental wellbeing. Older adults are particularly prone to social isolation due to decreased connection with previous social networks such as at workplaces. Social technology can decrease loneliness and improve wellbeing. The COVID-19 pandemic prompted quarantine and social distancing for many people, creating a context of widespread social isolation.

Image
inter-rater reliability table
Promoting Sustainable Charging Through User Interface Interventions
Human-Centered AI | September 18, 2023

With the rising popularity of electrified vehicles, emphasis has been placed on encouraging charging with renewable energy and maximizing battery longevity to improve vehicle sustainability. Many mobile applications offer tools to suggest charging times with more sustainable renewable energy and charging strategies that preserve battery health. However, these options often result in longer, less convenient charging times for drivers. Here we conducted three charging scenario studies to identify factors that influence willingness to wait for sustainable charging. Participants selected between faster but less sustainable charging options and slower charging options that either reduce charging emissions or improve battery longevity. We find that people’s willingness to wait for green energy is influenced by situational factors; further, we find that information and battery longevity interventions can increase willingness to wait for sustainable charging. Finally, we provide design recommendations to promote sustainability in charging behaviors.

Image
charging interventions tested
Learning heterogeneous reaction kinetics from X-ray videos pixel by pixel
Energy & Materials | September 13, 2023

Reaction rates at spatially heterogeneous, unstable interfaces are notoriously difficult to quantify, yet are essential in engineering many chemical systems, such as batteries [1] and electrocatalysts [2]. Experimental characterizations of such materials by operando microscopy produce rich image datasets [3,4,5,6], but data-driven methods to learn physics from these images are still lacking because of the complex coupling of reaction kinetics, surface chemistry and phase separation [7]. Here we show that heterogeneous reaction kinetics can be learned from in situ scanning transmission X-ray microscopy (STXM) images of carbon-coated lithium iron phosphate (LFP) nanoparticles. Combining a large dataset of STXM images with a thermodynamically consistent electrochemical phase-field model, partial differential equation (PDE)-constrained optimization and uncertainty quantification, we extract the free-energy landscape and reaction kinetics and verify their consistency with theoretical models. We also simultaneously learn the spatial heterogeneity of the reaction rate, which closely matches the carbon-coating thickness profiles obtained through Auger electron microscopy (AEM). Across 180,000 image pixels, the mean discrepancy with the learned model is remarkably small (<7%) and comparable with experimental noise. Our results open the possibility of learning nonequilibrium material properties beyond the reach of traditional experimental methods and offer a new non-destructive technique for characterizing and optimizing heterogeneous reactive surfaces.

Image
x-ray image
Abstracting road traffic via topological braids: Applications to traffic flow analysis and distributed control
Human Interactive Driving | September 8, 2023

Despite the structure of road environments, imposed via geometry and rules, traffic flows exhibit complex multiagent dynamics. Reasoning about such dynamics is challenging due to the high dimensionality of possible behavior, the heterogeneity of agents, and the stochasticity of their decision-making. Modeling approaches learning associations in Euclidean spaces are often limited by their high sample complexity and the sparseness of available datasets. Our key insight is that the structure of traffic behavior could be effectively captured by lower-dimensional abstractions that emphasize critical interaction relationships. In this article, we abstract the space of behavior in traffic scenes into a discrete set of interaction modes, described in interpretable, symbolic form using topological braids. First, through a case study across real-world datasets, we show that braids can describe a wide range of complex behavior and uncover insights about the interactivity of vehicles. For instance, we find that high vehicle density does not always map to rich mixing patterns among them. Further, we show that our representation can effectively guide decision-making in traffic scenes. We describe a mechanism that probabilistically maps vehicles’ past behavior to modes of future interaction. We integrate this mechanism into a control algorithm that treats navigation as minimization of uncertainty over interaction modes, and investigate its performance on the task of traversing uncontrolled intersections in simulation. We show that our algorithm enables agents to coordinate significantly safer traversals for similar efficiency compared to baselines explicitly reasoning in the space of trajectories across a series of challenging scenarios.

Image
topological braids article image
Affinity for Technology Relates to Group Cohesion for New, But Not Existing, Groups
Robotics | September 7, 2023

During the 2020 COVID-19 pandemic, governments around the world mandated shutdowns and social distancing, limiting how much people could see other people outside of their household. Because of this, people had negative mental health outcomes, and many people turned to technology to maintain connections and create new ones. In this paper, we examine the relationship between technology, mental health, and group cohesion with existing groups (N = 202) and new groups (N = 74). We surveyed U.S. participants in June 2020, two to three months after the start of mandated social distancing. Results indicated that, as predicted, higher levels of reported group cohesion typically related to better reported mental health; however, the relationship occurred differently for existing groups compared to new groups. Further, higher levels of affinity for technology did not relate to group cohesion for existing groups, but did relate to more perceived cohesion for new groups. Researchers and mental health practitioners can use these results to help people develop a sense of group cohesion with new and existing groups and improve mental health during relative social isolation; technology may be especially beneficial for people to connect with new groups compared to existing groups.

Image
Graphs of the effect of Group Type