Accepted Submissions (🎤 = oral)

Demon in the machine: learning to extract work and absorb entropy from fluctuating nanosystems

Stephen Whitelam (Lawrence Berkeley National Lab)*
Abstract: We use Monte Carlo and genetic algorithms to train neural-network feedback-control protocols for simulated fluctuating nanosystems. These protocols convert the information obtained by the feedback process into heat or work, allowing the extraction of work from a colloidal particle pulled by an optical trap and the absorption of entropy by an Ising model undergoing magnetization reversal. The learning framework requires no prior knowledge of the system, depends only upon measurements that are accessible experimentally, and scales to systems of considerable complexity. It could be used in the laboratory to learn protocols for fluctuating nanosystems that convert measurement information into stored work or heat.

Temporal Embeddings: Scalable Self-Supervised Temporal Representation Learning from Spatiotemporal Data for Multimodal Computer Vision

Yi Cao (Apple); Swetava Ganguli (Apple)*; Vipul Pandey (Apple)
Abstract: There exists a correlation between the temporal patterns of geospatial activity and the type of land use. A novel self-supervised approach is proposed to stratify the landscape based on mobility activity time series. First, the time series signal is transformed to the frequency domain and then compressed into task-agnostic temporal embeddings by a contractive autoencoder, which preserves the cyclic temporal patterns observed in the time series. The pixel-wise embeddings are converted to image-like channels that can be used for task-based, multimodal modeling of downstream geospatial tasks using deep semantic segmentation. Experiments show that temporal embeddings are semantically meaningful representations of time series data and are effective across different tasks, such as classifying residential and commercial areas, classifying activity areas like golf courses, grocery shops, road intersections, and educational buildings, and stratifying the landscape into activity strata such as downtown, urban, suburban, and rural. Temporal embeddings transform sequential, spatiotemporal motion trajectory data into semantically meaningful image-like tensor representations that can be combined (multimodal fusion) with other data modalities that are, or can be transformed into, image-like tensor representations (e.g., RGB imagery, graph embeddings of road networks, and passively collected imagery such as SAR) to facilitate multimodal learning in geospatial computer vision.

🎤 DoReMi: Optimizing Data Mixtures Speeds Up Language Model Pretraining

Sang Michael Xie (Stanford University)*; Hieu Pham (Google); Xuanyi Dong (University of Technology Sydney); Nan Du (Google Brain); Hanxiao Liu (Google Brain); Yifeng Lu (Google Brain); Percy Liang (Stanford University); Quoc Le (Google Brain); Tengyu Ma (Stanford); Adams Wei Yu (Google Brain)
Abstract: The mixture proportions of pretraining data domains (e.g., Wikipedia, books, web text) greatly affect language model (LM) performance. In this paper, we propose Domain Reweighting with Minimax Optimization (DoReMi), which first trains a small proxy model using group distributionally robust optimization (Group DRO) over domains to produce domain weights (mixture proportions) without knowledge of downstream tasks. We then resample a dataset with these domain weights and train a larger, full-sized model. In our experiments, we use DoReMi on a 280M-parameter proxy model to find domain weights for training an 8B-parameter model (30x larger) more efficiently. On The Pile, DoReMi improves perplexity across all domains, even when it downweights a domain. DoReMi improves average few-shot downstream accuracy by 6.5 percentage points over a baseline model trained using The Pile's default domain weights and reaches the baseline accuracy with 2.6x fewer training steps. On the GLaM dataset, DoReMi, which has no knowledge of downstream tasks, even matches the performance of using domain weights tuned on downstream tasks.
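
The abstract does not spell out how Group DRO produces the domain weights. The following is a minimal sketch assuming an exponentiated-gradient update driven by each domain's "excess loss", i.e. how much worse the proxy model does than an assumed reference model on that domain. The reference model, the step size eta, the uniform smoothing, and the function names are illustrative assumptions, not the authors' code.

```python
import math


def doremi_step(weights, proxy_losses, reference_losses, eta=1.0, smoothing=1e-3):
    """One DoReMi-style domain-weight update (illustrative sketch, not the authors' code).

    weights: current domain weights, summing to 1.
    proxy_losses, reference_losses: per-domain average losses of the proxy model
    and of an assumed reference model on the current batch.
    """
    k = len(weights)
    # Excess loss per domain: how much worse the proxy is than the reference, floored at 0.
    excess = [max(p - r, 0.0) for p, r in zip(proxy_losses, reference_losses)]
    # Exponentiated-gradient (multiplicative) step: upweight high-excess-loss domains.
    unnormalized = [w * math.exp(eta * e) for w, e in zip(weights, excess)]
    total = sum(unnormalized)
    normalized = [u / total for u in unnormalized]
    # Mix with the uniform distribution so that no domain weight collapses to zero.
    return [(1.0 - smoothing) * w + smoothing / k for w in normalized]


def average_weights(history):
    """Final domain weights: the average of the per-step weights over training."""
    k = len(history[0])
    return [sum(step[i] for step in history) / len(history) for i in range(k)]


# Usage: three domains; the proxy lags the reference only on domain 1.
w = doremi_step([1 / 3] * 3, proxy_losses=[2.0, 3.0, 2.5], reference_losses=[2.0, 2.0, 2.5])
print([round(x, 3) for x in w])
```

Averaging the per-step weights, rather than keeping only the last ones, smooths out the noise of individual minibatch updates before the weights are used to resample data for the larger model.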

Modeling Electronic Health Records for Predicting MRI Abdominal Protocols

Peyman Shokrollahi (Stanford University)*
Abstract: Abstract not made public per author's request

Saturn: Efficient Multi-Large-Model Deep Learning

Kabir Nagrecha (UC San Diego)*; Arun Kumar (University of California, San Diego)
Abstract: In this paper, we propose Saturn, a new data system to improve the efficiency of multi-large-model training (e.g., during model selection/hyperparameter optimization). We first identify three key interconnected systems challenges for users building large models in this setting: parallelism technique selection, distribution of GPUs over jobs, and scheduling. We then formalize these as a joint problem and build a new system architecture to tackle these challenges simultaneously. Our evaluations show that our joint-optimization approach yields 39-49% lower model selection runtimes than typical current DL practice.

Sophia: A Scalable Stochastic Second-order Optimizer for Language Model Pre-training

Hong Liu (Stanford University)*; Zhiyuan Li (Stanford University); David Hall (Microsoft); Percy Liang (Stanford University); Tengyu Ma (Stanford)
Abstract: Given the massive cost of language model pre-training, a non-trivial improvement of the optimization algorithm would lead to a material reduction in the time and cost of training. Adam and its variants have been state-of-the-art for years, and more sophisticated second-order (Hessian-based) optimizers often incur too much per-step overhead. In this project, we propose Sophia, Second-order Clipped Stochastic Optimization, a simple, scalable second-order optimizer that uses a lightweight estimate of the diagonal Hessian as the preconditioner. The update is the moving average of the gradients divided by the moving average of the estimated Hessian, followed by element-wise clipping. The clipping controls the worst-case update size and tames the negative impact of non-convexity and rapid changes of the Hessian along the trajectory. Sophia only estimates the diagonal Hessian every handful of iterations, which has negligible average per-step time and memory overhead. On language modeling with GPT-2 models ranging in size from 125M to 770M, Sophia achieves a more than 2x speed-up compared with Adam in the number of steps, total compute, and wall-clock time.
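
The update rule described above (gradient moving average divided by a Hessian moving average, then element-wise clipping, with the Hessian refreshed only every few steps) can be sketched on plain Python lists. The diagonal Hessian estimate is treated as a hypothetical caller-supplied input (the abstract does not say how it is computed), and the hyperparameter values and the eps floor for non-positive curvature are assumptions for illustration.

```python
def clip(x, bound):
    """Clip a scalar to the interval [-bound, bound]."""
    return max(-bound, min(bound, x))


def sophia_like_step(theta, grad, m, h, step, diag_hessian_estimate=None,
                     lr=0.1, beta1=0.9, beta2=0.99, rho=1.0, eps=1e-12, k=10):
    """One Sophia-style update on plain Python lists (illustrative sketch).

    diag_hessian_estimate is a hypothetical input: a per-coordinate curvature
    estimate supplied by the caller; it is only used every k steps.
    """
    # Exponential moving average of the gradients.
    m = [beta1 * mi + (1 - beta1) * gi for mi, gi in zip(m, grad)]
    # Refresh the moving average of the diagonal Hessian only every k steps.
    if step % k == 0 and diag_hessian_estimate is not None:
        h = [beta2 * hi + (1 - beta2) * ei for hi, ei in zip(h, diag_hessian_estimate)]
    # Precondition by the curvature estimate (floored at eps), then clip element-wise,
    # so each coordinate moves by at most lr * rho per step.
    update = [clip(mi / max(hi, eps), rho) for mi, hi in zip(m, h)]
    theta = [t - lr * u for t, u in zip(theta, update)]
    return theta, m, h


# Usage: minimize f(x) = 2 * x**2 (gradient 4x, Hessian 4) starting from x = 5.
theta, m, h = [5.0], [0.0], [0.0]
for step in range(100):
    grad = [4.0 * theta[0]]
    theta, m, h = sophia_like_step(theta, grad, m, h, step, diag_hessian_estimate=[4.0])
```

Flooring the curvature at eps means that a zero or negative Hessian estimate produces a huge ratio, which the clip then caps at rho; this is one way the clipping bounds the worst-case step under non-convexity.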

Ranking Binary Functions without Training

Russ Webb (Apple)*
Abstract: The parity function of k bits in unknown, fixed locations within an input of n random bits has been studied as a difficult, representative approximation to learning from unknown causal factors in tabular data. This work addresses the question of which binary functions are difficult for ReLU multi-layer perceptrons (MLPs) to learn. An algorithm that ranks pairs of balanced binary functions by their stochastic gradient descent (SGD) training difficulty achieves 88% accuracy and is shown to rank difficulty more accurately than directly measuring training times. This algorithm is 4000 times faster to compute and 2.5% more accurate than ranking based on a single observed training run of each function.
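
To make the sparse-parity setting concrete, here is a minimal sketch of how such data can be generated. The function name and parameters are illustrative; this is only the learning problem, not the paper's ranking algorithm.

```python
import random


def make_sparse_parity(n, k, num_samples, seed=0):
    """Sample (x, y) pairs where y is the parity of k hidden bit positions out of n."""
    rng = random.Random(seed)
    # Fixed set of k relevant positions, unknown to the learner.
    support = sorted(rng.sample(range(n), k))
    xs, ys = [], []
    for _ in range(num_samples):
        x = [rng.randint(0, 1) for _ in range(n)]
        # Label: XOR of the bits at the hidden positions; all other bits are noise.
        y = sum(x[i] for i in support) % 2
        xs.append(x)
        ys.append(y)
    return xs, ys, support


xs, ys, support = make_sparse_parity(n=20, k=3, num_samples=5)
```

Because the input bits are uniform and independent, the parity label is balanced, and no single bit (nor any subset of fewer than k bits) carries information about it, which is what makes the function hard for gradient-based learners.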

🎤 DORSal: Diffusion for Object-centric Representations of Scenes et al.

Allan Jabri (UC Berkeley); Sjoerd van Steenkiste (Google)*; Emiel Hoogeboom (University of Amsterdam); Mehdi Sajjadi (Google Research); Thomas Kipf (Google Brain)
Abstract: We leverage recent progress in diffusion models to equip 3D scene representation learning models with the ability to render high-fidelity novel views, while retaining benefits such as object-level scene editing to a large degree. In particular, we propose DORSal, which adapts a video diffusion architecture for 3D scene generation conditioned on object-centric slot-based representations of scenes. On both complex synthetic multi-object scenes and on the real-world large-scale Street View dataset, we show that DORSal enables scalable neural rendering of 3D scenes with object-level editing and improves upon existing approaches.

Probabilistic Modeling for Mixed-Variable Sequences

Denis A Gudovskiy (Panasonic)*; Tomoyuki Okuno (Panasonic); Yohei Nakata (Panasonic Holdings Corporation)
Abstract: Density estimation for sequential data is a core research and engineering task. Deep learning (DL) probabilistic models are a promising direction for modeling multivariate statistical dependencies in such domains, where both the data dimensions and their temporal dynamics can be modeled within a probabilistic framework. Examples of such models include generative normalizing flows and diffusion models. The key limitation of DL-based models in real-world applications is their support for heterogeneous mixed-variable features. Typically, they perform well when applied to dense numerical (continuous) features, but underperform compared to, e.g., ensembles of gradient boosting decision trees (GBDT) when applied to sparse categorical features (cardinal or ordinal discrete numbers). In this paper, we propose a mixed-variable probabilistic modeling framework that supports all kinds of input sequences, as sketched in Fig. 1. We generalize previous methods and introduce an encoding step to implement the variable-agnostic approach. Our proof-of-concept evaluations on the real-world ATM failure prediction dataset show the advantages of an additional mixed-variable encoding, with our method outperforming commonly used GBDT and discriminative classifiers.

Fair Augmentation of Decision Trees Through Selective Node Retraining

Coen T Adler (University of California, Santa Cruz)*
Abstract: With modern machine learning models becoming ever more complex and embedded within society, there is a need for accurate, interpretable, and fair models that users can trust. Decision trees, being fully interpretable models, fill this role well. Current research shows that algorithms exist that can train decision trees to be both accurate and fair. Despite this, decision trees are often trained solely for accuracy, resulting in biased or discriminatory pre-trained trees. Frequently, the root of the bias in these pre-trained trees stems from a few select nodes or subtrees. In this paper, I propose a novel method of selective fair retraining that modifies discriminatory nodes to remove bias while retaining high accuracy. The experimental results indicate that the proposed tree modification method can result in fair decision trees with higher accuracy than those trained from scratch.

Teaching Algorithmic Reasoning via In-context Learning

Hattie Zhou (Mila)*; Azade Nazi (Google Brain); Hugo Larochelle (Google); Aaron Courville (MILA, Université de Montréal); Behnam Neyshabur (Google); Hanie Sedghi (Google)
Abstract: Large language models (LLMs) have shown increasing in-context learning capabilities through scaling up model and data size. Despite this progress, LLMs are still unable to reliably solve algorithmic reasoning problems. While providing a rationale with the final answer has led to further improvements in multi-step reasoning problems, Anil et al. (2022) showed that even simple algorithmic reasoning tasks such as parity are far from solved. In this work, we study the extent to which algorithmic reasoning abilities exist in LLMs, and we ground our investigation in four key learning stages: (1) learning algorithms as skills, (2) learning multiple skills simultaneously, (3) learning how to compose simple skills into more complex ones, and (4) learning how to use skills as tools. We show that it is possible to teach algorithms to LLMs via algorithmic prompting. We evaluate our approach on a variety of arithmetic and quantitative reasoning tasks, and demonstrate significant boosts in performance over existing prompting techniques. In particular, for long parity, addition, multiplication and subtraction, we achieve an error reduction of approximately 10x, 9x, 5x and 2x respectively compared to the best available baselines.

IDMU: Impact Driven Machine Unlearning

Shubhi Asthana (IBM Research - Almaden)*; Bing Zhang (IBM Research - Almaden); Ruchi Mahindru (IBM Watson Research Center); Indervir Singh Banipal (IBM); Pawan Chowdhary (IBM Research - Almaden)
Abstract: Enterprise organizations have large amounts of data that are used by multiple Machine Learning (ML) models across various software frameworks. These models provide trends and insights from the data that can help enterprises define business rules around their processes. However, if certain aspects of this data are removed from the datasets, the business rules and policies in place could be affected. When a user requests that data be removed, the model may need to be retrained, a process called Machine Unlearning (MU). Recent work on MU includes different methods of retraining machine learning models, but there is a lack of work on removing certain aspects of data and quantifying the impact of that removal on the models. This work proposes IDMU (Impact Driven Machine Unlearning), a novel methodology that quantifies the impact of data removal requests while performing MU. Our method provides recommendations for data removal requests, factoring in the underlying features of the data. The results from the industrial application and evaluation of our method on a financial services dataset are encouraging. For a set of 120 data removal requests, the quantification algorithm achieved higher accuracy with an error threshold of 0.77. The overall impact-driven machine unlearning model had a mean absolute percentage error (MAPE) of 0.72. By factoring in the urgency and impact of data removal requests, it also saved 1900 hours of MU and retraining time over a period of 4-5 months for the 120 data removal requests.

GKD: Generalized Knowledge Distillation for Auto-regressive Sequence Models

Rishabh Agarwal (Google Research, Brain Team)*; Nino Vieillard (Google DeepMind); Piotr Stańczyk (Google Brain); Sabela Ramos (Google Research); Matthieu Geist (Google Brain); Olivier Bachem (Google Brain)
Abstract: Knowledge distillation is commonly used for compressing neural networks to reduce their inference cost and memory footprint. However, current distillation methods for auto-regressive models, such as generative language models (LMs), suffer from two key issues: (1) distribution mismatch between output sequences during training and the sequences generated by the student during its deployment, and (2) model under-specification, where the student model may not be expressive enough to fit the teacher's distribution. To address these issues, we propose Generalized Knowledge Distillation (GKD). GKD mitigates distribution mismatch by sampling output sequences from the student during training. Furthermore, GKD handles model under-specification by optimizing alternative divergences, such as reverse KL, that focus on generating samples from the student that are likely under the teacher's distribution. We demonstrate that GKD outperforms commonly-used approaches for distilling LLMs on summarization, machine translation, and arithmetic reasoning tasks.
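
The divergences mentioned above can be compared at the level of a single next-token distribution. In GKD, per the abstract, such token-level divergences are evaluated on output sequences sampled from the student; the sketch below covers only the divergence part, with made-up three-token distributions, and the function names are illustrative.

```python
import math


def kl(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions over the same vocabulary."""
    return sum(pi * math.log(pi / max(qi, eps)) for pi, qi in zip(p, q) if pi > 0)


def forward_kl(teacher, student):
    """Standard distillation target KL(teacher || student): mass-covering."""
    return kl(teacher, student)


def reverse_kl(teacher, student):
    """KL(student || teacher): penalizes student mass on tokens the teacher finds unlikely."""
    return kl(student, teacher)


# Usage: next-token distributions at one position of a sequence sampled from the student.
teacher = [0.45, 0.45, 0.10]
student = [0.90, 0.05, 0.05]
print(forward_kl(teacher, student), reverse_kl(teacher, student))
```

The asymmetry is the point: forward KL weights errors by the teacher's probabilities, so it pushes the student to cover every token the teacher likes, while reverse KL weights errors by the student's own probabilities, which suits an under-specified student that cannot cover the whole teacher distribution.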

Applying Policy Gradient Methods to Image-Based Autonomous Vehicles

Elias O Chang (UCSC)*; Leilani H Gilpin (UCSC)
Abstract: Deep learning has been increasingly used in the development of autonomous vehicles. However, supervised deep learning approaches require ample expert-labelled data for training, and such data is difficult and expensive to acquire. Deep reinforcement learning (DRL) provides an alternative by having an agent learn from interactions with an environment. DRL has shown success on benchmarks in low-dimensional spaces, which motivates a high-fidelity benchmark suite for DRL algorithms. In particular, we adapt policy-gradient methods to the autonomous vehicle setting and measure their performance.

CoarsenConf: Equivariant Coarsening with Aggregated Attention for Molecular Conformer Generation

Daniel Reidenbach (UC Berkeley)*; Aditi Krishnapriyan (UC Berkeley)
Abstract: Molecular conformer generation (MCG) is an important task in cheminformatics and drug discovery. The ability to efficiently generate low-energy 3D structures can avoid expensive quantum mechanical simulations, leading to accelerated screenings and enhanced structural exploration. Several generative models have been developed for MCG, but many struggle to consistently produce high-quality conformers. To address these issues, we introduce CoarsenConf, which coarse-grains molecular graphs based on torsional angles and integrates them into an SE(3)-equivariant hierarchical variational autoencoder. Through equivariant coarse-graining, we aggregate the fine-grained atomic coordinates of subgraphs connected via rotatable bonds, creating a variable-length coarse-grained latent representation. Our model uses a novel aggregated attention mechanism to restore fine-grained coordinates from the coarse-grained latent representation, enabling efficient autoregressive generation of large molecules. Furthermore, our work expands current conformer generation benchmarks and introduces new metrics to better evaluate the quality and viability of generated conformers. We demonstrate that CoarsenConf generates more accurate conformer ensembles compared to prior generative models and traditional cheminformatics methods.

Exploring Zero and Few-shot Techniques for Intent Classification in Conversational AI

Mitul Tiwari (Passage AI)*; Soham Parikh (ServiceNow); Quaizar Vohra (ServiceNow); Prashil Tumbade (ServiceNow)
Abstract: Conversational AI providers often need to scale to thousands of models, and new customers often face the cold-start problem. Scaling to so many customers also constrains storage space. Intent classification is a central problem in Conversational AI. In this work, we explore four zero- and few-shot intent classification approaches under this low-resource constraint: 1) domain adaptation, 2) data augmentation, 3) zero-shot intent classification by prompting large language models (LLMs) with intent descriptions, and 4) parameter-efficient fine-tuning of instruction-fine-tuned language models. Our results show that all of these approaches are effective to different degrees in low-resource settings. Parameter-efficient fine-tuning using the T-Few recipe (Liu et al., 2022) on Flan-T5 (Chung et al., 2022) yields the best performance, even with just one sample per intent. We also show that the zero-shot method of prompting LLMs with intent descriptions is very competitive.

Automatic Creative Selection with Cross-Modal Matching

Alex Kim (University of Southern California); Jia Huang (Apple)*; Rob Monarch (Apple); Jerry Kwac (Apple); Anikesh Kamath (Apple); Parmeshwar Khurd (Apple); Kailash Thiyagarajan (Apple); Goodman Gu (Apple)
Abstract: We present a novel approach for matching images to text in the specific context of matching an application image to the search phrases that someone might use to discover that application. We share a new fine-tuning approach for a pre-trained cross-modal model, tuned to search-text and application-image data. We evaluate matching images to search phrases in two ways: the application developers' intuitions about which search phrases are the most relevant to their application images, and the intuitions of professional human annotators about which search phrases are the most relevant to a given application. Our approach achieves 0.96 and 0.95 AUC for these two ground truth datasets, which outperforms current state-of-the-art models by 8%-17%.

Cost-sensitive learning of classification trees, with application to imbalanced datasets

Magzhan Gabidolla (University of California, Merced)*; Arman Zharmagambetov (UC Merced); Miguel A Carreira-Perpinan (UC Merced)
Abstract: Important practical applications such as fraud or spam detection or churn prediction involve binary classification problems where the datasets are imbalanced and the cost of false positives greatly differs from the cost of false negatives. We focus on classification trees, in particular oblique trees, which subsume both the traditional axis-aligned trees and logistic regression, but are more accurate than both while providing interpretable models. Rather than using ROC curves, we advocate a loss based on minimizing the false negatives subject to a maximum false positive rate, or equivalently minimizing a weighted 0/1 loss. This yields a curve of classifiers that provably dominates the ROC curve, but is hard to optimize. We give the first algorithm that can iteratively update the tree parameters globally so that the weighted 0/1 loss decreases monotonically. Experiments on various datasets with class imbalance or class costs show this indeed dominates ROC-based classifiers and significantly improves over previous approaches to learn trees based on weighted purity criteria or over- or undersampling.
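
The weighted 0/1 loss mentioned above can be written down directly. This sketch only evaluates the loss (and the two error rates it trades off) for given predictions; it does not reproduce the paper's tree-optimization algorithm, and the cost values in the test are illustrative.

```python
def weighted_01_loss(y_true, y_pred, cost_fn, cost_fp):
    """Average weighted 0/1 loss for binary labels in {0, 1} (1 = positive class)."""
    total = 0.0
    for y, y_hat in zip(y_true, y_pred):
        if y == 1 and y_hat == 0:
            total += cost_fn  # false negative, e.g. a missed fraud case
        elif y == 0 and y_hat == 1:
            total += cost_fp  # false positive, e.g. a legitimate case flagged
    return total / len(y_true)


def fn_fp_rates(y_true, y_pred):
    """False-negative rate and false-positive rate of a set of predictions."""
    pos = sum(1 for y in y_true if y == 1)
    neg = len(y_true) - pos
    fn = sum(1 for y, y_hat in zip(y_true, y_pred) if y == 1 and y_hat == 0)
    fp = sum(1 for y, y_hat in zip(y_true, y_pred) if y == 0 and y_hat == 1)
    return fn / pos, fp / neg
```

Sweeping the ratio cost_fn / cost_fp and training one classifier per value traces out the curve of classifiers the abstract refers to; a larger ratio pushes the learned tree toward fewer false negatives at the price of more false positives.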

REALM: Robust Entropy Adaptive Loss Minimization for improved single-sample test-time adaptation

Skyler Seto (Apple)*; Barry-John Theobald (Apple); Federico Danieli (Apple); Navdeep Jaitly (Apple); Dan Busbridge (Apple)
Abstract: Test-time adaptation (TTA) involves updating a pre-trained model to compensate for data distribution shifts without access to the training data or knowledge of the training procedure. Online adaptation updates the model on each sample in an incoming stream of test samples by minimizing a self-supervised objective. However, online adaptation is known to be unstable due to noisy samples. We present Robust Entropy Adaptive Loss Minimization (REALM), which improves the robustness of online adaptation to noisy samples and results in better test-time accuracy.

Viewpoint Equivariance for Multi-View 3D Object Detection

Dian Chen (Toyota Research Institute)*; Jie Li (Nvidia); Vitor Guizilini (Toyota Research Institute); Rareș A Ambruș (Toyota Research Institute); Adrien Gaidon (Toyota Research Institute)
Abstract: 3D object detection from visual sensors is a cornerstone capability of robotic systems. State-of-the-art methods focus on reasoning about and decoding object bounding boxes from multi-view camera input. In this work, we draw intuition from the integral role of multi-view consistency in 3D scene understanding and geometric learning. To this end, we introduce VEDet, a novel 3D object detection framework that exploits 3D multi-view geometry to improve localization through viewpoint awareness and equivariance. VEDet leverages a query-based transformer architecture and encodes the 3D scene by augmenting image features with positional encodings from their 3D perspective geometry. We design view-conditioned queries at the output level, which enables the generation of multiple virtual frames during training to learn viewpoint equivariance by enforcing multi-view consistency. The multi-view geometry injected at the input level as positional encodings and regularized at the loss level provides rich geometric cues for 3D object detection, leading to state-of-the-art performance on the nuScenes benchmark. The code and model are made available at

Deep Metric Learning with Soft Orthogonal Proxies

Farshad Saberi Movahed (NVIDIA)*; Mohammad Kazem Ebrahimpour (Ericsson); Farid Saberi-Movahed (Graduate University of Advanced Technology, Kerman, Iran); Monireh Moshavash (Shahid Bahonar University of Kerman); Dorsa Rahmatian (Shahid Bahonar University of Kerman); Mahvash Mohazzebi (Shahid Bahonar University of Kerman); Mahdi Shariatzadeh (Shahid Bahonar University of Kerman); Mahdi Eftekhari (Shahid Bahonar University of Kerman)
Abstract: Deep Metric Learning (DML) models rely on strong representations and similarity-based measures with specific loss functions. Proxy-based losses have shown faster convergence than pair-based losses. However, proxies assigned to different classes may end up closely located in the embedding space, making it hard to distinguish between positive and negative items. Alternatively, they may become highly correlated and hence provide redundant information to the model. To address these issues, we propose a novel approach that introduces a Soft Orthogonality (SO) constraint on proxies. The constraint encourages the proxies to be orthogonal, spreading them as far apart as possible in the embedding space. Our approach leverages the Data-Efficient Image Transformer (DeiT) as an encoder to extract contextual features from images, along with a DML objective. The objective combines the Proxy Anchor loss with the SO regularization. We evaluate our method on four public benchmarks for category-level image retrieval and demonstrate its effectiveness with comprehensive experimental results and ablation studies. Our evaluations demonstrate that our proposed approach outperforms state-of-the-art methods by a large margin.
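
The abstract does not give the exact form of the SO constraint. A common soft-orthogonality regularizer, assumed here, is the squared Frobenius norm of P P^T - I, where the rows of P are the L2-normalized class proxies; the total objective would then be the Proxy Anchor loss plus a hypothetical weight λ times this penalty. The sketch uses plain lists and illustrative function names.

```python
import math


def normalize(v):
    """Scale a vector to unit L2 norm."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def soft_orthogonality_penalty(proxies):
    """Squared Frobenius norm of (P P^T - I) for L2-normalized proxy rows P."""
    p = [normalize(v) for v in proxies]
    c = len(p)
    penalty = 0.0
    for i in range(c):
        for j in range(c):
            # Cosine similarity between proxies i and j (exactly 1 on the diagonal up to rounding).
            dot = sum(a * b for a, b in zip(p[i], p[j]))
            target = 1.0 if i == j else 0.0
            penalty += (dot - target) ** 2
    return penalty
```

Because the proxies are normalized first, the diagonal terms vanish and the penalty reduces to the sum of squared cosine similarities between distinct proxies, so it is zero exactly when all proxies are mutually orthogonal.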

Bivariate decision trees

Rasul Kairgeldin (University of California, Merced)*; Miguel A Carreira-Perpinan (UC Merced)
Abstract: Univariate decision trees, commonly used since the 1950s, predict by asking questions about a single feature in each decision node. While they are interpretable, they often lack competitive predictive accuracy due to their inability to model feature correlations, use a limited number of input features, and rely on heuristic algorithms for training. In contrast, multivariate (oblique) trees can use multiple features in each node, capturing high-dimensional correlations better. However, they can be difficult to interpret. Bivariate decision trees offer a practical compromise by using pairs of features in each node, striking a balance between interpretability and accuracy. By adapting the Tree Alternating Optimization (TAO) algorithm, bivariate trees can be trained more effectively, resulting in smaller and more accurate trees. The TAO algorithm updates node parameters iteratively, optimizing a defined objective function over the entire tree. While slower than traditional algorithms, it scales well to large datasets. Our experiments demonstrate that bivariate trees outperform univariate trees in terms of interpretability and accuracy. We believe bivariate trees offer a practical and scalable solution for data analysis tasks.

NeRFuser: Scalable Scene Representation by NeRF Registration and Blending

Jiading Fang (Toyota Technological Institute at Chicago)*; Shengjie Lin (Toyota Technological Institute at Chicago); Igor Vasiljevic (Toyota Research Institute); Vitor Guizilini (Toyota Research Institute); Rareș A Ambruș (Toyota Research Institute); Adrien Gaidon (Toyota Research Institute); Greg Shakhnarovich (TTI-Chicago); Matthew R Walter (Toyota Technological Institute at Chicago)
Abstract: We present NeRFuser, a novel algorithm that extends the representational capacity of neural radiance fields (NeRFs) to produce high-fidelity representations of large-scale scenes. Integral to our approach is the decomposition of spatially extended environments into a collection of small-scale scenes, each represented by an individual NeRF model. Critically, we only assume access to these individual NeRFs, not to the original training images, the poses on which they were trained, or the relative transformations between the NeRFs. Towards this goal, NeRFuser is built as a two-stage procedure that consists of NeRF registration and NeRF blending. For registration, we propose registration from re-rendering, a technique that infers the transformation between NeRFs based on images synthesized from individual NeRFs. For blending, we propose sample-based inverse-distance weighting to blend visual information at the ray-sample level. We evaluate NeRFuser on public benchmarks, showing state-of-the-art results on test views.
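
Inverse-distance weighting at the ray-sample level can be sketched as follows. The abstract does not say what each distance is measured to; this sketch assumes a per-NeRF reference point (for example the center of that NeRF's scene), and the power of 2, the eps guard, and the function name are hypothetical choices.

```python
def idw_blend(colors, distances, power=2.0, eps=1e-8):
    """Blend the colors predicted by several NeRFs for one ray sample.

    colors: one RGB triple per NeRF; distances: distance from the sample point
    to each NeRF (assumed here to be to a per-NeRF reference point).
    """
    # Inverse-distance weights: closer NeRFs get larger weight.
    weights = [1.0 / (d + eps) ** power for d in distances]
    total = sum(weights)
    return [sum(w * c[ch] for w, c in zip(weights, colors)) / total
            for ch in range(len(colors[0]))]
```

Blending per sample, rather than per rendered image, lets each point along a ray draw mostly on the NeRF that is closest to it, which matters in overlap regions where two small-scale scenes meet.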

🎤 What Makes ImageNet Look Unlike LAION

Ali Shirali (UC Berkeley)*; Moritz Hardt (Max Planck Institute for Intelligent Systems)
Abstract: ImageNet was famously created from Flickr image search results. What if we recreated ImageNet instead by searching the massive LAION dataset based on image captions alone? In this work, we carry out this counterfactual investigation. We find that the resulting ImageNet recreation, which we call LAIONet, looks distinctly unlike the original. Specifically, the intra-class similarity of images in the original ImageNet is dramatically higher than it is for LAIONet. Consequently, models trained on ImageNet perform significantly worse on LAIONet. We propose a rigorous explanation for the discrepancy in terms of a subtle, yet important, difference in two plausible causal data-generating processes for the respective datasets, that we support with systematic experimentation. In a nutshell, searching based on an image caption alone creates an information bottleneck that mitigates the selection bias otherwise present in image-based filtering. Our explanation formalizes a long-held intuition in the community that ImageNet images are stereotypical, unnatural, and overly simple representations of the class category. At the same time, it provides a simple and actionable takeaway for future dataset creation efforts.

Provable Robust Watermarking for AI-Generated Text

Xuandong Zhao (UCSB)*; Prabhanjan Ananth (UCSB); Lei Li (University of California Santa Barbara); Yu-Xiang Wang (UC Santa Barbara)
Abstract: As AI-generated text increasingly resembles human-written content, the ability to detect machine-generated text becomes crucial. To address this challenge, we present GPTWatermark, a robust and high-quality solution designed to ascertain whether a piece of text originates from a specific model. Our approach extends existing watermarking strategies and employs a fixed group design to enhance robustness against editing and paraphrasing attacks. We show that our watermarked language model enjoys strong provable guarantees on generation quality, correctness in detection, and security against evasion attacks. Experimental results on various large language models (LLMs) and diverse datasets demonstrate that our method achieves superior detection accuracy and comparable generation quality in perplexity, thus promoting the responsible use of LLMs.
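
Green-list watermarks of this family bias generation toward a "green" subset of the vocabulary and detect the watermark by counting green tokens. The sketch below covers only detection and assumes a single context-independent green list derived from a secret key, which is our reading of "fixed group design"; the hash construction, key, gamma, and function names are hypothetical and not GPTWatermark's implementation.

```python
import hashlib
import math


def in_green_list(token_id, key="secret-key", gamma=0.5):
    """Hypothetical fixed green list: a keyed hash puts about a gamma fraction of tokens in it.

    Membership does not depend on the preceding context (assumed fixed-group design).
    """
    digest = hashlib.sha256(f"{key}:{token_id}".encode()).digest()
    return int.from_bytes(digest[:8], "big") / 2**64 < gamma


def detection_z_score(token_ids, key="secret-key", gamma=0.5):
    """One-proportion z-test: how far the green-token count exceeds gamma * T."""
    t = len(token_ids)
    green = sum(in_green_list(tok, key, gamma) for tok in token_ids)
    return (green - gamma * t) / math.sqrt(t * gamma * (1 - gamma))
```

Unwatermarked text should contain green tokens at roughly the rate gamma, giving a z-score near 0, whereas watermarked text over-represents green tokens and yields a large positive score; a detector flags text whose score exceeds a chosen threshold. Keeping the green list fixed means that an edit or paraphrase only changes the status of the tokens it touches, which is the intuition behind the robustness claim.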

iSCAN: Identifying Causal Mechanism Shifts among Nonlinear Additive Noise Models

Kevin Bello (UChicago/CMU)*
Abstract: Structural causal models (SCMs) are widely used in various disciplines to represent causal relationships among variables in complex systems. Unfortunately, the true underlying directed acyclic graph (DAG) structure is often unknown, and determining it from observational or interventional data remains a challenging task. However, in many situations, the end goal is to identify changes (shifts) in causal mechanisms between related SCMs rather than to recover the entire underlying DAG structure. Examples include analyzing gene regulatory network structure changes between healthy and cancerous individuals or understanding variations in biological pathways under different cellular contexts. This paper focuses on identifying functional mechanism shifts in two or more related SCMs over the same set of variables, without estimating the entire DAG structure of each SCM. Prior work in this setting assumed linear models with Gaussian noise; instead, in this work we assume that each SCM belongs to the more general class of nonlinear additive noise models (ANMs). A key contribution of this work is to show that the Jacobian of the score function for the mixture distribution allows for identification of shifts in general non-parametric functional mechanisms. Once the shifted variables are identified, we leverage recent work to estimate the structural differences, if any, for the shifted variables. Experiments on synthetic and real-world data are provided to showcase the applicability of this approach.

IC3: Image Captioning by Committee Consensus

David Chan (University of California, Berkeley)*; Austin Myers (Google); Sudheendra Vijayanarasimhan (Google research); David A Ross (Google); John F Canny (UC Berkeley)
Abstract: If you ask a human to describe an image, they might do so in a thousand different ways. Traditionally, image captioning models are trained to generate a single "best" caption (the one most like a reference). Unfortunately, doing so encourages captions that are "informationally impoverished": they focus on only a subset of the possible details while ignoring other potentially useful information in the scene. In this work, we introduce a simple yet novel method, "Image Captioning by Committee Consensus" (IC3), designed to generate a single caption that captures high-level details from several annotator viewpoints. Humans rate captions produced by IC3 as at least as helpful as those of baseline SOTA models more than two-thirds of the time, and IC3 can improve the performance of SOTA automated recall systems by up to 84%, outperforming single human-generated reference captions and indicating significant improvements over SOTA approaches for visual description.

Generative Autoencoders as Watermark Attackers: Analyses of Vulnerabilities and Threats

Xuandong Zhao (UCSB)*; Kexun Zhang (UC Santa Barbara); Yu-Xiang Wang (UC Santa Barbara); Lei Li (University of California Santa Barbara)
Abstract: Invisible watermarks safeguard the copyright of images by embedding hidden messages that owners can detect. They also help prevent people from misusing images, especially those generated by AI models. Malicious adversaries can violate these rights by removing the watermarks. To remove watermarks without damaging visual quality, an adversary needs to erase them while retaining the essential information in the image. This is analogous to the encoding and decoding process of generative autoencoders, especially variational autoencoders (VAEs) and diffusion models. We propose a framework that uses generative autoencoders to remove invisible watermarks and test it with VAEs and diffusion models. Our results reveal that, even without specific training, off-the-shelf Stable Diffusion effectively removes most watermarks, surpassing all current attackers. This result underscores the vulnerabilities of existing watermarking schemes and calls for more robust methods for copyright protection.

Federated Learning of Gboard Language Models with Differential Privacy

Zheng Xu (Google Research)*; Yanxiang Zhang (Individual); Galen Andrew (Google); Christopher A. Choquette-Choo (Google); Peter Kairouz (Google); Brendan McMahan (Google); Jesse Rosenstock (Google); Yuanbo Zhang (Google)
Abstract: We train language models (LMs) with federated learning (FL) and differential privacy (DP) in the Google Keyboard (Gboard). We apply the DP-Follow-the-Regularized-Leader (DP-FTRL) algorithm (Kairouz et al., 2021) to achieve meaningfully formal DP guarantees without requiring uniform sampling of client devices. To provide favorable privacy-utility trade-offs, we introduce a new client participation criterion and discuss the implications of its configuration in large-scale systems. We show how quantile-based clip estimation (Andrew et al., 2019) can be combined with DP-FTRL to adaptively choose the clip norm during training or to reduce hyperparameter tuning in preparation for training. With the help of pretraining on public data, we train and deploy more than twenty Gboard LMs that achieve high utility and $\rho$-zCDP privacy guarantees with $\rho \in (0.2, 2)$, with two models additionally trained with secure aggregation (Bonawitz et al., 2017). We are happy to announce that all next-word-prediction neural network LMs in Gboard now have DP guarantees, and all future launches of Gboard neural network LMs will require DP guarantees. We summarize our experience and provide concrete suggestions on DP training for practitioners.
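Quantile-based clip estimation adapts the clip norm C so that roughly a target fraction gamma of client update norms fall within it. The sketch below assumes the geometric update C <- C * exp(-eta * (b - gamma)) associated with Andrew et al. (2019), where b is the fraction of client update norms at or below C. It omits the noise that the DP version adds to b and any coupling with DP-FTRL, and the simulated client norms and learning rate are hypothetical.

```python
import math
import random

def update_clip_norm(clip: float, update_norms: list, target_quantile: float, lr: float) -> float:
    """Geometric step: shrink C when more than target_quantile of norms are <= C, grow it otherwise.
    A DP deployment would add noise to frac_below before using it; that is omitted here."""
    frac_below = sum(1 for n in update_norms if n <= clip) / len(update_norms)
    return clip * math.exp(-lr * (frac_below - target_quantile))

def clip_update(update: list, clip: float) -> list:
    """Scale a client update so that its L2 norm is at most clip."""
    norm = math.sqrt(sum(x * x for x in update))
    scale = min(1.0, clip / norm) if norm > 0 else 1.0
    return [x * scale for x in update]

# Usage: hypothetical client update norms ~ Uniform(0, 10), whose median is 5.
rng = random.Random(0)
clip = 1.0
for _ in range(500):
    norms = [rng.uniform(0.0, 10.0) for _ in range(100)]
    clip = update_clip_norm(clip, norms, target_quantile=0.5, lr=0.2)
print(f"adapted clip norm: {clip:.2f}")  # prints a value close to 5
```

With target_quantile=0.5 the clip norm tracks the median update norm, which is why it removes the need to tune C by hand before training.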

regGPT: Integrating autoregressive DNA language models and supervised models to design realistic regulatory DNA

Avantika Lal (Genentech); Anay Gupta (Genentech); Tommaso Biancalani (Genentech); Gokcen Eraslan (Genentech)*
Abstract: Cis-regulatory elements (CREs) are regulatory DNA sequences that control gene expression. Designing synthetic CREs is challenging but has potential therapeutic applications. Autoregressive language models can learn the DNA regulatory code and generate realistic CREs with precisely controlled properties. This approach has been used to design CREs that are specific to different cell types, such as microglia. These findings suggest that autoregressive language models can be used to achieve controlled synthesis of regulatory elements.

Learning stochastic dynamics and predicting emergent behavior using transformers

Corneel Casert (Ghent University)*; Isaac Tamblyn (University of Ottawa); Stephen Whitelam (Lawrence Berkeley National Lab)
Abstract: Learning the dynamics governing a simulation or experiment is a demanding task because the number of possible transitions between configurations of the system typically increases exponentially with the size of the system. For large systems, these transitions are too numerous to be enumerated explicitly. Here we show that it is possible to circumvent this restriction by using a neural network, specifically a transformer, to parameterize a stochastic dynamics. The transformer is used to represent the large number of transition rates efficiently, and it is optimized by maximizing the likelihood that its rates could have generated the observed trajectory. We consider a lattice model undergoing continuous-time Monte Carlo dynamics, simulated at a density at which its steady state comprises small, dispersed clusters. The transformer, which we show has the capacity to represent dynamical rules that are numerous and nonlocal, learns that the dynamics of this model consists of a small number of processes. Forward-propagated trajectories of the trained transformer at densities not encountered during training exhibit phase separation and accurately predict the existence of a phase transition. Transformers have the flexibility to learn dynamical rules from observation without explicit enumeration of rates or coarse-graining of configuration space, and so the procedure used here can be applied to a wide range of physical systems, including those with large and complex dynamical generators.
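The training objective described above, the likelihood that a set of rates generated an observed continuous-time trajectory, has a standard closed form. For a trajectory visiting states $x_0, \dots, x_K$ with dwell times $\tau_0, \dots, \tau_K$, $\log L = \sum_{k=0}^{K-1} \log W(x_k \to x_{k+1}) - \sum_{k=0}^{K} \tau_k R(x_k)$, where $R(x) = \sum_{x'} W(x \to x')$ is the escape rate. The sketch below evaluates this for any rate function; a hypothetical two-state toy model stands in for the transformer that would supply the rates.

```python
import math
from typing import Callable, Dict, Hashable, List, Tuple

RateFn = Callable[[Hashable], Dict[Hashable, float]]  # state -> {next_state: rate}

def log_likelihood(trajectory: List[Tuple[Hashable, float]], rates: RateFn) -> float:
    """trajectory: (state, dwell_time) pairs; every state except the last jumps to the next one."""
    ll = 0.0
    for k, (state, dwell) in enumerate(trajectory):
        out = rates(state)
        ll -= sum(out.values()) * dwell                # no jump during the dwell time
        if k + 1 < len(trajectory):
            ll += math.log(out[trajectory[k + 1][0]])  # rate of the jump actually observed
    return ll

def two_state_rates(a: float, b: float) -> RateFn:
    """Hypothetical toy model: 0 -> 1 at rate a, 1 -> 0 at rate b."""
    return lambda s: {1: a} if s == 0 else {0: b}

traj = [(0, 2.0), (1, 1.0), (0, 3.0)]
print(round(log_likelihood(traj, two_state_rates(0.5, 2.0)), 6))  # -4.5

# Maximizing over a recovers (number of 0 -> 1 jumps) / (time spent in state 0) = 1 / 5.
grid = [0.01 * i for i in range(1, 101)]
best_a = max(grid, key=lambda a: log_likelihood(traj, two_state_rates(a, 2.0)))
```

In the learned setting, the grid search is replaced by gradient ascent on the rate model's parameters, but the quantity being maximized is the same.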

Quantum speedups for stochastic optimization

Aaron Sidford (Stanford); Chenyi Zhang (Stanford)*
Abstract: We consider the problem of minimizing a continuous function given quantum access to a stochastic gradient oracle. We show that, for the problem of minimizing a Lipschitz convex function, quantum-oracle access yields rates that provably improve upon their classical counterparts. We provide two distinct methods that achieve such accuracy-versus-dimension trade-offs and prove that one is asymptotically optimal in low-dimensional settings. Additionally, we provide quantum algorithms for computing a critical point of a smooth non-convex function at rates not known to be achievable classically. To achieve these results, we provide a general quantum variance reduction technique built upon the quantum multivariate mean estimation result of Cornelissen et al., which may be of independent interest.

Field Evaluation of a Machine Learning Decision Support Tool for Traffic Flow Management at Dallas Fort Worth International Airport

William J Coupe (NASA)*; Alexandre Amblard (Universities Space Research Association); Sarah Youlton (Universities Space Research Association); Mathew Kistler (Mosaic ATM)
Abstract: Future needs of the National Airspace System (NAS) require decision support tools to adopt a service-oriented architecture in alignment with the FAA’s vision for an Info-Centric NAS. To achieve this, many existing systems will need to undergo a digital transformation from a monolithic decision support tool toward learning, adaptable, and lightweight interacting systems exposed through well-defined Application Programming Interfaces. To enable this transformation, NASA's Digital Information Platform (DIP) was developed as a cloud-based platform for advanced, data-driven, digital services for aviation, with a special focus on Artificial Intelligence and Machine Learning services. This paper describes the ML Airport Surface Model, a decision support tool deployed on DIP to support traffic flow management at major US airports. Validation results are provided from an operational field evaluation in which ML performance was benchmarked against the legacy physics-based approach.

Towards Zero-Shot Scale-Aware Monocular Depth Estimation

Vitor Guizilini (Toyota Research Institute)*; Igor Vasiljevic (Toyota Research Institute); Dian Chen (Toyota Research Institute); Rareș A Ambruș (Toyota Research Institute); Adrien Gaidon (Toyota Research Institute)
Abstract: Monocular depth estimation is scale-ambiguous, and thus requires scale supervision to produce metric predictions. Even so, the resulting models will be geometry-specific, with learned scales that cannot be directly transferred across domains. Because of that, recent works focus instead on relative depth, eschewing scale in favor of improved up-to-scale zero-shot transfer. In this work we introduce ZeroDepth, a novel monocular depth estimation framework capable of predicting metric scale for arbitrary test images from different domains and camera parameters. This is achieved by (i) the use of input-level geometric embeddings that enable the network to learn a scale prior over objects; and (ii) decoupling the encoder and decoder stages, via a variational latent representation that is conditioned on single frame information. ZeroDepth achieves a new state-of-the-art in both indoor and outdoor settings using the same pre-trained model, outperforming methods that train on in-domain data and require test-time scaling to produce metric estimates.

Pros and cons of soft vs hard decision trees

Kuat Gazizov (UC Merced)*; Arman Zharmagambetov (Meta); Miguel A Carreira-Perpinan (UC Merced)
Abstract: Decision trees come in two basic types. In soft decision trees (SDTs, also called hierarchical mixtures of experts), an input instance follows every root-leaf path, each with a positive probability, and the tree output is the probability-weighted average of its leaf predictions. This happens because each decision node produces a positive probability for each of its children. In hard decision trees (HDTs), an input instance follows exactly one root-leaf path because each decision node picks only one child. Both types have been around for decades, but we find no proper comparison of them (across factors such as accuracy, interpretability, and inference time), which is surprising given the importance of decision trees. We have carried out a theoretical and empirical comparison and provide a brief summary of our conclusions here. We focus on a single tree (not a forest) with oblique (not axis-aligned) decision nodes and constant leaves for classification.
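The two routing rules described above can be contrasted on a small oblique tree with constant leaves. This is a minimal sketch under our own assumptions (a sigmoid gate for soft nodes, a sign test for hard nodes, hypothetical class and function names), not the authors' implementation.

```python
import math
from dataclasses import dataclass
from typing import List, Union

@dataclass
class Leaf:
    value: List[float]              # constant leaf: a class-probability vector

@dataclass
class Node:
    w: List[float]                  # oblique split: a weighted sum of all features
    b: float
    left: "Tree"
    right: "Tree"

Tree = Union[Leaf, Node]

def _score(node: Node, x: List[float]) -> float:
    return sum(wi * xi for wi, xi in zip(node.w, x)) + node.b

def hard_predict(tree: Tree, x: List[float]) -> List[float]:
    """HDT: each decision node sends x to exactly one child."""
    while isinstance(tree, Node):
        tree = tree.right if _score(tree, x) > 0 else tree.left
    return tree.value

def soft_predict(tree: Tree, x: List[float]) -> List[float]:
    """SDT: each node gives both children positive probability via a sigmoid gate;
    the output is the probability-weighted average of all leaf values."""
    if isinstance(tree, Leaf):
        return tree.value
    p_right = 1.0 / (1.0 + math.exp(-_score(tree, x)))
    left = soft_predict(tree.left, x)
    right = soft_predict(tree.right, x)
    return [(1 - p_right) * l + p_right * r for l, r in zip(left, right)]

stump = Node(w=[1.0, -1.0], b=0.0, left=Leaf([1.0, 0.0]), right=Leaf([0.0, 1.0]))
print(hard_predict(stump, [2.0, 1.0]))  # [0.0, 1.0]
print(soft_predict(stump, [2.0, 1.0]))  # roughly [0.27, 0.73]
```

This also shows one source of the inference-time difference: hard_predict visits one node per level, while soft_predict evaluates every node in the tree.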

Influence of Variable Encoding on Group Fairness in the Presence of Shortcut Learning

Benjamin Maudet (Université Paris Saclay); Karan Bhanot (Rensselaer Polytechnic Institute)*; Kristin Bennett (Rensselaer Polytechnic Institute)
Abstract: We study how variable encoding can affect fairness in machine learning. We examine examples where different strategies for encoding the sensitive and other variables can result in greater unfairness for different classification methods on datasets with spurious correlations. Different combinations of encoding choices can magnify spurious correlations and worsen unfair predictions, as measured by a group fairness metric. We hypothesize that encoding strategies change the availability of variables even when their predictivity remains the same. We demonstrate these phenomena on a suite of toy and real problems. Further research is needed to address the potential effect of encoding strategies on fairness and shortcut learning.

🎤 ViNT: A Foundation Model for Visual Navigation

Dhruv Shah (UC Berkeley)*; Ajay Sridhar (Berkeley); Nitish R Dashora (University of California, Berkeley); Kyle W Stachowicz (University of California, Berkeley); Kevin Black (UC Berkeley); Noriaki Hirose (Toyota Motor North America); Sergey Levine (UC Berkeley)
Abstract: General-purpose pre-trained models ("foundation models") have enabled practitioners to produce generalizable solutions for individual machine learning problems with datasets that are significantly smaller than those required for learning from scratch. Such models are typically trained on large and diverse datasets with weak supervision, consuming much more training data than is available for any individual downstream application. In this paper, we describe the Visual Navigation Transformer (ViNT), a foundation model that aims to bring the success of general-purpose pre-trained models to vision-based robotic navigation. ViNT is trained with a general goal-reaching objective that can be used with any navigation dataset, and employs a flexible Transformer-based architecture to learn navigational affordances and enable efficient adaptation to a variety of downstream navigational tasks. ViNT is trained on a number of existing navigation datasets, comprising hundreds of hours of robotic navigation from a variety of different robotic platforms, and exhibits positive transfer, outperforming specialist models trained on narrower datasets. ViNT can be augmented with diffusion-based goal proposals to explore novel environments, and can solve kilometer-scale navigation problems when equipped with long-range heuristics. ViNT can also be adapted to novel task specifications with a technique inspired by prompt-tuning, where the goal encoder is replaced by an encoding of another task modality.

GPU-Accelerated WFST Beam Search Decoder for CTC-based Speech Recognition

Daniel Galvez (NVIDIA)*; Tim Kaldewey (NVIDIA)
Abstract: While Connectionist Temporal Classification (CTC) models deliver state-of-the-art accuracy in automatic speech recognition (ASR) pipelines, their performance has been limited by CPU-based decoding. We introduce a GPU-accelerated Weighted Finite State Transducer (WFST) beam search decoder that is compatible with current CTC models. It increases pipeline throughput, decreases latency, and supports streaming inference. We provide ABI-stable, pre-built, DLPack-based Python bindings for ease of use with modern Python-based machine learning frameworks. We demonstrate that ours is the fastest beam search decoder for CTC models in both offline and online scenarios. It accelerates end-to-end speech recognition inference by nearly an order of magnitude while maintaining the same or lower word error rates. In the offline scenario it achieves up to 7.3 times more throughput than the current state-of-the-art CPU decoder, and in the online streaming scenario it achieves 7.9 times lower latency.
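Any CTC decoder, beam search included, maps frame-level label sequences to outputs with the same collapse rule: merge consecutive repeated labels, then delete blanks. The greedy (best-path) decoder below is a minimal sketch of that rule only. It is not the NVIDIA decoder, uses no WFST or language model, and assumes the blank label has index 0.

```python
from typing import List, Sequence

def ctc_collapse(labels: Sequence[int], blank: int = 0) -> List[int]:
    """Merge consecutive repeats, then drop blanks (so a blank separates genuine repeats)."""
    out: List[int] = []
    prev = None
    for lab in labels:
        if lab != prev and lab != blank:
            out.append(lab)
        prev = lab
    return out

def greedy_ctc_decode(frame_scores: Sequence[Sequence[float]], blank: int = 0) -> List[int]:
    """Best-path decoding: take the arg-max label at each frame, then collapse."""
    best = [max(range(len(s)), key=s.__getitem__) for s in frame_scores]
    return ctc_collapse(best, blank)

print(ctc_collapse([0, 1, 1, 0, 1, 2, 2, 0]))  # [1, 1, 2]
```

A beam search instead keeps several partial hypotheses per frame and merges alignments that collapse to the same prefix, which is where the decoding cost that the abstract targets comes from.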

Stylistic Mastery: Unleashing the Potential of Style Embedding Intervention for Formality-Controlled Spoken Language Translations

Priyesh Vakharia (UC Santa Cruz)*; Shree Vignesh S (University of California Santa Cruz); Pranjali Basmatkar (University of California, Santa Cruz)
Abstract: This work presents a new approach, style embedding intervention, for formality-controlled spoken language translation using pre-trained multilingual language models. Our findings indicate that this approach, which adds an intermediate style embedding layer between the encoder and the decoder, achieves promising results in understanding and controlling the nuances of formal and informal styles in low-resource translation. On the IWSLT formality dataset, our approach improves translation accuracy over previous additive style intervention methods from 85.2 to 90.6. Our interpretable modeling approach also helps us identify better modeling constraints for formality control. Incorporating one such constraint, bos style intervention, further improves the performance of our model from 90.6 to 92, a total of 6.8 points above the previous state of the art.

Starpoint: A simple and scalable database for embeddings

Andrew L Maas (*; Scott Wey (
Abstract: We introduce Starpoint, a database for developing and deploying applications that use embedding datasets associated with tabular metadata. With the rapid recent progress of large deep learning encoders for images, text, audio, and video, a growing number of applications find that the most useful way to represent complex data is via a neural network embedding vector. Embedding refers to representing data via the hidden-layer activations of a deep neural network. In general, we consider an embedding to be an $n$-dimensional real-valued vector, $v \in \mathbb{R}^n$. Developing machine learning (ML) models, chatbots, search/retrieval systems, or similar downstream systems using embedding data first requires creating, storing, and checking the validity of embedding representations. It is often unclear whether a particular embedding model will work well on new data, or which of many embedding models works best for a given task or domain. Further, it is important to enable both rapid development with embeddings and production monitoring of applications that use them. Finally, ML practitioners often struggle with the full lifecycle from development to production, desiring speed and simplicity for prototyping but requiring large-scale, fast systems when integrating ML into full applications. This gap in tool support and collaborator skill sets often makes going from a great prototype to a capable production ML application a slow and challenging process. Tools for creating, storing, and searching embedding data are evolving rapidly; below we discuss the design focus of Starpoint and demonstrate its capabilities for querying embeddings with associated metadata.
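As a rough illustration of querying embeddings together with tabular metadata, the sketch below filters rows on their metadata and then ranks the survivors by cosine similarity. It is a hypothetical brute-force, in-memory store written for this illustration; the class and method names are not Starpoint's API.

```python
import math
from typing import Any, Callable, Dict, List, Optional, Tuple

Row = Tuple[str, List[float], Dict[str, Any]]

def cosine(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

class TinyEmbeddingStore:
    """Hypothetical in-memory store: each row is (id, embedding, metadata)."""

    def __init__(self) -> None:
        self.rows: List[Row] = []

    def insert(self, row_id: str, vector: List[float], metadata: Dict[str, Any]) -> None:
        self.rows.append((row_id, vector, metadata))

    def query(self, vector: List[float], top_k: int = 3,
              where: Optional[Callable[[Dict[str, Any]], bool]] = None) -> List[Tuple[str, float]]:
        """Filter rows on metadata first, then rank the survivors by cosine similarity."""
        hits = [(rid, cosine(vector, vec)) for rid, vec, meta in self.rows
                if where is None or where(meta)]
        hits.sort(key=lambda h: h[1], reverse=True)
        return hits[:top_k]

store = TinyEmbeddingStore()
store.insert("a", [1.0, 0.0], {"lang": "en"})
store.insert("b", [0.9, 0.1], {"lang": "fr"})
store.insert("c", [0.0, 1.0], {"lang": "en"})
print(store.query([1.0, 0.0], top_k=2, where=lambda m: m["lang"] == "en"))
# [('a', 1.0), ('c', 0.0)]
```

A linear scan costs O(N) per query, which is fine for prototyping; production systems typically replace it with an approximate nearest-neighbor index.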

RLAR: Reinforcement Learning on Agent-specific Reasoning for Large Language Model

Diji Yang (University of California Santa Cruz)*; Kezhen Chen (Mineral LLC); Jinmeng Rao (Mineral); Yi Zhang (University of California Santa Cruz)
Abstract: Researchers have shown that large language models (LLMs) have a remarkable capability to use external models for multimodal tasks within hybrid systems. The LLMs are trained to generate inputs for the external models and to reason over their outputs. However, most existing systems have two limitations. First, because some of these hybrid combinations are non-differentiable, the LLM and the external models cannot be trained jointly end to end, so the performance of the hybrid system can depend heavily on the performance of the external models. Second, some systems combine LLMs and external models by aligning latent embeddings; these systems usually require a large amount of training data and lack explainability. In this paper, we present a novel agent-based reinforcement learning approach for LLM-centric hybrid multimodal systems that optimizes both the LLM and the external models during training. The LLM is regarded as a reasoning agent and each external model as a task-oriented agent. Our method promotes interaction between the reasoning agent and the other agents, enabling more efficient multimodal understanding. We focus on the visual question answering (VQA) task and demonstrate the effectiveness of our approach. Additionally, to enhance the reasoning capabilities of the models, we introduce an augmented training corpus that incorporates intermediate conversational reasoning steps. Both the model and the data will be made publicly available to facilitate future research.

Evolving HyperTransformer Policy Generators for Meta-Reinforcement Learning

Gus Kristiansen (Google)*; Mark Sandler (Google); Max Vladymyrov (Google); Andrey Zhmoginov (Google)
Abstract: We propose a meta-reinforcement learning method that learns an initial exploration policy and a Transformer-based hypernetwork (HyperTransformer) to generate per-task policies. At inference time, the learned initial policy acts in the environment for the first episode. The episode data is then fed into the HyperTransformer, which generates an improved policy. This loop is repeated for a set number of episodes before the final generated policy is used for evaluation. We use Evolution Strategies (ES) to directly optimize the cumulative reward of the evaluation episode, allowing the HyperTransformer to learn a trade-off between exploration and exploitation appropriate to the given task distribution. To test our approach, we develop a custom task distribution that requires thorough exploration in the initial episodes and an efficient exploitation policy in the evaluation episode. We demonstrate that our method is able to learn both explorative and exploitative policies.
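Evolution Strategies optimize a return without differentiating through the environment, using only reward evaluations at perturbed parameters. The sketch below assumes antithetic Gaussian perturbations and uses a hypothetical quadratic reward as a stand-in for the cumulative reward of the evaluation episode; it does not involve a HyperTransformer and is not the authors' training setup.

```python
import random
from typing import Callable, List

def es_step(theta: List[float], reward_fn: Callable[[List[float]], float],
            sigma: float, lr: float, n_pairs: int, rng: random.Random) -> List[float]:
    """One antithetic ES step: estimate the gradient of the Gaussian-smoothed reward from
    paired perturbations theta +/- sigma*eps, then take a gradient-ascent step."""
    grad = [0.0] * len(theta)
    for _ in range(n_pairs):
        eps = [rng.gauss(0.0, 1.0) for _ in theta]
        r_plus = reward_fn([t + sigma * e for t, e in zip(theta, eps)])
        r_minus = reward_fn([t - sigma * e for t, e in zip(theta, eps)])
        for i, e in enumerate(eps):
            grad[i] += (r_plus - r_minus) * e / (2 * sigma * n_pairs)
    return [t + lr * g for t, g in zip(theta, grad)]

# Hypothetical stand-in for "cumulative reward of the evaluation episode".
TARGET = [1.0, -2.0, 0.5]
def reward(theta: List[float]) -> float:
    return -sum((t - c) ** 2 for t, c in zip(theta, TARGET))

rng = random.Random(0)
theta = [0.0, 0.0, 0.0]
for _ in range(200):
    theta = es_step(theta, reward, sigma=0.1, lr=0.1, n_pairs=20, rng=rng)
print([round(t, 3) for t in theta])  # close to TARGET
```

Because only reward values are needed, the same update applies when the reward comes from rolling out a generated policy through several exploration episodes.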

Multimodal Open Domain QA using Retrieval Augmentation

Sugam Garg (UC Santa Cruz)*
Abstract: Multimodal Open Domain Question Answering (QA) is the task of question answering in an open-domain setting where questions, answers, or both can be multimodal. The Internet is a multimodal domain where information is presented in all forms, such as text, images, tables, and graphs. We are interested in exploring methods to leverage multimodal documents for question answering. The two common approaches are building large models with all knowledge encoded in model parameters or using a retrieval-augmented system. The former approach is expensive and difficult to update, and the latter has an additional retrieval step that increases latency. Hence, we propose a chained retrieval-augmented generation method for open-domain multimodal QA that first decides whether to retrieve and generate or to generate from parameters. We believe in the benefits of open-source work and are motivated to contribute the first open-source multimodal model for open-domain QA. In our work, we propose to improve the reasoning capability of a multimodal model in the question-answering (QA) task. We hypothesize that current methods attempt to retain reasoning capabilities and world knowledge in the model parameters, leading to either inflated model sizes or poor performance. In our proposed work, we will use a retriever-augmented architecture that stores the world knowledge (a collection of documents) and allows the model to better reason over various modalities of documents to generate the final answer to user questions. However, in a practical scenario, adding the retriever adds to the system’s latency, and the large language model itself could answer some questions from its parameters. Thus, we will also experiment with a multi-step process for generating the answer, where in the first hop the model decides whether to retrieve or to generate the answer from its memory.

Trinity: A No-Code AI platform for complex spatial datasets

C V Krishnakumar Iyer (Apple Inc)*; Feili Hou (Apple); Yonghong Wang (Apple); Swetava Ganguli (Apple); Henry Wang (Apple); Kay Oh (Apple); Vipul Pandey (Apple Inc)
Abstract: We present a no-code Artificial Intelligence (AI) platform called Trinity, whose main design goal is to enable both machine learning researchers and non-technical geospatial domain experts to experiment with domain-specific signals and datasets and solve a variety of complex problems on their own. This versatility is achieved by transforming complex spatiotemporal datasets so that they can be consumed by standard deep learning models, in this case convolutional neural networks (CNNs), and by making it possible to formulate disparate problems in a standard way, e.g., as semantic segmentation. With an intuitive user interface, a feature store that hosts the outputs of complex feature engineering, a deep learning kernel, and a scalable data processing mechanism, Trinity provides a powerful platform for domain experts to share the stage with scientists and engineers in solving business-critical problems. It enables quick prototyping and rapid experimentation, and it reduces the time to production by standardizing model building and deployment. In this work, we present our motivation behind Trinity and its design, and we showcase sample applications to motivate the idea of lowering the bar to using AI.

OPERA: Offline Policy Evaluation with Re-weighted Aggregates of Multiple Estimators

Allen Nie (Stanford University)*
Abstract: Offline policy evaluation (OPE) allows us to estimate the performance of a new sequential decision-making policy by leveraging historical interaction data collected from other policies. Evaluating a new policy online without a confident estimate of its performance can lead to costly, unsafe, or hazardous outcomes, especially in education and healthcare. Several OPE estimators have been proposed in the last decade, many of which have hyperparameters and require training. Unfortunately, it is still unclear how to choose the best OPE algorithm for each task and domain. In this paper, we propose a new algorithm that adaptively blends a set of OPE estimators for a given dataset, without relying on an explicit selection, using a statistical procedure. We prove that our estimator is consistent and satisfies several desirable properties for policy evaluation. Additionally, we demonstrate that, compared to alternative approaches, our estimator can be used to select higher-performing policies in healthcare and robotics. Our work contributes to making general-purpose, task-agnostic off-policy evaluation pipelines for offline RL easier to use.
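One simple way to blend estimators without selecting a single one is to weight them using a bootstrap estimate of their covariance, choosing weights that sum to one and minimize the variance of the combination. This sketch ignores estimator bias, which a practical method has to account for, and is not presented as OPERA's actual procedure; the bootstrap replicates and point estimates in the usage lines are hypothetical.

```python
from typing import List

def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Gauss-Jordan elimination with partial pivoting for a small dense system a x = b."""
    n = len(a)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def blend_weights(replicates: List[List[float]]) -> List[float]:
    """replicates[i][j] = estimate from estimator i on bootstrap resample j.
    Returns weights summing to 1 that minimize the bootstrap variance of the blend."""
    k, n_boot = len(replicates), len(replicates[0])
    means = [sum(r) / n_boot for r in replicates]
    cov = [[sum((replicates[i][j] - means[i]) * (replicates[q][j] - means[q])
                for j in range(n_boot)) / (n_boot - 1) for q in range(k)] for i in range(k)]
    z = solve(cov, [1.0] * k)          # z = inverse(cov) @ ones
    total = sum(z)
    return [zi / total for zi in z]

# Hypothetical replicates: estimator 2 has twice the standard deviation of estimator 1.
weights = blend_weights([[1.0, -1.0, 1.0, -1.0], [2.0, 2.0, -2.0, -2.0]])
point_estimates = [0.62, 0.70]         # hypothetical full-data estimates of policy value
blended = sum(w * v for w, v in zip(weights, point_estimates))
print([round(w, 3) for w in weights])  # [0.8, 0.2]
```

The noisier estimator is down-weighted rather than discarded, which is the sense in which blending avoids an explicit choice among estimators.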

Play to Teach: Adversarial Cooperative Knowledge Transfer in Two-Player Games

Allen Nie (Stanford University)*
Abstract: Over the last few decades, artificial intelligence has come to outperform human experts on a set of challenging tasks, including chess (Campbell et al., 2002), Go (Silver et al., 2017), StarCraft II (Vinyals et al., 2019), Dota 2 (Berner et al., 2019), and Gran Turismo (Wurman et al., 2022). The educational potential of games like chess or Go is well known, as they can effectively cultivate reasoning abilities in children. However, in the context of teaching, human instructors are a finite resource. We explore the feasibility of constructing a teaching agent directly from an expert agent. Specifically, we examine a scenario where a teaching agent engages with a student agent in a two-player game. To ensure meaningful learning, the teaching agent must adopt an adversarial stance towards the student agent. Merely adopting a fully cooperative stance would result in the teaching agent making sub-optimal moves and allowing the student to win easily, and such a student would then fail to counter an expert's moves effectively.

Modular Adaptive Depth Networks for Generalizing Over the Number of Hops

Samira Abnar (Apple)*; Omid Saremi (Apple Inc.); Shantel Wilson (Apple); Laurent Dinh (Apple); Miguel Angel Bautista (Apple); Vimal Thilak (Apple); Preetum Nakkiran (Apple); Etai Littwin (Apple); Jiatao Gu (Apple); Chen Huang (Apple); Joshua M Susskind (Apple); Samy Bengio (Google Research, Brain Team)
Abstract: We introduce conditional PVR, an extension of the PVR benchmark (Zhang et al., 2021), for evaluating the ability of models to generalize to computation graphs of different depths. We speculate that mechanisms for adaptive and modular computation can facilitate learning tasks that require generalization over the number of sequential computation steps. We show that a new transformer architecture, Hyper-UT, with adaptive depth and a specific implementation of modularity/sparsity, solves the conditional PVR task better than the other transformer variants we tried when learning from scratch.
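For readers unfamiliar with PVR (Pointer Value Retrieval), its simplest sequence form can be generated in a few lines: one digit acts as a pointer, and the label is the digit it points to. This sketch assumes the pointer indexes the remaining digits and uses a window of one; the conditional variant introduced above is not reproduced, and the function name is hypothetical.

```python
import random
from typing import List, Tuple

def make_pvr_example(num_values: int, rng: random.Random) -> Tuple[List[int], int]:
    """Basic PVR with window 1: input [pointer, v_0, ..., v_{n-1}], label v_pointer."""
    assert 1 <= num_values <= 10, "the pointer must itself be a single digit"
    values = [rng.randrange(10) for _ in range(num_values)]
    pointer = rng.randrange(num_values)
    return [pointer] + values, values[pointer]

rng = random.Random(0)
seq, label = make_pvr_example(9, rng)
print(seq, label)
```

Solving even this basic version requires one indirection (read the pointer, then fetch the value), which is the kind of sequential step whose count the conditional extension is designed to vary.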

Call for Abstracts

BayLearn 2023

BayLearn 2023 will be an in-person event hosted in the San Francisco Bay Area.

The BayLearn 2023 abstract submission site is now open for submissions:

The abstract submission deadline is Thursday, July 13, 2023 11:59pm PDT.

Please submit abstracts as a 2-page PDF in NeurIPS format. An extra page for acknowledgements and references is allowed.

About BayLearn

The BayLearn Symposium is an annual gathering of machine learning researchers and scientists from the San Francisco Bay Area. While BayLearn promotes community building and technical discussions between local researchers from academic and industrial institutions, it also welcomes visitors. This one-day event combines invited talks, contributed talks, and posters, to foster exchange of ideas.

Meet with fellow Bay Area machine learning researchers and scientists during the symposium, which will be held in mid-October; the exact date is to be announced.

Feel free to circulate this invitation to your colleagues and relevant contacts.

Key Dates


We encourage submission of abstracts. Acceptable material includes work which has already been submitted or published, preliminary results, and controversial findings. We do not intend to publish paper proceedings; only abstracts will be shared through an online repository. Our primary goal is to foster discussion! For examples of previously accepted talks, please watch the paper presentations from previous BayLearn Symposiums:

For more information about submissions, please look here:

Submit your abstracts via CMT:

Mailing List: If this email was forwarded to you, and you would like to join the BayLearn mailing list so that you will receive future communications from us directly, please sign up here.


Best Regards,

The BayLearn Organizers