Publications

Alvin Chan*, Ameya R. Kirtane, Qing Rui Qu, Xisha Huang, Jonathan Woo, Deepak A. Subramanian, Rajib Dey, Rika Semalty, Joshua D. Bernstock, Taksim Ahmed, Rowan Honeywell, Charles Hanhurst, Isaac Diaz Becdach, Leah S. Prizant, Ashley K. Brown, Hao Song, Justin Law Cobb, Louis B. DeRidder, Bruna Santos, Miguel Jimenez, Michelle Sun, Yuebin Huang, Ceara Byrne, Giovanni Traverso

October 2025 Nature Nanotechnology

Designing Lipid Nanoparticles Using a Transformer-Based Neural Network

TL;DR: We present COMET, a transformer-based deep learning model trained on a large multi-component LNP dataset (LANCE) that predicts LNP efficacy, stability, and cross-cell performance, enabling rapid, data-driven design of next-generation RNA delivery systems beyond traditional experimental limits.
Abstract: The RNA medicine revolution has been spurred by lipid nanoparticles (LNPs). The effectiveness of an LNP is determined by its lipid components and their ratios; however, experimental optimization is laborious and does not explore the full design space. Computational approaches such as deep learning can be greatly beneficial, but the composite nature of LNPs limits the effectiveness of existing single molecule-based algorithms to LNPs. Addressing this, our approach integrates the multi-component and multimodal features of composite formulations such as LNPs to predict their performance in an end-to-end manner. Here we generate one of the largest LNP datasets (LANCE) by varying LNP formulations to train our deep learning model, COMET. This transformer-based neural network not only accurately predicts the efficacy of LNPs but is adaptable to non-canonical LNP formulations such as those with two ionizable lipids and polymeric materials. Furthermore, COMET can predict LNP performance in a cell line outside of LANCE and predict LNP stability during lyophilization using only small training datasets. Experimental validation showed that our approach can identify LNPs that exhibit strong protein expression in vitro and in vivo, promising accelerated development of nucleic acid therapies with extensive potential across therapeutic and manufacturing applications.

Tianle Zhang, Wanlong Fang, Jonathan Woo, Paridhi Latawa, Deepak A. Subramanian, Alvin Chan*

October 2025 NeurIPS 2025 (Thirty-Ninth Conference on Neural Information Processing Systems)

Can LLMs Reason Over Non-Text Modalities in a Training-Free Manner? A Case Study with In-Context Representation Learning

TL;DR: We introduce In-Context Representation Learning (ICRL), a training-free framework that enables large language models to integrate and reason over non-text modality representations (e.g., from other foundation models) through few-shot, in-context learning for adaptable multi-modal generalization.
Abstract: The remarkable performance of Large Language Models (LLMs) can be enhanced with test-time computation, which relies on external tools and even other deep learning models. However, existing approaches for integrating non-text modality representations into LLMs typically require additional costly supervised training, restricting on-the-fly adaptation to new domains and modalities. In this work, we explore the feasibility of integrating representations from non-text foundational models (FMs) into text-based LLMs in a training-free manner. We propose In-Context Representation Learning (ICRL) as a proof-of-concept to allow LLMs to adaptively utilize non-text modality representations with few-shot learning. Unlike traditional in-context learning, which incorporates text-label pairs, ICRL replaces text inputs with FM representations, enabling the LLM to perform multi-modal inference without fine-tuning. We evaluate ICRL on a suite of tasks in the molecular domain, investigating three core research questions: (i) how to map FM representations into LLMs in a training-free manner, (ii) what factors influence ICRL performance, and (iii) what mechanisms underlie the effectiveness of ICRL. To the best of our knowledge, ICRL is the first training-free framework for integrating non-text modality representations into text-based LLMs, presenting a promising direction for adaptable, multi-modal generalization.

Alvin Chan*, Ali Madani*, Ben Krause, Nikhil Naik

December 2021 NeurIPS 2021 (Thirty-Fifth Conference on Neural Information Processing Systems)

Deep Extrapolation for Attribute-Enhanced Generation

TL;DR: How do we generate sequences that extrapolate beyond the training distribution?
Abstract: Attribute extrapolation in sample generation is challenging for deep neural networks operating beyond the training distribution. We formulate a new task for extrapolation in sequence generation, focusing on natural language and proteins, and propose GENhance, a generative framework that enhances attributes through a learned latent space. Trained on movie reviews and a computed protein stability dataset, GENhance can generate strongly-positive text reviews and highly stable protein sequences without being exposed to similar data during training.

Aston Zhang, Yi Tay, Yikang Shen, Alvin Chan*, Shuai Zhang

December 2021 NeurIPS 2021 (Thirty-Fifth Conference on Neural Information Processing Systems)

Self-Instantiated Recurrent Units with Dynamic Soft Recursion

TL;DR: We propose the self-instantiated recurrent unit that is characterized by recursive instantiation of the model itself, where the extent of the recursion may vary temporally.
Abstract: While standard recurrent neural networks explicitly impose a chain structure on different forms of data, they do not have an explicit bias towards recursive self-instantiation where the extent of recursion is dynamic. Given diverse and even growing data modalities (e.g., logic, algorithmic input and output, music, code, images, and language) that can be expressed in sequences and may benefit from more architectural flexibility, we propose the self-instantiated recurrent unit (Self-IRU) with a novel inductive bias towards dynamic soft recursion. On one hand, the Self-IRU is characterized by recursive self-instantiation via its gating functions, i.e., gating mechanisms of the Self-IRU are controlled by instances of the Self-IRU itself, which are repeatedly invoked in a recursive fashion. On the other hand, the extent of the Self-IRU recursion is controlled by gates whose values are between 0 and 1 and may vary across the temporal dimension of sequences, enabling dynamic soft recursion depth at each time step. The architectural flexibility and effectiveness of our proposed approach are demonstrated across multiple data modalities. For example, the Self-IRU achieves state-of-the-art performance on the logical inference dataset even when comparing with competitive models that have access to ground-truth syntactic information.

Aston Zhang, Alvin Chan*, Yi Tay, Jie Fu, Shuohang Wang, Shuai Zhang, Huajie Shao, Shuochao Yao, Roy Ka-Wei Lee

August 2021 ACL 2021 (Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, Volume 2: Short Papers)

On Orthogonality Constraints for Transformers

TL;DR: Orthogonality constraints that encourage numerical stability improve transformer performance on NLP tasks.
Abstract: Orthogonality constraints encourage matrices to be orthogonal for numerical stability. These plug-and-play constraints, which can be conveniently incorporated into model training, have been studied for popular architectures in natural language processing, such as convolutional neural networks and recurrent neural networks. However, a dedicated study on such constraints for transformers has been absent. To fill this gap, this paper studies orthogonality constraints for transformers, showing the effectiveness with empirical evidence from ten machine translation tasks and two dialogue generation tasks. For example, on the large-scale WMT’16 En→De benchmark, simply plugging-and-playing orthogonality constraints on the original transformer model increases the BLEU from 28.4 to 29.6, coming close to the 29.7 BLEU achieved by the very competitive dynamic convolution.

Alvin Chan*, Yew-Soon Ong, Bill Pung, Aston Zhang, Jie Fu

January 2021 ICLR 2021 (International Conference on Learning Representations)

CoCon: A Self-Supervised Approach for Controlled Text Generation

TL;DR: We propose CoCon to control the content of text generation from LMs by conditioning on content inputs at an interleave layer.
Abstract: Pretrained Transformer-based language models (LMs) display remarkable natural language generation capabilities. With their immense potential, controlling text generation of such LMs is getting attention. While there are studies that seek to control high-level attributes (such as sentiment and topic) of generated text, there is still a lack of more precise control over its content at the word- and phrase-level. Here, we propose Content-Conditioner (CoCon) to control an LM’s output text with a content input, at a fine-grained level. In our self-supervised approach, the CoCon block learns to help the LM complete a partially-observed text sequence by conditioning with content inputs that are withheld from the LM. Through experiments, we show that CoCon can naturally incorporate target content into generated texts and control high-level text attributes in a zero-shot manner.

Aston Zhang, Yi Tay, Shuai Zhang, Alvin Chan*, Anh Tuan Luu, Siu Cheung Hui, Jie Fu

January 2021 ICLR 2021 (International Conference on Learning Representations)

Parameterization of Hypercomplex Multiplications

TL;DR: We propose a new parameterization of hypercomplex multiplications for architectural flexibility and effectiveness.
Abstract: Recent works have demonstrated reasonable success of representation learning in hypercomplex space. Specifically, the Hamilton product (4D hypercomplex multiplication) enables learning effective representations while saving up to 75% parameters. However, one key caveat is that hypercomplex space only exists at very few predefined dimensions. This restricts the flexibility of models that leverage hypercomplex multiplications. To this end, we propose parameterizing hypercomplex multiplications, allowing models to learn multiplication rules from data regardless of whether such rules are predefined. As a result, our method not only subsumes the Hamilton product, but also learns to operate on any arbitrary nD hypercomplex space, providing more architectural flexibility. Experiments of applications to LSTM and Transformer on natural language inference, machine translation, text style transfer, and subject verb agreement demonstrate architectural flexibility and effectiveness of the proposed approach.

Alvin Chan*, Anna Korsakova*, Yew-Soon Ong, Fernaldo Richtia Winnerdy, Kah Wai Lim, Anh Tuan Phan

January 2021 ACM CHIL 2021 (Proceedings of the Conference on Health, Inference, and Learning)

RNA Alternative Splicing Prediction with Discrete Compositional Energy Network

TL;DR: We construct an RNA alternative splicing regression dataset (CAPD) and propose DCEN to predict splicing outcomes by modeling mRNA transcript probabilities through its constituent splice junctions’ energy.
Abstract: A single gene can encode for different protein versions through a process called alternative splicing. Since proteins play major roles in cellular functions, aberrant splicing profiles can result in a variety of diseases, including cancers. Alternative splicing is determined by the gene’s primary sequence and other regulatory factors such as RNA-binding protein levels. With these as input, we formulate the prediction of RNA splicing as a regression task and build a new training dataset (CAPD) to benchmark learned models. We propose discrete compositional energy network (DCEN) which leverages the hierarchical relationships between splice sites, junctions and transcripts to approach this task. In the case of alternative splicing prediction, DCEN models mRNA transcript probabilities through its constituent splice junctions’ energy values. These transcript probabilities are subsequently mapped to relative abundance values of key nucleotides and trained with ground-truth experimental measurements. Through our experiments on CAPD, we show that DCEN outperforms baselines and ablation variants.

Alvin Chan*, Yi Tay, Yew-Soon Ong, Aston Zhang

October 2020 Findings of Empirical Methods in Natural Language Processing 2020

Poison Attacks against Text Datasets with Conditional Adversarially Regularized Autoencoder

TL;DR: We propose the Conditional Adversarially Regularized Autoencoder (CARA) to imbue a poison signature and generate natural-looking poisoned text, demonstrating models’ vulnerability to backdoor poisoning.
Abstract: This paper demonstrates a fatal vulnerability in natural language inference (NLI) and text classification systems. More concretely, we present a ‘backdoor poisoning’ attack on NLP models. Our poisoning attack utilizes conditional adversarially regularized autoencoder (CARA) to generate poisoned training samples by poison injection in latent space. Just by adding 1% poisoned data, our experiments show that a victim BERT finetuned classifier’s predictions can be steered to the poison target class with success rates of >80% when the input hypothesis is injected with the poison signature, demonstrating that NLI and text classification systems face a huge security risk.

Wei Long Ng*, Alvin Chan*, Yew-Soon Ong, Chee Kai Chua

May 2020 Virtual and Physical Prototyping, Volume 15, 2020 - Issue 3

Deep learning for fabrication and maturation of 3D bioprinted tissues and organs

TL;DR: Perspective paper on how deep learning can improve 3D bioprinting.
Abstract: Bioprinting is a relatively new and promising tissue engineering approach to solve the problem of donor shortage for organ transplantation. It is a highly-advanced biofabrication system that enables the printing of materials in the form of biomaterials, living cells and growth factors in a layer-by-layer manner to manufacture 3D tissue-engineered constructs. The current workflow involves a myriad of manufacturing complexities, from medical image processing to optimisation of printing parameters and refinements during post-printing tissue maturation. Deep learning is a powerful machine learning technique that has fuelled remarkable progress in image and language applications over the past decade. In this perspective paper, we highlight the integration of deep learning into 3D bioprinting technology and the implementation of practical guidelines. We address potential adoptions of deep learning into various 3D bioprinting processes such as image-processing and segmentation, optimisation and in-situ correction of printing parameters and lastly refinement of the tissue maturation process. Finally, we discuss implications that deep learning has on the adoption and regulation of 3D bioprinting. The synergistic interactions among the field of biology, material and deep learning-enabled computational design will eventually facilitate the fabrication of biomimetic patient-specific tissues/organs, making 3D bioprinting of tissues/organs an impending reality.

Alvin Chan*, Yi Tay, Yew-Soon Ong

December 2019 CVPR 2020 Oral Paper (IEEE Conference on Computer Vision and Pattern Recognition)

What it Thinks is Important is Important: Robustness Transfers through Input Gradients

TL;DR: By regularizing for similar input gradients, we can transfer adversarial robustness from a teacher to a student classifier even with a different training dataset and model architecture.
Abstract: Adversarial perturbations are imperceptible changes to input pixels that can change the prediction of deep learning models. Learned weights of models robust to such perturbations were previously found to be transferable across different tasks, but this applies only if the model architecture for the source and target tasks is the same. Input gradients characterize how small changes at each input pixel affect the model output. Using only natural images, we show here that training a student model’s input gradients to match those of a robust teacher model can gain robustness close to a strong baseline that is robustly trained from scratch. Through experiments on MNIST, CIFAR-10, CIFAR-100 and Tiny-ImageNet, we show that our proposed method, input gradient adversarial matching, can transfer robustness across different tasks and even across different model architectures. This demonstrates that directly targeting the semantics of input gradients is a feasible way towards adversarial robustness.

Alvin Chan*, Yew-Soon Ong

November 2019 arXiv:1911.08040 [cs]

Poison as a Cure: Detecting & Neutralizing Variable-Sized Backdoor Attacks in Deep Neural Networks

TL;DR: We propose a comprehensive defense to detect and neutralize backdoor poisoning attacks of different sizes.
Abstract: Deep learning models have recently been shown to be vulnerable to backdoor poisoning, an insidious attack where the victim model predicts clean images correctly but classifies the same images as the target class when a trigger poison pattern is added. This poison pattern can be embedded in the training dataset by the adversary. Existing defenses are effective under certain conditions such as a small size of the poison pattern, knowledge about the ratio of poisoned training samples or when a validated clean dataset is available. Since a defender may not have such prior knowledge or resources, we propose a defense against backdoor poisoning that is effective even when those prerequisites are not met. It is made up of several parts: the first parts extract a backdoor poison signal, detect poison target and base classes, and filter out poisoned from clean samples with proven guarantees. The final part of our defense involves retraining the poisoned model on a dataset augmented with the extracted poison signal and corrective relabeling of poisoned samples to neutralize the backdoor. Our approach has been shown to be effective in defending against backdoor attacks that use both small and large-sized poison patterns on nine different target-base class pairs from the CIFAR10 dataset.

Alvin Chan*, Yi Tay, Yew-Soon Ong, Jie Fu

September 2019 ICLR 2020 (International Conference on Learning Representations)

Jacobian Adversarially Regularized Networks for Robustness

TL;DR: We show that training classifiers to produce salient input Jacobian matrices with a GAN-like regularization can boost adversarial robustness.
Abstract: Adversarial examples are crafted with imperceptible perturbations with the intent to fool neural networks. Against such attacks, adversarial training and its variants stand as the strongest defense to date. Previous studies have pointed out that robust models that have undergone adversarial training tend to produce more salient and interpretable Jacobian matrices than their non-robust counterparts. A natural question is whether a model trained with an objective to produce salient Jacobian can result in better robustness. This paper answers this question with affirmative empirical results. We propose Jacobian Adversarially Regularized Networks (JARN) as a method to optimize the saliency of a classifier’s Jacobian by adversarially regularizing the model’s Jacobian to resemble natural training images. Image classifiers trained with JARN show improved robust accuracy compared to standard models on the MNIST, SVHN and CIFAR-10 datasets, uncovering a new angle to boost robustness without using adversarial training.