SAIL@Princeton

Systems for Artificial Intelligence Lab @ Princeton

About Us

SAIL@Princeton is an interdisciplinary research group focused on developing next-generation AI/ML systems that are efficient, accurate, reliable, scalable, and secure. As a collaboration between systems and machine learning researchers, our work considers all parts of the compute stack, from the design of AI/ML algorithms and pipelines, to runtime systems for training and inference, to intelligent applications of ML that tackle core systems challenges.

Directions

Efficient ML Inference

High-throughput, low-latency, and low-cost inference systems and techniques for ML models (traditional DNNs and modern LLMs).

Efficient ML Training

Efficient, high-throughput, fair, and cost-effective training and fine-tuning systems for ML models (traditional DNNs and modern LLMs).

Compound AI Systems

Designing and optimizing systems that tackle AI tasks by combining multiple interacting components, including retrievers (Retrieval Augmented Generation), multiple calls to models, or external tools (Agents).

ML for Systems

Applying ML (reinforcement learning, Bayesian optimization, LLMs, etc.) to key systems decisions like caching, resource management, anomaly detection, and network protocol design.

Novel Model Architectures

Full-stack innovations on State Space Models (e.g., Mamba) and, more generally, models with linear attention/RNN layers.

Edge AI Systems

Video analytics, video conferencing, and video streaming.

Emerging Paradigms

Multimodal models, diffusion models, MoEs, inference-time scaling, etc.

Hardware Design for ML

Tools for exploring the hardware design space, silicon-cost-aware hardware design for ML workloads, and hardware for serving large models.

LLM Safety

Building systems and datasets for improving LLM robustness, misuse prevention, and transparency.

Sequence Modeling

Developing model architectures for sequence modeling and optimizing the processing of long context windows.

Intelligent Resource Management

Scheduling, multi-tenant GPU sharing, and learned cache eviction for ML workloads.

Security and Privacy

Privacy preservation for ML to prevent the leakage and misuse of sensitive information, and efficient ML models for detecting threats.

Data Systems

Leveraging ML and optimization techniques to build novel designs for data storage layouts, database indexes, and end-to-end data systems.

© 2025 SAIL@Princeton.