ICLR'24 Conference Memo
For details of each chapter, please refer to the corresponding subpage.
Overview
1. Sparse Attention and KV Cache Compression
(window attention?) LONGLORA: EFFICIENT FINE-TUNING OF LONG-CONTEXT LARGE LANGUAGE MODELS
(attention sink) Efficient Streaming Language Models with Attention Sinks
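The attention-sink entry above can be illustrated as a simple KV-cache eviction rule: keep the first few "sink" tokens plus a sliding window of recent tokens. This is a minimal sketch of that policy, not the paper's official code; `evict_kv_cache`, `n_sink`, and `window` are hypothetical names chosen here.

```python
def evict_kv_cache(cache, n_sink=4, window=8):
    """Return the KV-cache entries retained after eviction.

    cache: list of per-token KV entries, oldest first.
    Keeps the first n_sink entries (the attention sinks) and the
    most recent `window` entries; everything in between is dropped.
    """
    if len(cache) <= n_sink + window:
        return list(cache)  # nothing to evict yet
    return cache[:n_sink] + cache[-window:]

# Token positions 0..19 with 4 sink tokens and an 8-token window:
kept = evict_kv_cache(list(range(20)))
# positions 0-3 (sinks) survive alongside the most recent 8 tokens, 12-19
```

The memory cost is thus bounded by `n_sink + window` regardless of stream length, which is the property that makes streaming decoding feasible.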
2. Continual Learning
3. Hyperparameter Optimization (HPO)
4. Continuous Shifts
Latent Trajectory Learning for Limited Timestamps under Distribution Shift over Time
1. Param Optimization
👍👍 [Teleportation] Improving Convergence and Generalization Using Parameter Symmetries
2. OOD Generalization
3. Augmentation
👍👍 (Generalization bound & augmentation complexity?) Understanding Augmentation-based Self-Supervised Representation Learning via RKHS Approximation and Regression
4. Data Selection
Improved Active Learning via Dependent Leverage Score Sampling (score sampling)
ValUES: A Framework for Systematic Validation of Uncertainty Estimation in Semantic Segmentation (segmentation uncertainty for OOD/Active etc.); code
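For context on the leverage-score entry above: standard (independent) leverage-score sampling, which that paper extends to a dependent variant, can be sketched as below. This is a generic NumPy illustration of the classical scores, not the paper's pivotal/dependent sampler; `leverage_scores` is a name chosen here.

```python
import numpy as np

def leverage_scores(A):
    # The leverage score of row i is the squared norm of row i of U,
    # where A = U S V^T is a thin SVD; scores sum to rank(A).
    U, _, _ = np.linalg.svd(A, full_matrices=False)
    return np.sum(U ** 2, axis=1)

rng = np.random.default_rng(0)
A = rng.standard_normal((10, 3))
scores = leverage_scores(A)
# each score lies in [0, 1]; sampling rows proportionally to them
# favors the rows that most influence the least-squares fit
probs = scores / scores.sum()
sample = rng.choice(len(A), size=5, replace=False, p=probs)
```

Active-learning use amounts to querying labels for the sampled rows; the dependent scheme in the paper correlates these draws instead of sampling independently.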
5. Pre-training