ICLR'24 Conference Memo

For details on each chapter, please refer to the corresponding subpage.

Overview

1. Sparse Attention and KV Cache Compression

2. Continual Learning

3. Hyperparameter Optimization (HPO)

4. Continuous Shifts

  • Latent Trajectory Learning for Limited Timestamps under Distribution Shift over Time

5. Param Optimization

6. OOD Generalization

7. Augmentation

8. Data Selection

9. Pre-training
