AutoModelForSequenceClassification.from_pretrained("allenai/OLMo-2-0425-1B") currently fails because the OLMo family exposes only *Model and *ForCausalLM classes. All peer decoder architectures (Llama, Mistral, Qwen2, Gemma, Falcon, etc.) ship a *ForSequenceClassification head.
Motivation
I teach the graduate Applied Deep Learning course at Central European University (ECBS5200, http://earino.github.io/applied-deep-learning). The course runs a six-week project fine-tuning models for a 113-class text classification task (consumer financial complaints) on free-tier Kaggle T4 GPUs, with an explicit module comparing encoder vs decoder architectures and a capstone on model economics.
I insist on fully-open models (open weights, open training data, and open training code) so students can inspect and reproduce the full pipeline. OLMo-2 1B is the only small decoder in that category that fits on a T4. Because AutoModelForSequenceClassification.from_pretrained("allenai/OLMo-2-0425-1B") currently fails, the decoder-comparison module cannot use the canonical HF classification API.
Adding this head would let a concrete graduate cohort learn decoder fine-tuning on a fully-transparent pipeline using the same idiomatic API they use for the encoder baseline.
Proposed approach
- Add Olmo2ForSequenceClassification to modular_olmo2.py as a subclass of LlamaForSequenceClassification, re-pointed at Olmo2Model.
- Run make fix-repo to regenerate modeling_olmo2.py.
- Register the class in MODEL_FOR_SEQUENCE_CLASSIFICATION_MAPPING_NAMES.
- Extend tests/models/olmo2/test_modeling_olmo2.py with the standard ModelTesterMixin classification tests.
- Verify the head end-to-end on real data on a GPU before opening the PR.
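For context on what the inherited head does: a Llama-style sequence-classification head has no [CLS] token to pool, so it scores each sequence from the hidden state of its last non-padding token (under causal attention, only that position has attended to the whole input). The pure-Python sketch below illustrates that pooling, assuming the head behaves like the current LlamaForSequenceClassification (a bias-free linear "score" layer plus selection of the rightmost non-pad token per row). The function names and toy values are hypothetical; this is a conceptual illustration, not the transformers implementation.

```python
from typing import List


def last_non_pad_index(input_ids: List[int], pad_token_id: int) -> int:
    """Index of the rightmost non-pad token (0 if the row is all padding)."""
    last = 0
    for i, tok in enumerate(input_ids):
        if tok != pad_token_id:
            last = i
    return last


def classify_last_token(
    hidden_states: List[List[List[float]]],  # [batch][seq][hidden]
    input_ids: List[List[int]],              # [batch][seq]
    score_weight: List[List[float]],         # [num_labels][hidden], no bias (hypothetical toy weights)
    pad_token_id: int,
) -> List[List[float]]:
    """Return one row of label logits per sequence."""
    pooled = []
    for states, ids in zip(hidden_states, input_ids):
        # Select the last real token's hidden state, then project it to label logits.
        # This equals projecting every position first and indexing afterwards.
        h = states[last_non_pad_index(ids, pad_token_id)]
        pooled.append([sum(w * x for w, x in zip(row, h)) for row in score_weight])
    return pooled


# Toy batch: one right-padded row and one left-padded row, pad_token_id = 0.
hidden = [
    [[1.0, 0.0], [0.0, 1.0], [9.0, 9.0]],  # ids [5, 7, 0]: last real token at index 1
    [[9.0, 9.0], [2.0, 0.0], [1.0, 1.0]],  # ids [0, 3, 4]: last real token at index 2
]
ids = [[5, 7, 0], [0, 3, 4]]
W = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # 3 labels, hidden size 2

print(classify_last_token(hidden, ids, W, pad_token_id=0))
# → [[0.0, 1.0, 1.0], [1.0, 1.0, 2.0]]
```

Selecting the rightmost non-pad token makes the pooling work for both left- and right-padded batches, which is why a correctly set pad token matters for batched decoder classification.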
Questions before I start
- Welcome in principle, or was the omission intentional?
- Scope — OLMo-2 only, or should I include OLMo and OLMo-3 in the same PR for consistency?
- Any preferences on test coverage beyond the standard ModelTesterMixin tests?
I will disclose AI assistance and link this thread in the PR, per the CLAUDE.md agentic contribution policy.