Add `Olmo2ForSequenceClassification` (and ideally `OlmoForSequenceClassification` / `Olmo3ForSequenceClassification`) #45529

`AutoModelForSequenceClassification.from_pretrained("allenai/OLMo-2-0425-1B")` currently fails because the OLMo family exposes only `*Model` and `*ForCausalLM` classes. Peer decoder architectures (Llama, Mistral, Qwen2, Gemma, Falcon, etc.) all ship a `*ForSequenceClassification` head.
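One way to see the gap without downloading any weights is to look the model types up in the auto-class mapping directly. This is a minimal sketch: the `MODEL_TYPES` list and the `support` dict are names chosen here for illustration, and the result depends on the installed `transformers` version.

```python
from transformers.models.auto.modeling_auto import (
    MODEL_FOR_SEQUENCE_CLASSIFICATION_MAPPING_NAMES,
)

# Model-type keys as they appear in each architecture's config (`model_type`).
# The selection below is illustrative, taken from the architectures named above.
MODEL_TYPES = ["olmo", "olmo2", "olmo3", "llama", "mistral", "qwen2", "gemma", "falcon"]

# True when AutoModelForSequenceClassification has a class registered for the type.
support = {
    model_type: model_type in MODEL_FOR_SEQUENCE_CLASSIFICATION_MAPPING_NAMES
    for model_type in MODEL_TYPES
}

for model_type, has_head in support.items():
    print(f"{model_type:8s} sequence-classification head registered: {has_head}")
```

Any model type reported as unregistered is one that `AutoModelForSequenceClassification` cannot resolve.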

## Motivation

I teach the graduate Applied Deep Learning course at Central European University (ECBS5200, http://earino.github.io/applied-deep-learning). The course runs a six-week project fine-tuning models for a 113-class text classification task (consumer financial complaints) on free-tier Kaggle T4 GPUs, with an explicit module comparing encoder vs decoder architectures and a capstone on model economics.

I insist on fully-open models (open weights + open training data + open training code) so students can inspect and reproduce the full pipeline. OLMo-2 1B is the only small-decoder option in that category that fits a T4, but the `AutoModelForSequenceClassification.from_pretrained` call above fails for it, which blocks the decoder-comparison module from using the canonical HF classification API.

Adding this head would let a concrete graduate cohort learn decoder fine-tuning on a fully-transparent pipeline using the same idiomatic API they use for the encoder baseline.

## Proposed approach

Add `Olmo2ForSequenceClassification` in `modular_olmo2.py` as a subclass of `LlamaForSequenceClassification` re-pointed at `Olmo2Model`, let `make fix-repo` regenerate `modeling_olmo2.py`, register the class in `MODEL_FOR_SEQUENCE_CLASSIFICATION_MAPPING_NAMES`, and extend `tests/models/olmo2/test_modeling_olmo2.py` with the standard `ModelTesterMixin` classification tests. I will verify the change end to end on real data on a GPU before opening the PR.

## Questions before I start

1. Welcome in principle, or was the omission intentional?
2. Scope — OLMo-2 only, or should I include OLMo and OLMo-3 in the same PR for consistency?
3. Any preferences on test coverage beyond the standard `ModelTesterMixin`?

Will disclose AI assistance and link this thread in the PR per the `CLAUDE.md` agentic contribution policy.
