
Add Olmo2ForSequenceClassification (and ideally OlmoForSequenceClassification / Olmo3ForSequenceClassification) #45529

@earino

Description

AutoModelForSequenceClassification.from_pretrained("allenai/OLMo-2-0425-1B") currently fails because the OLMo family exposes only *Model and *ForCausalLM. All peer decoder architectures (Llama, Mistral, Qwen2, Gemma, Falcon, etc.) ship ForSequenceClassification.
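As I understand it, the failure is a registry miss: the auto class looks up the config's model type in a table of known classification heads and raises when there is no entry. The toy sketch below only illustrates that lookup. `SEQ_CLS_MAPPING_NAMES` and `resolve_head` are hypothetical names, the table is a small illustrative subset, and the error text is not the library's actual message.

```python
from collections import OrderedDict

# Toy stand-in for the auto-class table (hypothetical name, illustrative subset).
SEQ_CLS_MAPPING_NAMES = OrderedDict(
    [
        ("llama", "LlamaForSequenceClassification"),
        ("mistral", "MistralForSequenceClassification"),
        ("qwen2", "Qwen2ForSequenceClassification"),
    ]
)


def resolve_head(model_type, mapping=SEQ_CLS_MAPPING_NAMES):
    """Return the head class name registered for model_type, or raise ValueError."""
    try:
        return mapping[model_type]
    except KeyError:
        # Hypothetical message; the real library words this differently.
        raise ValueError(
            f"No sequence-classification head registered for {model_type!r}"
        ) from None


if __name__ == "__main__":
    print(resolve_head("llama"))
    try:
        resolve_head("olmo2")
    except ValueError as err:
        print(err)
```

In this toy model, adding an `"olmo2"` entry to the table is what makes the lookup succeed.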

Motivation

I teach the graduate Applied Deep Learning course at Central European University (ECBS5200, http://earino.github.io/applied-deep-learning). The course runs a six-week project fine-tuning models for a 113-class text classification task (consumer financial complaints) on free-tier Kaggle T4 GPUs, with an explicit module comparing encoder vs decoder architectures and a capstone on model economics.

I insist on fully-open models (open weights + open training data + open training code) so students can inspect and reproduce the full pipeline. OLMo-2 1B is the only small decoder in that category that fits on a T4. Because AutoModelForSequenceClassification does not yet support it, the decoder-comparison module cannot use the canonical HF classification API.

Adding this head would let a concrete graduate cohort learn decoder fine-tuning on a fully-transparent pipeline using the same idiomatic API they use for the encoder baseline.

Proposed approach

Add Olmo2ForSequenceClassification in modular_olmo2.py as a subclass of LlamaForSequenceClassification re-pointed at Olmo2Model, let make fix-repo regenerate modeling_olmo2.py, register the class in MODEL_FOR_SEQUENCE_CLASSIFICATION_MAPPING_NAMES, and extend tests/models/olmo2/test_modeling_olmo2.py with the standard ModelTesterMixin classification tests. I will verify the change end to end on real data on a GPU before opening the PR.
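For context on what the inherited head would compute, here is a minimal pure-Python sketch, assuming it behaves like the existing Llama-style head as I understand it: a bias-free linear layer scores the hidden state of the rightmost non-padding token. The function names and the tiny numbers are made up for illustration. This is not the library's implementation, which works on batched tensors.

```python
def last_non_pad_index(input_ids, pad_token_id):
    """Index of the rightmost non-padding token in a single sequence."""
    if pad_token_id is None:
        # Without a pad token, fall back to the final position.
        return len(input_ids) - 1
    positions = [i for i, tok in enumerate(input_ids) if tok != pad_token_id]
    # An all-padding sequence falls back to position 0.
    return positions[-1] if positions else 0


def classify(hidden_states, input_ids, score_weight, pad_token_id):
    """Pool one token's hidden state and apply a bias-free linear layer.

    hidden_states: list of per-token vectors, shape [seq_len][hidden_size]
    score_weight:  list of per-label rows,   shape [num_labels][hidden_size]
    """
    pooled = hidden_states[last_non_pad_index(input_ids, pad_token_id)]
    return [sum(w * h for w, h in zip(row, pooled)) for row in score_weight]


if __name__ == "__main__":
    # Made-up example: 4 tokens (last one is padding), hidden size 2, 3 labels.
    input_ids = [5, 7, 9, 0]
    hidden_states = [[0.5, 0.5], [0.0, 1.0], [1.0, 2.0], [9.0, 9.0]]
    score_weight = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    print(classify(hidden_states, input_ids, score_weight, pad_token_id=0))
```

One practical consequence for the tests: batched inputs need a pad token configured, since the pooling step depends on it to find the last real token in each row.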

Questions before I start

  1. Welcome in principle, or was the omission intentional?
  2. Scope: OLMo-2 only, or should I include OLMo and OLMo-3 in the same PR for consistency?
  3. Any preferences on test coverage beyond the standard ModelTesterMixin?

Will disclose AI assistance and link this thread in the PR per the CLAUDE.md agentic contribution policy.
