SasanoLab/semantic-frame-induction

FrameEOL: Semantic Frame Induction using Causal Language Models

arXiv EMNLP 2025

📋 Overview

This repository contains the implementation of the experiments reported in the FrameEOL paper.

FrameEOL is a method for obtaining embeddings from causal language models (CLMs) for use in semantic frame induction.

[Figure: Decoder Architecture]

The repository includes code for semantic frame induction with a CLM using FrameEOL, code for training the CLM to further improve performance, and, as a comparison, code for semantic frame induction and training with masked language models (MLMs).
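As a rough illustration of how an embedding can be read out of a CLM, the sketch below applies last-token pooling to toy hidden states. It assumes the embedding is taken from the hidden state of the final non-padding token, which is a common choice for CLM-based embeddings; this README does not state that FrameEOL works this way, and the function name `last_token_embedding` and the toy values are hypothetical. It uses plain Python lists so that it runs without a model or GPU.

```python
from typing import List, Sequence

Vector = List[float]


def last_token_embedding(
    hidden_states: Sequence[Vector], attention_mask: Sequence[int]
) -> Vector:
    """Return the hidden state of the last non-padding token.

    hidden_states: one vector per token position (toy stand-in for the
        final-layer output of a causal language model).
    attention_mask: 1 for real tokens, 0 for padding, same length.
    """
    if len(hidden_states) != len(attention_mask):
        raise ValueError("hidden_states and attention_mask differ in length")
    # Scan from the right so that both left- and right-padded inputs work.
    for position in range(len(attention_mask) - 1, -1, -1):
        if attention_mask[position] == 1:
            return list(hidden_states[position])
    raise ValueError("sequence contains no real tokens")


# Toy example: three real tokens followed by one padding position.
toy_hidden_states = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6], [0.0, 0.0]]
toy_attention_mask = [1, 1, 1, 0]
embedding = last_token_embedding(toy_hidden_states, toy_attention_mask)
```

In a real pipeline the hidden states would come from the model's forward pass; see src/ for how this repository actually computes embeddings.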

🚀 Setup

1. Install uv

curl -LsSf https://astral.sh/uv/install.sh | sh

2. Clone Repository and Install Dependencies

git clone <repository-url>
cd semantic-frame-induction
uv sync

Requirements

  • Python 3.10-3.11
  • CUDA-compatible GPU
  • Key dependencies: PyTorch, Transformers, scikit-learn, FAISS
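Before running `uv sync`, it can help to confirm that the interpreter falls in the supported range listed above. The following is a minimal sketch using only the standard library; the helper name `python_version_supported` is hypothetical and not part of this repository, and it does not check for a GPU.

```python
import sys
from typing import Tuple

# Supported range taken from the Requirements list: Python 3.10-3.11.
SUPPORTED_MIN: Tuple[int, int] = (3, 10)
SUPPORTED_MAX: Tuple[int, int] = (3, 11)


def python_version_supported(version: Tuple[int, ...]) -> bool:
    """Return True if (major, minor) lies within the supported range."""
    major_minor = tuple(version[:2])
    return SUPPORTED_MIN <= major_minor <= SUPPORTED_MAX


if __name__ == "__main__":
    current = tuple(sys.version_info[:3])
    if python_version_supported(current):
        print(f"Python {current} is within the supported range.")
    else:
        print(f"Python {current} is outside the supported range 3.10-3.11.")
```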

📊 Data Preparation

The original datasets are not distributed with this repository; please obtain them directly from their respective providers. Only the preprocessing code is provided here.

Preprocessing FrameNet Dataset (English)

bash script/preprocess_framenet.sh

Preprocessing Japanese FrameNet Dataset (Japanese)

bash script/preprocess_ja-framenet.sh

The preprocessed data is saved to the data/framenet/ and data/ja-framenet/ directories.
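A quick way to confirm that preprocessing produced its output is to check that both directories exist before starting training. This is a minimal sketch; the helper name `missing_data_dirs` is hypothetical, and it only checks that the directories exist, not what files they contain, since the file layout is not described here.

```python
from pathlib import Path
from typing import List

# Output directories named in this README.
EXPECTED_DATA_DIRS = ("data/framenet", "data/ja-framenet")


def missing_data_dirs(project_root: Path) -> List[str]:
    """Return the expected data directories that do not exist yet."""
    return [
        relative
        for relative in EXPECTED_DATA_DIRS
        if not (project_root / relative).is_dir()
    ]


if __name__ == "__main__":
    missing = missing_data_dirs(Path.cwd())
    if missing:
        print("Missing data directories:", ", ".join(missing))
    else:
        print("All expected data directories are present.")
```

Run it from the repository root so that `Path.cwd()` resolves the relative paths correctly.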

🎓 Training

Encoder Training

bash script/train_encoder.sh
uv run python src/eval/agg_encoder.py

Decoder Training

bash script/train_decoder.sh
uv run python src/eval/agg_decoder.py

In-Context Learning

bash script/icl_decoder.sh

📁 Project Structure

.
├── src/
│   ├── dataset/           # Dataset preprocessing scripts
│   ├── eval/              # Evaluation scripts
│   ├── utils/             # Utilities (clustering, triplet learning, etc.)
│   ├── train_encoder.py   # Encoder training
│   ├── train_decoder.py   # Decoder training
│   └── few_shot.py        # Few-shot learning
├── script/                # Execution scripts
├── data/                  # Dataset storage
│   ├── framenet/          # FrameNet (English)
│   └── ja-framenet/       # Japanese FrameNet
├── outputs/               # Trained models and evaluation results
└── pyproject.toml         # Project configuration
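The utils/ entry above mentions triplet learning. For readers unfamiliar with the idea, here is a minimal sketch of the standard triplet margin loss with Euclidean distance on toy vectors. It illustrates the general technique only; the distance function, margin value, and function names are assumptions and may differ from what src/utils/ implements.

```python
import math
from typing import Sequence


def euclidean_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_margin_loss(
    anchor: Sequence[float],
    positive: Sequence[float],
    negative: Sequence[float],
    margin: float = 1.0,
) -> float:
    """max(0, d(anchor, positive) - d(anchor, negative) + margin).

    The loss is zero once the negative is farther from the anchor than
    the positive by at least `margin`.
    """
    gap = euclidean_distance(anchor, positive) - euclidean_distance(anchor, negative)
    return max(0.0, gap + margin)


# Toy vectors: the positive is close to the anchor, the negative is far.
anchor = [0.0, 0.0]
positive = [0.0, 1.0]   # distance 1 from the anchor
negative = [3.0, 4.0]   # distance 5 from the anchor

easy_loss = triplet_margin_loss(anchor, positive, negative)  # 1 - 5 + 1 < 0, so 0.0
hard_loss = triplet_margin_loss(anchor, negative, positive)  # 5 - 1 + 1 = 5.0
```

In a frame induction setting, one plausible setup (an assumption here) is to treat two instances that evoke the same frame as anchor and positive, and an instance of a different frame as the negative.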

📝 Citation

If you use this implementation, please cite the following paper (WIP):

@inproceedings{yano-etal-2025-frameeol,
    title = "{F}rame{EOL}: Semantic Frame Induction using Causal Language Models",
    author = "Yano, Chihiro and Yamada, Kosuke and Tsukagoshi, Hayato and Sasano, Ryohei and Takeda, Koichi",
    booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2025",
    year = "2025",
    publisher = "Association for Computational Linguistics",
}
