Superbizarre Is Not Superb

This repository contains the code and data for the ACL 2021 paper "Superbizarre Is Not Superb: Derivational Morphology Improves BERT's Interpretation of Complex Words". The paper shows that segmenting the input into derivational morphemes helps BERT interpret complex words, particularly words that did not appear during pretraining.
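
To make the idea of a derivational segmentation concrete, here is a toy sketch that splits a word into prefixes, stem, and suffixes. It is not the segmentation code in src: the affix lists, the minimum stem length, and the function name `toy_segment` are assumptions chosen only so that the example words from this README split cleanly.

```python
# Toy illustration only; the affix inventory below is a hypothetical,
# hand-picked list and not the one used in the paper or in src.
PREFIXES = ("super", "anti", "semi", "over", "mega")
SUFFIXES = ("less", "able", "ful", "ism")
MIN_STEM = 3  # assumed lower bound on stem length to avoid over-splitting


def toy_segment(word):
    """Split a word into [prefixes..., stem, suffixes...] by greedy affix stripping."""
    prefixes, suffixes = [], []

    # Strip known prefixes from the left for as long as one matches.
    stripped = True
    while stripped:
        stripped = False
        for prefix in PREFIXES:
            if word.startswith(prefix) and len(word) - len(prefix) >= MIN_STEM:
                prefixes.append(prefix)
                word = word[len(prefix):]
                stripped = True
                break

    # Strip known suffixes from the right for as long as one matches.
    stripped = True
    while stripped:
        stripped = False
        for suffix in SUFFIXES:
            if word.endswith(suffix) and len(word) - len(suffix) >= MIN_STEM:
                suffixes.insert(0, suffix)
                word = word[:-len(suffix)]
                stripped = True
                break

    return prefixes + [word] + suffixes


if __name__ == "__main__":
    for example in ("superbizarre", "ozoneless", "spoilerful"):
        print(example, toy_segment(example))
```

A real derivational segmenter needs a much larger affix inventory and a check that the remaining stem is an actual word; the sketch only shows the shape of the output.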

Dependencies

The code requires Python>=3.6, numpy>=1.18, torch>=1.2, and transformers>=2.5.
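
The following sketch checks an environment against the minimum versions listed above. It is not part of the repository; the helper names `parse_version` and `check_requirements` are hypothetical, and the script itself relies on `importlib.metadata`, which needs Python 3.8 or later.

```python
import sys
from importlib import metadata  # standard library from Python 3.8 on

# Minimum (major, minor) versions taken from the requirements stated above.
MINIMUM = {"numpy": (1, 18), "torch": (1, 2), "transformers": (2, 5)}


def parse_version(text):
    """Return the leading (major, minor) integers of a version string."""
    parts = []
    for piece in text.split(".")[:2]:
        digits = ""
        for char in piece:
            if not char.isdigit():
                break
            digits += char
        parts.append(int(digits or 0))
    return tuple(parts)


def check_requirements(minimum=MINIMUM):
    """Map each requirement to 'ok', 'too old', or 'missing'."""
    report = {"python": "ok" if sys.version_info >= (3, 6) else "too old"}
    for name, floor in minimum.items():
        try:
            installed = parse_version(metadata.version(name))
        except metadata.PackageNotFoundError:
            report[name] = "missing"
            continue
        report[name] = "ok" if installed >= floor else "too old"
    return report


if __name__ == "__main__":
    for name, status in check_requirements().items():
        print(f"{name}: {status}")
```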

Data

The three datasets used in the experiments are in the data directory. Each dataset pairs derivatives with a binary semantic class: sentiment for Amazon, topicality for ArXiv and Reddit. Please refer to the paper for details about the datasets. The labels are as follows:

Amazon: 0 = negative (e.g., overpriced, crappy), 1 = positive (e.g., megafavorite, applausive)
ArXiv: 0 = physics (e.g., semithermal, ozoneless), 1 = computer science (e.g., autoencoded, rankable)
Reddit: 0 = entertainment (e.g., supervampires, spoilerful), 1 = knowledge (e.g., antirussian, immigrationism)
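
For convenience, the label scheme above can be written as a small lookup table. This is a sketch rather than code from the repository; the dictionary name `LABELS` and the helper `decode_label` are hypothetical.

```python
# Label scheme copied from the list above; the names are illustrative.
LABELS = {
    "amazon": {0: "negative", 1: "positive"},
    "arxiv": {0: "physics", 1: "computer science"},
    "reddit": {0: "entertainment", 1: "knowledge"},
}


def decode_label(dataset, label):
    """Translate a 0/1 label into its class name for the given dataset."""
    return LABELS[dataset.lower()][int(label)]


if __name__ == "__main__":
    print(decode_label("ArXiv", 1))
    print(decode_label("Reddit", 0))
```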

The datasets are provided both as csv files and as segmentation-specific pickled PyTorch datasets that can be loaded directly for model training. The code for generating the different segmentations is in the src directory.
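
A minimal sketch for inspecting the two file types follows. The file names and column layout are not documented here, so the helpers make no assumption about them: `peek_csv` returns the first rows as they are, and `load_pickled` is a plain `pickle.load` wrapper. Both function names are hypothetical.

```python
import csv
import pickle


def peek_csv(path, n=5):
    """Return the first n rows of a csv file as lists of strings."""
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle)
        return [row for _, row in zip(range(n), reader)]


def load_pickled(path):
    """Load a pickled object, such as one of the pickled datasets."""
    with open(path, "rb") as handle:
        return pickle.load(handle)
```

Unpickling an instance of a custom class requires the module that defines the class to be importable, so loading the pickled datasets will most likely need to be done from within src or with src on the Python path. As with any pickle file, only load files from a source you trust.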

Usage

To replicate the hyperparameter search for the learning rate, run the script start_hs.sh in src. To train the models using different segmentations, run the script start_main.sh in src.
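
If you prefer to launch the scripts from Python, the sketch below wraps them with `subprocess`. It assumes the scripts are run with bash from inside src and take no arguments, which this README does not state; the helper name `run_script` is hypothetical.

```python
import subprocess


def run_script(name, src_dir="src", dry_run=False):
    """Run a shell script from src_dir and return the command that was used.

    With dry_run=True the command is built but not executed.
    """
    command = ["bash", name]
    if not dry_run:
        # check=True raises CalledProcessError if the script exits non-zero.
        subprocess.run(command, cwd=src_dir, check=True)
    return command


if __name__ == "__main__":
    print(run_script("start_hs.sh", dry_run=True))    # hyperparameter search
    print(run_script("start_main.sh", dry_run=True))  # model training
```

Drop `dry_run=True` to actually start the runs.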

Citation

If you use the code or data in this repository, please cite the following paper:

@inproceedings{hofmann2021superbizarre,
    title = {Superbizarre Is Not Superb: Derivational Morphology Improves {BERT}{'}s Interpretation of Complex Words},
    author = {Hofmann, Valentin and Pierrehumbert, Janet and Sch{\"u}tze, Hinrich},
    booktitle = {Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics},
    year = {2021}
}
