
ELECTRA: use gelu for pooled output of ELECTRA model #364

Merged
brandenchan merged 1 commit into deepset-ai:master from stefan-it:electra-pooled-output-activation-fix on May 14, 2020

Conversation

@stefan-it
Contributor

Hi,

this PR fixes the activation function for pooled output from the ELECTRA model, to match the original implementation.

gelu is now used, whereas e.g. BERT uses tanh as its activation function. See the discussion in #362.
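
For context, here is a minimal sketch of the kind of pooling head this change touches: project the first ([CLS]) token and apply an activation on top. The class and argument names below are illustrative, not the exact FARM code; the point is just the gelu-vs-tanh switch.

```python
import torch
from torch import nn


class PooledOutput(nn.Module):
    """Illustrative pooling head (hypothetical names, not FARM's actual class).

    BERT-style models apply tanh after the dense projection; the original
    ELECTRA implementation uses gelu, which is what this PR switches to.
    """

    def __init__(self, hidden_size: int, activation: str = "gelu"):
        super().__init__()
        self.dense = nn.Linear(hidden_size, hidden_size)
        # gelu to match the original ELECTRA implementation, tanh for BERT
        self.activation = nn.GELU() if activation == "gelu" else nn.Tanh()

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # hidden_states: [batch_size, seq_len, hidden_size]
        first_token = hidden_states[:, 0]  # [CLS] token representation
        return self.activation(self.dense(first_token))
```

With `activation="tanh"` this reproduces the BERT-style pooler; with the default `"gelu"` it matches ELECTRA's reference implementation.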

@stefan-it
Contributor Author

@brandenchan Would be interesting to see the performance diff :)

brandenchan self-requested a review on May 14, 2020 15:36
@brandenchan
Contributor

Great! I tested this on an ELECTRA checkpoint and the performance is still good (in fact, gelu scored 0.2% higher than tanh, averaged over 3 GermEval tasks). Our CI is currently crashing, but when I tested the branch locally, all tests passed!

brandenchan merged commit b8c5299 into deepset-ai:master on May 14, 2020
stefan-it deleted the electra-pooled-output-activation-fix branch on May 14, 2020 18:36