Inspiration

With the surge of internet trolls in recent years, we wanted to protect people who might come across harmful comments online: comments that are hateful, ill-intended, or nonconstructive to the general dialogue. That is why we decided to build this tool.

What it does

The browser extension filters out harmful/toxic text found online. Currently only youtube.com/* is supported, but extending it to other sites should be easy.

How we built it

There are two important parts to our solution - the browser extension and the classifier. The browser extension was built using standard coding practices for web frontend development, keeping in mind recent API changes for multi-browser support. The classifier was trained on the Kaggle "Toxic Comment Classification Challenge" dataset. We preprocess the text dataset in JavaScript and export it to a local file. That file is then loaded into a Jupyter notebook, where a classifier is trained using scikit-learn. Our approach consists of converting the dataset to a bag-of-ngrams (bigrams) representation and training the model on that -- this conversion lets us deal with comments of varied length and also helps detect "misspelled" toxic words.
For our problem we used AdaBoostClassifier with a DecisionTreeClassifier as the base estimator, as we found it fits the data well.
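
To make the bag-of-bigrams idea concrete, here is a minimal sketch in plain JavaScript. It assumes character-level bigrams, which the write-up does not specify, and all function names are hypothetical. It shows how comments of any length end up as vectors of the same size, and why a misspelling still shares most of its bigrams with the original word.

```javascript
// Hypothetical helpers; character-level bigrams are an assumption.

// Lowercase and collapse whitespace so "YOU  are" and "you are" match.
function normalize(text) {
  return text.toLowerCase().replace(/\s+/g, " ").trim();
}

// Count every overlapping character bigram in a comment.
function bigramCounts(text) {
  const clean = normalize(text);
  const counts = new Map();
  for (let i = 0; i + 2 <= clean.length; i++) {
    const gram = clean.slice(i, i + 2);
    counts.set(gram, (counts.get(gram) || 0) + 1);
  }
  return counts;
}

// Build a vocabulary (bigram -> column index) from the training comments.
function buildVocabulary(comments) {
  const vocab = new Map();
  for (const comment of comments) {
    for (const gram of bigramCounts(comment).keys()) {
      if (!vocab.has(gram)) vocab.set(gram, vocab.size);
    }
  }
  return vocab;
}

// Map a comment to a fixed-length count vector; unseen bigrams are dropped.
function vectorize(text, vocab) {
  const vector = new Array(vocab.size).fill(0);
  for (const [gram, count] of bigramCounts(text)) {
    const index = vocab.get(gram);
    if (index !== undefined) vector[index] = count;
  }
  return vector;
}

const vocab = buildVocabulary(["stupid", "nice video"]);
const shortVector = vectorize("stupid", vocab);
const longVector = vectorize("what a stupid, stupid comment this is", vocab);
console.log(shortVector.length === longVector.length); // true
```

With this vocabulary, "stupiid" produces the same vector as "stupid", because the only new bigram ("ii") is not in the vocabulary and is dropped.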

Once the classifier is trained, we use a transpiler that produces a JavaScript version of our trained model (with the trained weights and base estimators).
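
As a rough illustration of what such transpiled output can look like, the sketch below hand-writes a tiny boosted ensemble: each base estimator becomes a function of nested conditionals, and the prediction is a weighted vote. The feature indices, thresholds, and weights are invented for the example and are not taken from our model; the actual transpiler output may be structured differently.

```javascript
// Illustrative only: thresholds, feature indices, and weights are invented.

// Each base estimator is a decision tree flattened into nested conditionals.
// It takes a feature vector and returns a class index (0 = ok, 1 = toxic).
const baseEstimators = [
  (x) => (x[0] <= 0.5 ? 0 : 1),
  (x) => (x[2] <= 1.5 ? (x[1] <= 0.5 ? 0 : 1) : 1),
  (x) => (x[1] <= 0.5 ? 0 : 1),
];

// One weight per estimator, as produced by boosting.
const estimatorWeights = [0.9, 0.6, 0.4];

// Weighted vote: each tree adds its weight to the class it predicts.
function predict(x) {
  const votes = [0, 0];
  baseEstimators.forEach((tree, i) => {
    votes[tree(x)] += estimatorWeights[i];
  });
  return votes[1] > votes[0] ? 1 : 0;
}

console.log(predict([1, 1, 0])); // 1: all three trees vote "toxic"
```

Because the whole model is plain functions and numbers, it needs no ML library at runtime, which is what makes it practical to ship inside an extension.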

Finally, we load the classifier into the browser extension. Before invoking it we also do a trivial word match for the easiest class of toxic comments (those containing heavy language) -- considering we are running our model in a web browser, this is a welcome optimization.
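
The word-match shortcut can be sketched roughly as follows. The word list is a placeholder, and `classify` is a hypothetical stand-in for the transpiled model's prediction function; the point is only that the model is skipped whenever the cheap check already gives an answer.

```javascript
// Placeholder list; the real word list is not part of this write-up.
const BLOCKED_WORDS = new Set(["idiot", "moron"]);

// Cheap check: does any token appear verbatim in the blocked list?
function containsBlockedWord(text) {
  const tokens = text.toLowerCase().split(/[^a-z0-9']+/);
  return tokens.some((token) => BLOCKED_WORDS.has(token));
}

// Run the cheap check first and fall back to the model only when it misses.
// `classify` is assumed to return 1 for toxic and 0 otherwise.
function isToxic(text, classify) {
  if (containsBlockedWord(text)) return true;
  return classify(text) === 1;
}
```

A set lookup per token is far cheaper than building a feature vector and evaluating every tree, so obviously toxic comments are handled almost for free.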

Challenges we ran into

On the browser extension side of things:

YouTube is a single-page application (SPA), so detecting URL changes was tricky.

Furthermore, since YouTube is an SPA, loading comments is difficult to handle well because they are loaded dynamically once the user scrolls to them (requests them).

Making sure it's cross-browser compatible.
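
One common way to handle both SPA problems is to watch the DOM for changes, re-check the URL, and process each comment node only once. The sketch below is a generic version of that idea under stated assumptions: the helper names are hypothetical, and the "#content-text" selector is a guess at YouTube's comment markup that may not match the live page.

```javascript
// Calls onChange(newUrl) whenever getUrl() returns a different value.
// Returns a `check` function meant to be called from a timer or observer.
function createUrlWatcher(getUrl, onChange) {
  let lastUrl = getUrl();
  return function check() {
    const currentUrl = getUrl();
    if (currentUrl !== lastUrl) {
      lastUrl = currentUrl;
      onChange(currentUrl);
    }
  };
}

// Handles each comment node at most once, however often it is reported.
function createCommentProcessor(handleComment) {
  const seen = new WeakSet();
  return function process(nodes) {
    for (const node of nodes) {
      if (seen.has(node)) continue;
      seen.add(node);
      handleComment(node);
    }
  };
}

// Browser-only wiring; skipped when no DOM is available (e.g. under Node).
if (typeof document !== "undefined" && typeof MutationObserver !== "undefined") {
  const processComments = createCommentProcessor((node) => {
    console.log("new comment:", node.textContent);
  });
  const checkUrl = createUrlWatcher(
    () => location.href,
    (url) => console.log("navigated to", url)
  );
  // Fires on every DOM change, which covers lazily loaded comments.
  const observer = new MutationObserver(() => {
    checkUrl();
    processComments(document.querySelectorAll("#content-text"));
  });
  observer.observe(document.body, { childList: true, subtree: true });
}
```

Keeping the URL and de-duplication logic in small functions that take their inputs as arguments also makes them easy to test without a browser.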

On the classifier training side:

Various issues with training the model (the training texts vary greatly in length, the labels are heavily unbalanced, the user-written comments are highly unstructured, etc.)

Efficiently training the model in JavaScript, as there is no good library support for dealing with matrices, arrays, etc.

Accomplishments that we're proud of

We finished it, and it performs quite well! Two guys in 24 hours built an ML model trained on ~150k examples that runs in a browser :D

What we learned

We had never written a browser extension before, so this first attempt was a bit of a hassle :)

We can train a model in a browser extension

There are tons of great JavaScript libraries, like compose and browserify

What's next for detoxify

The research community would benefit greatly if users of our extension could submit feedback about classifications (for example, when a text was wrongly classified). That would allow for continuous creation of labeled data, which could in turn be used to improve the extension's models.

It would be great to see this project come alive as an open source project: a community-driven toxic comment classifier where anyone can see why and how something was classified as toxic or non-toxic. Nowadays, freedom of speech has been exploited through the anonymity that the internet provides -- thankfully we have the tools to combat that.
