This repository was archived by the owner on Mar 19, 2024. It is now read-only.

Add documentation about Hugging Face integration #1335

Closed
sheonhan wants to merge 1 commit into facebookresearch:main from sheonhan:add-hugging-face-integration


@sheonhan
Contributor

@sheonhan sheonhan commented Jun 3, 2023

Word vectors for 157 languages, as well as the language identification model, are now hosted on the Hugging Face Hub. (cc @ajoulin)

A newer language identification model referred to in the NLLB project is not mentioned on the official website, so I updated the docs accordingly.
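
For readers who want to try the hosted language identification model, here is a minimal sketch, assuming the `fasttext` and `huggingface_hub` Python packages are installed. The repository id is the one linked from this PR's summary; the file name `model.bin` and the helper names (`strip_label_prefix`, `load_lid_model`, `identify_language`) are assumptions made for illustration and are not part of the documentation change itself.

```python
from typing import List, Tuple


def strip_label_prefix(label: str, prefix: str = "__label__") -> str:
    """Remove fastText's label prefix, e.g. '__label__eng_Latn' -> 'eng_Latn'."""
    return label[len(prefix):] if label.startswith(prefix) else label


def load_lid_model(
    repo_id: str = "facebook/fasttext-language-identification",
    filename: str = "model.bin",  # assumed file name inside the Hub repository
):
    """Download the model file from the Hugging Face Hub and load it with fastText."""
    # Imported lazily so the pure-Python helpers work without these packages.
    import fasttext
    from huggingface_hub import hf_hub_download

    model_path = hf_hub_download(repo_id=repo_id, filename=filename)
    return fasttext.load_model(model_path)


def identify_language(model, text: str, k: int = 1) -> List[Tuple[str, float]]:
    """Return the top-k (language label, probability) pairs for one line of text."""
    labels, probabilities = model.predict(text, k=k)
    return [
        (strip_label_prefix(label), float(prob))
        for label, prob in zip(labels, probabilities)
    ]
```

A typical call would be `identify_language(load_lid_model(), "Hello, world!")`. The first call downloads the model file and needs network access.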

@facebook-github-bot
Contributor

Hi @sheonhan!

Thank you for your pull request and welcome to our community.

Action Required

In order to merge any pull request (code, docs, etc.), we require contributors to sign our Contributor License Agreement, and we don't seem to have one on file for you.

Process

In order for us to review and merge your suggested changes, please sign at https://code.facebook.com/cla. If you are contributing on behalf of someone else (eg your employer), the individual CLA may not be sufficient and your employer may need to sign the corporate CLA.

Once the CLA is signed, our tooling will perform checks and validations. Afterwards, the pull request will be tagged with CLA signed. The tagging process may take up to 1 hour after signing. Please give it that time before contacting us about it.

If you have received this in error or have any questions, please contact us at [email protected]. Thanks!

@sheonhan
Contributor Author

sheonhan commented Jun 5, 2023

Hi @dmitryvinn I was hoping to get this documentation update merged (I've spoken with Juan Pino and @ajoulin as part of this project) to officially announce fastText's integration on the Hugging Face Hub. Let me know if you need anything on my end or if there's anyone else I should ping!

@facebook-github-bot
Contributor

@jmp84 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@facebook-github-bot
Contributor

@jmp84 merged this pull request in 48171ac.

@sheonhan sheonhan deleted the add-hugging-face-integration branch June 7, 2023 04:52
@sheonhan
Contributor Author

sheonhan commented Jun 7, 2023

Hi @jmp84, thanks for merging the PR. It doesn't seem like this change has been reflected in prod. Could you help us with that?

@Celebio
Member

Celebio commented Jun 13, 2023

Hi @sheonhan,
Could you clarify what you mean by "prod"? It looks like the changes are reflected in the main branch: 48171ac

Best regards,
Onur

@sheonhan
Contributor Author

Hi @Celebio, yes it looks like the changes aren't reflected on the website. https://fasttext.cc/docs/en/crawl-vectors.html

Thanks for taking a look!

sburman added a commit to sageailabs/fastText that referenced this pull request May 22, 2024
* Replace outdated url in the scripts

Summary: Replace outdated url in the scripts

Reviewed By: piotr-bojanowski

Differential Revision: D43464784

fbshipit-source-id: 51a98a9ad5a0939acd0d578126290909a613938b

* Add documentation about Hugging Face integration (facebookresearch#1335)

Summary:
[Word vectors](https://huggingface.co/facebook/fasttext-en-vectors) for 157 languages, as well as the [language identification model](https://huggingface.co/facebook/fasttext-language-identification), are now hosted on the Hugging Face Hub. (cc ajoulin)

A newer language identification model [referred to in the NLLB project](https://github.com/facebookresearch/fairseq/blob/nllb/README.md#lid-model) is not mentioned on the official website, so I updated the docs accordingly.

Pull Request resolved: facebookresearch#1335

Reviewed By: Celebio

Differential Revision: D46507563

Pulled By: jmp84

fbshipit-source-id: 64883a6829c68b968acd980ba77a712b8e7a1365

* Migrate "deeplearning/fastText" from LLVM-12 to LLVM-15

Summary:
fbcode is migrating to LLVM-15 for safer and more up-to-date code and new compiler features. All contbuilds in your directory have passed our build test with LLVM-15, and your directory does not host any packages. This diff will migrate it to LLVM-15.

If you approve of this diff, please use the "Accept & Ship" button. If you have a reason for why it should not build with LLVM 15, please make a comment and send it back to author. Otherwise we will land this on Thursday 06/15/2023.

See the [FAQ post](https://fb.workplace.com/groups/llvm15platform010/posts/749154386769776/)! Please also direct any questions to [this group](https://fb.workplace.com/groups/llvm15platform010).

Reviewed By: meyering

Differential Revision: D46661531

fbshipit-source-id: 7278fbfcadec2392c94efd6deb710bdd5e9280f8

* Del `(object)` from 200 inc deeplearning/aicamera/trainer/utils/metrics.py

Summary: Python3 makes the use of `(object)` in class inheritance unnecessary. Let's modernize our code by eliminating this.
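
As a small illustration of why the explicit base is redundant, the sketch below defines one class in each style and compares their bases. The class names `LegacyStyle` and `ModernStyle` are hypothetical and do not come from `metrics.py`.

```python
class LegacyStyle(object):
    """Python 2 era spelling: the base class `object` is written out."""


class ModernStyle:
    """Python 3 spelling: `object` is the implicit base class."""


def direct_bases(cls: type) -> tuple:
    """Return the direct base classes of `cls`."""
    return cls.__bases__


# In Python 3 both spellings yield a new-style class deriving from `object`.
same_bases = direct_bases(LegacyStyle) == direct_bases(ModernStyle)
```

Because both classes end up with identical bases, dropping `(object)` is a cosmetic change under Python 3.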

Reviewed By: itamaro

Differential Revision: D48673901

fbshipit-source-id: 3e0ef05efe886b32a07bb58bd0725fa2ec934c14

* deeplearning, dcp (2972240286315620591)

Reviewed By: r-barnes

Differential Revision: D49677606

fbshipit-source-id: ec5b375177586c76ecccb83a29b562bc6e9961f6

* Add pyproject.toml to comply with PEP-518 (facebookresearch#1292)

Summary:
Adds pyproject.toml to comply with PEP-518, which fixes the building of the library by poetry - See python-poetry/poetry#6113 . This is a copy of facebookresearch#1270 , but I have signed the CLA.

Pull Request resolved: facebookresearch#1292

Differential Revision: D51601444

Pulled By: alexkosau

fbshipit-source-id: 357d702281ca3519c3640483eba04d124d0744b4

* fix compile error with gcc13 facebookresearch#1281 (facebookresearch#1340)

Summary:
Due to [header dependency changes](https://gcc.gnu.org/gcc-13/porting_to.html#header-dep-changes) in GCC 13, we need to include the `<cstdint>` header.

Pull Request resolved: facebookresearch#1340

Reviewed By: jmp84

Differential Revision: D51602433

Pulled By: alexkosau

fbshipit-source-id: cc9bffb276cb00f1db8ec97a36784c484ae4563a

* Predict 1.9-4.2x faster (facebookresearch#1341)

Summary:
I made prediction 1.9x to 4.2x faster than before.

# Motivation
I want to use https://tinyurl.com/nllblid218e and similarly parametrized models to run language classification on petabytes of web data.

# Methodology
The costliest operation is summing the rows for each model input. I've optimized this in three ways:
1. `addRowToVector` was a virtual function call for each row. I've replaced this with one virtual function call per prediction by adding `averageRowsToVector` to `Matrix` calls.
2. `Vector` and `DenseMatrix` were not 64-byte aligned, so the CPU was doing a lot of unaligned memory access. I've brought in my own `vector` replacement that does 64-byte alignment.
3. I've written `averageRowsToVector` in intrinsics for common vector sizes. This works on SSE, AVX, and AVX512F.
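
To make the first point concrete, here is a rough Python sketch of what a single batched "average these rows" call computes. It assumes, from the name alone, that `averageRowsToVector` returns the element-wise mean of the selected matrix rows. The actual change is C++ with SIMD intrinsics, and the function and argument names below are illustrative only.

```python
from typing import List, Sequence


def average_rows_to_vector(
    matrix: Sequence[Sequence[float]], row_ids: Sequence[int]
) -> List[float]:
    """Element-wise mean of the rows of `matrix` selected by `row_ids`.

    One call handles every row of a prediction, instead of one call per row.
    """
    if not row_ids:
        raise ValueError("row_ids must not be empty")
    dim = len(matrix[0])
    total = [0.0] * dim
    for row_id in row_ids:
        row = matrix[row_id]
        for j in range(dim):
            total[j] += row[j]  # accumulate the selected rows
    scale = 1.0 / len(row_ids)
    return [value * scale for value in total]  # divide once at the end


# Hypothetical 3x2 embedding matrix and two input ids.
example = average_rows_to_vector([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]], [0, 2])
```

The inner loop over `dim` is the part that the intrinsics versions would specialize for common vector sizes.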

See the commit history for a breakdown of speed improvement from each change.

# Experiments
Test set: [docs1000.txt.gz](https://github.com/facebookresearch/fastText/files/11832996/docs1000.txt.gz), a set of random documents (https://data.statmt.org/heafield/classified-fasttext/)
CPU: AMD Ryzen 9 7950X 16-Core

Model https://tinyurl.com/nllblid218e with 256-dimensional vectors
Before
real    0m8.757s
user    0m8.434s
sys     0m0.327s

After
real    0m2.046s
user    0m1.717s
sys     0m0.334s

Model https://dl.fbaipublicfiles.com/fasttext/supervised-models/lid.176.bin with 16-dimensional vectors
Before
real    0m0.926s
user    0m0.889s
sys     0m0.037s

After
real    0m0.477s
user    0m0.436s
sys     0m0.040s
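
The speedup factors can be checked from the wall-clock (`real`) times above. This is a small worked calculation and not part of the benchmark itself; the dictionary keys are shorthand labels chosen here.

```python
def speedup(before_seconds: float, after_seconds: float) -> float:
    """Ratio of the old runtime to the new runtime."""
    return before_seconds / after_seconds


# Wall-clock ("real") times in seconds, copied from the measurements above.
real_times = {
    "nllblid218e (256-dim)": (8.757, 2.046),
    "lid.176.bin (16-dim)": (0.926, 0.477),
}

speedups = {
    name: round(speedup(before, after), 2)
    for name, (before, after) in real_times.items()
}
```

This gives roughly 4.28x for the 256-dimensional model and 1.94x for the 16-dimensional one, in line with the range quoted in the title.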

Pull Request resolved: facebookresearch#1341

Reviewed By: graemenail

Differential Revision: D52134736

Pulled By: kpuatfb

fbshipit-source-id: 42067161f4c968c34612934b48a562399a267f3b

* deeplearning/fastText 2/2

Reviewed By: azad-meta

Differential Revision: D53908330

fbshipit-source-id: b2215f0522c32a82cd876633210befefe9317d76

* Delete .circleci directory (facebookresearch#1366)

Summary: Pull Request resolved: facebookresearch#1366

Reviewed By: jailby

Differential Revision: D54850920

Pulled By: bigfootjon

fbshipit-source-id: 9a3eec7b7cb42335a786fb247cb16be9ed3c2d59

* this page intentionally left blank

---------

Co-authored-by: Onur Çelebi <[email protected]>
Co-authored-by: Sheon Han <[email protected]>
Co-authored-by: generatedunixname89002005320047 <[email protected]>
Co-authored-by: Richard Barnes <[email protected]>
Co-authored-by: generatedunixname89002005287564 <[email protected]>
Co-authored-by: Chris Culhane <[email protected]>
Co-authored-by: Cherilyn Buren <[email protected]>
Co-authored-by: Kenneth Heafield <[email protected]>
Co-authored-by: Jon Janzen <[email protected]>