refactor(docusaurus-plugin-content-blog): Replace `reading-time` npm with `Intl.Segmenter` API by shreedharbhat98 · Pull Request #11091 · facebook/docusaurus

shreedharbhat98 · 2025-04-12T15:20:03Z

Pre-flight checklist

I have read the Contributing Guidelines on pull requests.
If this is a code change: I have written unit tests and/or added dogfooding pages to fully verify the new behavior.
If this is a new API or substantial change: the PR has an accompanying issue (closes Replace reading-time npm package by Intl.Segmenter API #11086) and the maintainers have approved on my working plan.

Motivation

Test Plan

Test links

Deploy preview: https://deploy-preview-_____--docusaurus-2.netlify.app/

Related issues/PRs

#11086

netlify · 2025-04-12T15:22:32Z

✅ [V2]

Built without sensitive environment variables

Name	Link
🔨 Latest commit	`fadb6d2`
🔍 Latest deploy log	https://app.netlify.com/sites/docusaurus-2/deploys/681210976982a7000704bb1c
😎 Deploy Preview	https://deploy-preview-11091--docusaurus-2.netlify.app

github-actions · 2025-04-12T15:25:34Z

⚡️ Lighthouse report for the deploy preview of this PR

URL	Performance	Accessibility	Best Practices	SEO	Report
/	🟠 63	🟢 98	🟢 100	🟢 100	Report
/docs/installation	🟠 50	🟢 97	🟢 100	🟢 100	Report
/docs/category/getting-started	🟠 72	🟢 100	🟢 100	🟠 86	Report
/blog	🟠 61	🟢 96	🟢 100	🟠 86	Report
/blog/preparing-your-site-for-docusaurus-v3	🔴 45	🟢 92	🟢 100	🟢 100	Report
/blog/tags/release	🟠 63	🟢 96	🟢 100	🟠 86	Report
/blog/tags	🟠 72	🟢 100	🟢 100	🟠 86	Report

shreedharbhat98 · 2025-04-12T15:29:11Z

Hi @slorber,

As per your suggestion, I’ve replaced the reading-time package with the native Intl.Segmenter API.

While implementing this, I also wrote unit tests to compare both approaches. However, I’m noticing a few discrepancies in the results. It seems these differences are likely due to the fact that reading-time uses a basic word-counting algorithm, whereas Intl.Segmenter might have more nuanced rules for segmentation.
Could you please advise on how you’d like to proceed in light of these differences?

Really appreciate your guidance—thank you!

shreedharbhat98 · 2025-04-16T13:16:41Z

@Josh-Cena & @slorber quick reminder

slorber

> As per your suggestion, I’ve replaced the reading-time package with the native Intl.Segmenter API.

Thanks 👍

> While implementing this, I also wrote unit tests to compare both approaches. However, I’m noticing a few discrepancies in the results. It seems these differences are likely due to the fact that reading-time uses a basic word-counting algorithm, whereas Intl.Segmenter might have more nuanced rules for segmentation. Could you please advise on how you’d like to proceed in light of these differences?

Can you make it so that we can easily see those differences during the review?

One idea would be to split this PR in two:

The first PR only writes unit tests for the original package and refactors the code a bit (exact same behavior, so easy for me to review and merge).
The second PR makes it easy to see which tests differ with the new implementation.

I'll be unavailable for the next few days, so I'll only be able to review/merge in about 2 weeks.

👋

slorber · 2025-04-18T14:04:04Z

packages/docusaurus-plugin-content-blog/src/readingTime.ts

+  const segmenter = new Intl.Segmenter(locale, {granularity: 'word'});
+  const segments = segmenter.segment(contentWithoutFrontmatter);
+
+  let wordCount = 0;
+  for (const segment of segments) {
+    if (segment.isWordLike) {
+      wordCount += 1;
+    }
+  }


Could you extract this as a "countWords" function that we can unit test independently?
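A minimal sketch of what such an extraction could look like, based on the loop in the diff above. The name `countWords` comes from the review comment; the exact signature (content plus a required locale) is an assumption, not the code that was eventually merged.

```typescript
// Hypothetical helper: the signature is an assumption for illustration.
function countWords(content: string, locale: string): number {
  const segmenter = new Intl.Segmenter(locale, {granularity: 'word'});
  let wordCount = 0;
  for (const segment of segmenter.segment(content)) {
    // isWordLike is false for segments such as whitespace and punctuation.
    if (segment.isWordLike) {
      wordCount += 1;
    }
  }
  return wordCount;
}

console.log(countWords('Hello world', 'en'));
```

Isolating the counting step this way lets unit tests target segmentation behavior without involving the words-per-minute arithmetic.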

slorber · 2025-04-18T14:06:14Z

packages/docusaurus-plugin-content-blog/src/readingTime.ts

+interface ReadingTimeResult {
+  text: string;
+  minutes: number;
+  time: number;
+  words: number;
+}


We only need the number of minutes as an output

slorber · 2025-04-18T14:06:34Z

packages/docusaurus-plugin-content-blog/src/readingTime.ts

+ */
+interface ReadingTimeOptions {
+  wordsPerMinute?: number;
+  locale?: string;


The locale should always be provided, a Docusaurus site always has one

slorber · 2025-04-18T14:08:08Z

packages/docusaurus-plugin-content-blog/src/readingTime.ts

+): ReadingTimeResult {
+  const wordsPerMinute = options.wordsPerMinute ?? DEFAULT_WORDS_PER_MINUTE;
+  const locale = options.locale ?? DEFAULT_LOCALE;
+  const contentWithoutFrontmatter = content.replace(/^---[\s\S]*?---\n/, '');


We didn't have that before, so I'd prefer not to do that.

The caller should be responsible for providing text content, and this function shouldn't assume it's called in a Markdown/MDX context.
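Taken together, the review comments on this file suggest a smaller API: a required locale, a plain number of minutes as the result, and no front matter handling. The sketch below illustrates that shape only. The function name `calculateReadingTime` and the default value of 200 words per minute are assumptions made for illustration and are not taken from the PR.

```typescript
interface ReadingTimeOptions {
  // Required: a Docusaurus site always has a locale.
  locale: string;
  wordsPerMinute?: number;
}

// Assumed value for illustration; the PR's actual default is not shown here.
const DEFAULT_WORDS_PER_MINUTE = 200;

// Hypothetical name and signature. The caller passes plain text content;
// nothing here strips front matter or assumes a Markdown/MDX context.
function calculateReadingTime(
  content: string,
  options: ReadingTimeOptions,
): number {
  const wordsPerMinute = options.wordsPerMinute ?? DEFAULT_WORDS_PER_MINUTE;
  const segmenter = new Intl.Segmenter(options.locale, {granularity: 'word'});
  let wordCount = 0;
  for (const segment of segmenter.segment(content)) {
    if (segment.isWordLike) {
      wordCount += 1;
    }
  }
  // Only the number of minutes is returned.
  return wordCount / wordsPerMinute;
}

console.log(
  calculateReadingTime('one two three four', {locale: 'en', wordsPerMinute: 2}),
);
```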

shreedharbhat98 · 2025-04-18T14:24:27Z

Thanks for the suggestions, @slorber. I will work on them.

Josh-Cena · 2025-04-18T16:41:58Z

For some context: I'm co-maintaining reading-time, and it's indeed extremely unclear what value the project offers given Intl.Segmenter. One major difference is that reading-time splits CJK languages by characters instead of words, so you may get a smaller reading time estimate when using Intl.Segmenter, which is arguably more correct. So I'm +1 on this change.
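To illustrate the CJK difference described above, here is a rough comparison. `countCjkByCharacter` is a hypothetical stand-in for per-character counting and is not reading-time's actual code; the sample sentence is also just an example. The exact word count from `Intl.Segmenter` depends on the runtime's ICU data, so no specific value is claimed for it.

```typescript
// Hypothetical stand-in for per-character counting (not reading-time's code):
// every character in the basic CJK Unified Ideographs range counts as one unit.
function countCjkByCharacter(text: string): number {
  return (text.match(/[\u4e00-\u9fff]/g) ?? []).length;
}

// Word-based counting with Intl.Segmenter, as in the diff under review.
function countWordsWithSegmenter(text: string, locale: string): number {
  const segmenter = new Intl.Segmenter(locale, {granularity: 'word'});
  let wordCount = 0;
  for (const segment of segmenter.segment(text)) {
    if (segment.isWordLike) {
      wordCount += 1;
    }
  }
  return wordCount;
}

const sample = '我喜欢编程';
const perCharacter = countCjkByCharacter(sample);
const perWord = countWordsWithSegmenter(sample, 'zh');

// The word-based count should not exceed the per-character count, which is
// why the reading time estimate can come out smaller.
console.log({perCharacter, perWord});
```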

shreedharbhat98 requested review from Josh-Cena and slorber as code owners April 12, 2025 15:20

facebook-github-bot added the CLA Signed Signed Facebook CLA label Apr 12, 2025

shreedharbhat98 changed the title ~~refactor(docusaurus-plugin-content-blog): Replace reading-time npm with Intl.Segmenter~~ refactor(docusaurus-plugin-content-blog): Replace reading-time npm with Intl.Segmenter API Apr 12, 2025

shreedharbhat98 mentioned this pull request Apr 12, 2025

Replace reading-time npm package by Intl.Segmenter API #11086

Closed

2 tasks

slorber requested changes Apr 18, 2025


shreedharbhat98 mentioned this pull request Apr 21, 2025

test(blog): Add unit tests for calculating blog posts reading time #11116

Merged

3 tasks

shreedharbhat98 closed this Apr 30, 2025

shreedharbhat98 force-pushed the refactor-reading-time branch from be88a80 to fadb6d2 Compare April 30, 2025 11:59

shreedharbhat98 mentioned this pull request Apr 30, 2025

refactor(content-blog): replace reading-time with Intl.Segmenter API #11138

Merged

3 tasks

shreedharbhat98 wants to merge 0 commits into facebook:main from shreedharbhat98:refactor-reading-time

4 participants
