Skip to content

Comments

refactor(docusaurus-plugin-content-blog): Replace reading-time npm with Intl.Segmenter API#11091

Closed
shreedharbhat98 wants to merge 0 commit intofacebook:mainfrom
shreedharbhat98:refactor-reading-time
Closed

refactor(docusaurus-plugin-content-blog): Replace reading-time npm with Intl.Segmenter API#11091
shreedharbhat98 wants to merge 0 commit intofacebook:mainfrom
shreedharbhat98:refactor-reading-time

Conversation

@shreedharbhat98
Copy link
Contributor

@shreedharbhat98 shreedharbhat98 commented Apr 12, 2025

Pre-flight checklist

Motivation

Test Plan

Test links

Deploy preview: https://deploy-preview-_____--docusaurus-2.netlify.app/

Related issues/PRs

#11086

@facebook-github-bot facebook-github-bot added the CLA Signed Signed Facebook CLA label Apr 12, 2025
@shreedharbhat98 shreedharbhat98 changed the title refactor(docusaurus-plugin-content-blog): Replace reading-time npm with Intl.Segmenter refactor(docusaurus-plugin-content-blog): Replace reading-time npm with Intl.Segmenter API Apr 12, 2025
@netlify
Copy link

netlify bot commented Apr 12, 2025

[V2]

Built without sensitive environment variables

Name Link
🔨 Latest commit fadb6d2
🔍 Latest deploy log https://app.netlify.com/sites/docusaurus-2/deploys/681210976982a7000704bb1c
😎 Deploy Preview https://deploy-preview-11091--docusaurus-2.netlify.app
📱 Preview on mobile
Toggle QR Code...

QR Code

Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify site configuration.

@github-actions
Copy link

github-actions bot commented Apr 12, 2025

⚡️ Lighthouse report for the deploy preview of this PR

URL Performance Accessibility Best Practices SEO Report
/ 🟠 63 🟢 98 🟢 100 🟢 100 Report
/docs/installation 🟠 50 🟢 97 🟢 100 🟢 100 Report
/docs/category/getting-started 🟠 72 🟢 100 🟢 100 🟠 86 Report
/blog 🟠 61 🟢 96 🟢 100 🟠 86 Report
/blog/preparing-your-site-for-docusaurus-v3 🔴 45 🟢 92 🟢 100 🟢 100 Report
/blog/tags/release 🟠 63 🟢 96 🟢 100 🟠 86 Report
/blog/tags 🟠 72 🟢 100 🟢 100 🟠 86 Report

@shreedharbhat98
Copy link
Contributor Author

Hi @slorber,

As per your suggestion, I’ve replaced the reading-time package with the native Intl.Segmenter API.

While implementing this, I also wrote unit tests to compare both approaches. However, I’m noticing a few discrepancies in the results. It seems these differences are likely due to the fact that reading-time uses a basic word-counting algorithm, whereas Intl.Segmenter might have more nuanced rules for segmentation.
Could you please advise on how you’d like to proceed in light of these differences?

Really appreciate your guidance—thank you!

@shreedharbhat98
Copy link
Contributor Author

@Josh-Cena & @slorber quick reminder

Copy link
Collaborator

@slorber slorber left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As per your suggestion, I’ve replaced the reading-time package with the native Intl.Segmenter API.

Thanks 👍

While implementing this, I also wrote unit tests to compare both approaches. However, I’m noticing a few discrepancies in the results. It seems these differences are likely due to the fact that reading-time uses a basic word-counting algorithm, whereas Intl.Segmenter might have more nuanced rules for segmentation. Could you please advise on how you’d like to proceed in light of these differences?

Can you make it so that we can easily see those differences between the review?

An idea would be to split this PR in 2:

  • first PR only writes unit tests for the original package, and refactor a bit the code (exact same behavior, so easy to review and merge for me)
  • second PR makes it easy to see the tests being different with the new implementation

I'll be unavailable in the next days so I'll only be able to review/merge later in 2 weeks.

👋

Comment on lines 30 to 38
const segmenter = new Intl.Segmenter(locale, {granularity: 'word'});
const segments = segmenter.segment(contentWithoutFrontmatter);

let wordCount = 0;
for (const segment of segments) {
if (segment.isWordLike) {
wordCount += 1;
}
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you extract this as a "countWords" function that we can unit test independently?

Comment on lines 12 to 17
interface ReadingTimeResult {
text: string;
minutes: number;
time: number;
words: number;
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We only need the number of minutes as an output

*/
interface ReadingTimeOptions {
wordsPerMinute?: number;
locale?: string;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The locale should always be provided, a Docusaurus site always has one

): ReadingTimeResult {
const wordsPerMinute = options.wordsPerMinute ?? DEFAULT_WORDS_PER_MINUTE;
const locale = options.locale ?? DEFAULT_LOCALE;
const contentWithoutFrontmatter = content.replace(/^---[\s\S]*?---\n/, '');
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We didn't have that before so I'd prefer to not do that.

The called should be responsible from providing text content, and this function shouldn't assume it's called in a markdown/mdx context

@shreedharbhat98
Copy link
Contributor Author

As per your suggestion, I’ve replaced the reading-time package with the native Intl.Segmenter API.

Thanks 👍

While implementing this, I also wrote unit tests to compare both approaches. However, I’m noticing a few discrepancies in the results. It seems these differences are likely due to the fact that reading-time uses a basic word-counting algorithm, whereas Intl.Segmenter might have more nuanced rules for segmentation. Could you please advise on how you’d like to proceed in light of these differences?

Can you make it so that we can easily see those differences between the review?

An idea would be to split this PR in 2:

  • first PR only writes unit tests for the original package, and refactor a bit the code (exact same behavior, so easy to review and merge for me)
  • second PR makes it easy to see the tests being different with the new implementation

I'll be unavailable in the next days so I'll only be able to review/merge later in 2 weeks.

👋

Thanks for the suggestions, @slorber. I will work on them.

@Josh-Cena
Copy link
Collaborator

For some context: I'm co-maintaining reading-time, and it's indeed extremely unclear what the value the project offers with Intl.Segmenter. One major difference is that reading-time splits CJK languages by characters instead of words, so you may get a smaller reading time estimate when using Intl.Segmenter, which arguably is more correct. So I'm +1 on this change.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

CLA Signed Signed Facebook CLA

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Replace reading-time npm package by Intl.Segmenter API

4 participants