refactor(docusaurus-plugin-content-blog): Replace reading-time npm with Intl.Segmenter API#11091
refactor(docusaurus-plugin-content-blog): Replace reading-time npm with Intl.Segmenter API#11091shreedharbhat98 wants to merge 0 commit intofacebook:mainfrom
reading-time npm with Intl.Segmenter API#11091Conversation
reading-time npm with Intl.Segmenterreading-time npm with Intl.Segmenter API
✅ [V2]Built without sensitive environment variables
To edit notification comments on pull requests, go to your Netlify site configuration. |
⚡️ Lighthouse report for the deploy preview of this PR
|
|
Hi @slorber, As per your suggestion, I’ve replaced the reading-time package with the native Intl.Segmenter API. While implementing this, I also wrote unit tests to compare both approaches. However, I’m noticing a few discrepancies in the results. It seems these differences are likely due to the fact that reading-time uses a basic word-counting algorithm, whereas Intl.Segmenter might have more nuanced rules for segmentation. Really appreciate your guidance—thank you! |
|
@Josh-Cena & @slorber quick reminder |
slorber
left a comment
There was a problem hiding this comment.
As per your suggestion, I’ve replaced the reading-time package with the native Intl.Segmenter API.
Thanks 👍
While implementing this, I also wrote unit tests to compare both approaches. However, I’m noticing a few discrepancies in the results. It seems these differences are likely due to the fact that reading-time uses a basic word-counting algorithm, whereas Intl.Segmenter might have more nuanced rules for segmentation. Could you please advise on how you’d like to proceed in light of these differences?
Can you make it so that we can easily see those differences between the review?
An idea would be to split this PR in 2:
- first PR only writes unit tests for the original package, and refactor a bit the code (exact same behavior, so easy to review and merge for me)
- second PR makes it easy to see the tests being different with the new implementation
I'll be unavailable in the next days so I'll only be able to review/merge later in 2 weeks.
👋
| const segmenter = new Intl.Segmenter(locale, {granularity: 'word'}); | ||
| const segments = segmenter.segment(contentWithoutFrontmatter); | ||
|
|
||
| let wordCount = 0; | ||
| for (const segment of segments) { | ||
| if (segment.isWordLike) { | ||
| wordCount += 1; | ||
| } | ||
| } |
There was a problem hiding this comment.
Could you extract this as a "countWords" function that we can unit test independently?
| interface ReadingTimeResult { | ||
| text: string; | ||
| minutes: number; | ||
| time: number; | ||
| words: number; | ||
| } |
There was a problem hiding this comment.
We only need the number of minutes as an output
| */ | ||
| interface ReadingTimeOptions { | ||
| wordsPerMinute?: number; | ||
| locale?: string; |
There was a problem hiding this comment.
The locale should always be provided, a Docusaurus site always has one
| ): ReadingTimeResult { | ||
| const wordsPerMinute = options.wordsPerMinute ?? DEFAULT_WORDS_PER_MINUTE; | ||
| const locale = options.locale ?? DEFAULT_LOCALE; | ||
| const contentWithoutFrontmatter = content.replace(/^---[\s\S]*?---\n/, ''); |
There was a problem hiding this comment.
We didn't have that before so I'd prefer to not do that.
The called should be responsible from providing text content, and this function shouldn't assume it's called in a markdown/mdx context
Thanks for the suggestions, @slorber. I will work on them. |
|
For some context: I'm co-maintaining reading-time, and it's indeed extremely unclear what the value the project offers with Intl.Segmenter. One major difference is that reading-time splits CJK languages by characters instead of words, so you may get a smaller reading time estimate when using Intl.Segmenter, which arguably is more correct. So I'm +1 on this change. |
be88a80 to
fadb6d2
Compare
Pre-flight checklist
reading-timenpm package byIntl.SegmenterAPI #11086) and the maintainers have approved on my working plan.Motivation
Test Plan
Test links
Deploy preview: https://deploy-preview-_____--docusaurus-2.netlify.app/
Related issues/PRs
#11086