feat: Add header-based splitting to MarkdownTextSplitter #4861

Merged
HenryHengZJ merged 3 commits into FlowiseAI:main from Amrrx:feature/markdown-header-splitting
Jul 18, 2025


@Amrrx commented Jul 13, 2025

Description:

Summary

Adds configurable header-based splitting to the MarkdownTextSplitter component for semantic document
chunking. The splitting is implemented as a custom method because it is not a native option in LangChain.js.

Features

  • Dropdown selection for header levels (H1-H6)
  • Hierarchical splitting (selecting H2 also splits on H1 headers)
  • Headers preserved with content sections
  • Prioritizes semantic boundaries over chunk size
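
The behavior listed above can be illustrated with a minimal TypeScript sketch. This is not the code merged in this PR: the function name `splitByHeaderLevel`, the `HeaderLevel` type, and the simple fence tracking are all assumptions made for illustration. It assumes ATX-style headers (`#` to `######`), starts a new chunk at every header at or above the selected level, keeps each header with its section, and ignores chunk size entirely.

```typescript
// Hypothetical helper for illustration only; not the implementation from this PR.
type HeaderLevel = 1 | 2 | 3 | 4 | 5 | 6

// Marker for fenced code blocks, built with repeat() to keep this listing easy to embed.
const FENCE = '`'.repeat(3)

function splitByHeaderLevel(markdown: string, level: HeaderLevel): string[] {
    // Matches ATX headers from H1 down to the selected level, e.g. "#" and "##" for level 2.
    const headerPattern = new RegExp(`^#{1,${level}}\\s+\\S`)
    const chunks: string[] = []
    let current: string[] = []
    let inFence = false

    for (const line of markdown.split('\n')) {
        // Naive fence tracking so that "# comment" lines inside code blocks are not split on.
        if (line.trimStart().startsWith(FENCE)) inFence = !inFence

        // A matching header closes the previous chunk and opens a new one.
        if (!inFence && headerPattern.test(line) && current.length > 0) {
            const text = current.join('\n').trim()
            if (text.length > 0) chunks.push(text)
            current = []
        }
        current.push(line)
    }

    // Flush whatever remains after the last header.
    const tail = current.join('\n').trim()
    if (tail.length > 0) chunks.push(tail)
    return chunks
}

const sample = ['# A', 'intro', '## B', 'body', '### C', 'deep', '# D', 'end'].join('\n')
console.log(splitByHeaderLevel(sample, 1).length) // 2
console.log(splitByHeaderLevel(sample, 2).length) // 3
console.log(splitByHeaderLevel(sample, 3).length) // 4
```

Because the pattern accepts one to `level` hash marks, choosing a deeper level can only add split points, which is one way to read the "hierarchical" behavior described in the list. Text that appears before the first header becomes its own chunk in this sketch.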

Testing

✅ Tested with 23KB real-world markdown document
✅ All splitting scenarios working correctly
✅ Production build successful

Results

  • H1: 5 chunks (4,568 chars avg)
  • H2: 21 chunks (1,086 chars avg)
  • H3: 69 chunks (329 chars avg)
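
How these figures were measured is not stated in the PR. As a rough sketch, a chunk count and average length of this kind could be computed as follows; `chunkStats` is a hypothetical helper name, and it assumes length is counted in UTF-16 code units via `String.length`.

```typescript
// Hypothetical helper for illustration only; the PR does not show how its figures were computed.
function chunkStats(chunks: string[]): { count: number; avgChars: number } {
    // Sum the length of every chunk, then divide by the number of chunks.
    const totalChars = chunks.reduce((sum, chunk) => sum + chunk.length, 0)
    const avgChars = chunks.length === 0 ? 0 : Math.round(totalChars / chunks.length)
    return { count: chunks.length, avgChars }
}

console.log(chunkStats(['ab', 'abcd'])) // { count: 2, avgChars: 3 }
```

As a sanity check on the reported numbers, count times average stays close to the size of the 23KB test document at every level: 5 × 4,568 = 22,840, 21 × 1,086 = 22,806, and 69 × 329 = 22,701.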

  - Add dropdown for header level selection (H1-H6)
  - Implement hierarchical splitting (H2 includes H1 headers)
  - Headers preserved with content sections
  - Prioritize semantic boundaries over chunk size

@HenryHengZJ left a comment

thanks!

@HenryHengZJ HenryHengZJ merged commit d584c0b into FlowiseAI:main Jul 18, 2025
2 checks passed

2 participants