XWPF: fix _getText in XWPFRun (ignore w:delInstrText, convert w:noBreakHyphen to "‑") #670

Closed
fangd1997 wants to merge 5 commits into apache:trunk from fangd1997:codespace-probable-space-chainsaw-4v6j9wvq7rvf5qqp

fangd1997 commented Aug 12, 2024

  1. w:delInstrText also needs to be ignored.
  2. w:noBreakHyphen needs to be converted to "‑".
    @pjfanning Please help review the code.
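
The two rules above can be illustrated without POI. The following is a minimal sketch using only the JDK's DOM parser, not the code from this PR: the class name `RunTextSketch` and the method `runText` are hypothetical, and the sketch handles only `w:t`, `w:noBreakHyphen`, `w:instrText` and `w:delInstrText`, whereas the real `XWPFRun` handles many more child elements.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper for illustration; not part of Apache POI.
final class RunTextSketch {
    static final String W_NS =
            "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    /** Collects the visible text of a single w:r element passed as an XML string. */
    static String runText(String runXml) {
        Document doc;
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(runXml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("not a well-formed run", e);
        }
        StringBuilder text = new StringBuilder();
        for (Node child = doc.getDocumentElement().getFirstChild();
                child != null; child = child.getNextSibling()) {
            if (child.getNodeType() != Node.ELEMENT_NODE
                    || !W_NS.equals(child.getNamespaceURI())) {
                continue;
            }
            switch (child.getLocalName()) {
                case "t":
                    text.append(child.getTextContent());
                    break;
                case "noBreakHyphen":
                    text.append('\u2011'); // U+2011 NON-BREAKING HYPHEN
                    break;
                case "instrText":
                case "delInstrText":
                    break; // field instructions, live or deleted, are not visible text
                default:
                    break; // everything else is ignored in this sketch
            }
        }
        return text.toString();
    }
}
```

Calling `RunTextSketch.runText` on a run that contains `<w:t>Figure 1</w:t><w:noBreakHyphen/><w:t>1</w:t>` followed by a `w:delInstrText` element yields `Figure 1‑1`, with the deleted field instruction left out.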

fangd1997 changed the title from "fix _getText in XWPFRun" to "fix _getText in XWPFRun(ignore w:delInstrText, convert w:noBreakHyphen)" Aug 12, 2024
fangd1997 changed the title from "fix _getText in XWPFRun(ignore w:delInstrText, convert w:noBreakHyphen)" to "XWPF: fix _getText in XWPFRun(ignore w:delInstrText, convert w:noBreakHyphen)" Aug 12, 2024
fangd1997 changed the title from "XWPF: fix _getText in XWPFRun(ignore w:delInstrText, convert w:noBreakHyphen)" to "XWPF: fix _getText in XWPFRun(ignore w:delInstrText, convert w:noBreakHyphen to "‑")" Aug 12, 2024

pjfanning (Member) commented

It is my understanding that noBreakHyphens are displayed optionally - if, for instance, you display a paragraph in a narrow space. Most people extracting this text will have no interest in seeing the hyphen. For me, this behaviour would have to be optional and default to the existing behaviour.

For delInstrText, please provide a sample docx file that can be used to test with.
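
The opt-in behaviour requested in this comment could look roughly like the sketch below. Everything in it is hypothetical: `RunTextOptions`, `textFor` and the `emitNoBreakHyphen` flag are illustrative names and not POI API, and the `false` branch assumes that the existing behaviour was to emit nothing for `w:noBreakHyphen`.

```java
// Hypothetical helper for illustration; not part of Apache POI.
final class RunTextOptions {
    /**
     * Maps one child element of a run to its extracted text.
     *
     * @param localName         element name without the "w:" prefix
     * @param content           text content of the element (used for w:t only)
     * @param emitNoBreakHyphen opt-in flag; false is assumed to match the
     *                          earlier behaviour of emitting nothing
     */
    static String textFor(String localName, String content, boolean emitNoBreakHyphen) {
        switch (localName) {
            case "t":
                return content;
            case "noBreakHyphen":
                return emitNoBreakHyphen ? "\u2011" : "";
            default:
                return ""; // instrText, delInstrText and anything else
        }
    }
}
```

With the flag off, callers that never saw the hyphen keep getting the same output; with it on, `textFor("noBreakHyphen", "", true)` returns the single character U+2011.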

fangd1997 (Author) commented Aug 13, 2024

Sample file: TestNoBreakHyphenAndDelInstrText.docx
[image]
Result in the test:
[image]
In this example, the "-" in the first "Figure 1-1" is a noBreakHyphen, and I think it needs to be displayed.
FYI: https://learn.microsoft.com/en-us/dotnet/api/documentformat.openxml.wordprocessing.nobreakhyphen?view=openxml-3.0.1
The second "Figure 1-1" contains delInstrText; like instrText, it should not be displayed.

asfgit closed this in 33260d5 Aug 14, 2024

pjfanning (Member) commented

Thanks - merged

alexjansons pushed a commit to alexjansons/poi that referenced this pull request Nov 23, 2024