Description
This is the issue tracking the great buffer rewrite of 202x.
Aims
- Refactor to remove the need for UnicodeStorage (which is a lookup table keyed on row+column)
  - Removing this allows us to remove ROW::_id, ROW::_pParent, CharRow::_pParent
- Reduce the fiddliness of the DBCS attribute APIs
  - DBCS attributes are stored for every character when they could easily be inferred from column position
- Add support for the storage of surrogate pairs
  - Surrogate pairs work today as an accident of fate: a pair of UTF-16 code units encoding an EA=wide (East Asian Width "Wide") codepoint is seen as wide, which conveniently matches the two wchar_t it takes up.
  - We have little to no proper support for a codepoint that requires two UTF-16 code units but is only one column wide (Narrow emoji >=U+10000 make WriteCharsLegacy sad ("wrong character insertion when scrolling to bash history") #6555 (master issue), Extra space cell inserted after unicode character #6162, Links/URLs get offset when certain unicode characters are printed #8709)
- Provide a platform on which to build full ZWJ support (Feature Request: Finish full unicode support (M:N cell rendering, ZWJ?) #1472)
- Kill CharRow, CharRowCell, CharRowCellReference
- Reduce the static storage required to store a row (eventually) by not storing space characters
  - This should make MeasureRight faster, and therefore help fix Console doesn't handle colored regions when reflowed #32.
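To illustrate the claim that DBCS attributes can be inferred from column position, here is a minimal sketch that derives a legacy single/leading/trailing attribute from the per-code-unit column-count table described under Implementation. `DbcsAttr` and `InferDbcsAttr` are hypothetical names, not the console's real API, and treating every column after the first one of a glyph as "trailing" is an assumption.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the legacy per-cell DBCS attribute.
enum class DbcsAttr { Single, Leading, Trailing };

// Infers the attribute of `column` from per-code-unit column counts
// (0 = continuation unit). Columns past the end of the row are treated
// as blank, single-width cells.
DbcsAttr InferDbcsAttr(const std::vector<uint8_t>& cols, size_t column)
{
    size_t start = 0; // first column covered by the current glyph
    for (const auto width : cols)
    {
        if (width == 0)
        {
            continue; // continuation unit: occupies no columns of its own
        }
        if (column < start + width)
        {
            if (width == 1)
            {
                return DbcsAttr::Single;
            }
            return column == start ? DbcsAttr::Leading : DbcsAttr::Trailing;
        }
        start += width;
    }
    return DbcsAttr::Single;
}
```

Nothing here needs to be stored per cell: the attribute falls out of the same widths the new buffer keeps anyway.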
Notes
Surrogate Pairs
Work will be required to teach WriteCharsLegacy to measure whole codepoints (so that a surrogate pair is measured once, as a unit) rather than individual UTF-16 code units.
I have done a small amount of work in WriteCharsLegacy. It is slow going.
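As a rough sketch of what measuring codepoints in aggregate means, the function below joins each surrogate pair into a single codepoint before looking up its width, so a narrow glyph at or above U+10000 counts as one column instead of two. `IsWide` and `MeasureColumns` are hypothetical, and `IsWide` covers only a handful of approximate ranges as a stand-in for real East Asian Width and emoji data; this is not WriteCharsLegacy's code.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical, deliberately incomplete width rule (approximate ranges only).
// A real implementation would use full East Asian Width and emoji data.
constexpr bool IsWide(char32_t cp) noexcept
{
    return (cp >= 0x1100 && cp <= 0x115F) ||  // Hangul Jamo initial consonants
           (cp >= 0x2E80 && cp <= 0xA4CF) ||  // CJK, Kana, etc. (approximate)
           (cp >= 0xAC00 && cp <= 0xD7A3) ||  // Hangul syllables
           (cp >= 0xF900 && cp <= 0xFAFF) ||  // CJK compatibility ideographs
           (cp >= 0xFF00 && cp <= 0xFF60) ||  // fullwidth forms
           (cp >= 0x1F900 && cp <= 0x1F9FF);  // supplemental symbols (approximate)
}

// Sums the column width of UTF-16 text, decoding each surrogate pair into
// one codepoint instead of measuring its two code units independently.
size_t MeasureColumns(std::u16string_view text) noexcept
{
    size_t columns = 0;
    for (size_t i = 0; i < text.size(); ++i)
    {
        char32_t cp = text[i];
        const bool isLead = cp >= 0xD800 && cp <= 0xDBFF;
        if (isLead && i + 1 < text.size() && text[i + 1] >= 0xDC00 && text[i + 1] <= 0xDFFF)
        {
            cp = 0x10000 + ((cp - 0xD800) << 10) + (text[i + 1] - 0xDC00);
            ++i; // the trailing surrogate belongs to this codepoint
        }
        // Unpaired surrogates fall through and are measured as narrow.
        columns += IsWide(cp) ? 2 : 1;
    }
    return columns;
}
```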
Motivation
#8689 (IRM) requires us to be able to shift buffer contents rightward. I implemented it in a hacky way, but then realized that UnicodeStorage would need to be rekeyed.
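A minimal sketch of why a row+column keyed table gets in the way, using a `std::map` as a hypothetical stand-in for UnicodeStorage (this is not its real interface): inserting cells mid-row forces every entry to the right of the insertion point to be pulled out and reinserted under a new key.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for UnicodeStorage: extra text for a cell, keyed on
// (row, column). Not the real class or its interface.
using Key = std::pair<size_t, size_t>;
using SideTable = std::map<Key, std::u16string>;

// Shifts every entry in `row` at or after `column` right by `count` columns,
// as inserting `count` blank cells would require. Clipping at the right edge
// of the row is ignored in this sketch.
void ShiftRowRight(SideTable& table, size_t row, size_t column, size_t count)
{
    // Pull the affected entries out first so a rekeyed entry can never
    // collide with one that has not been moved yet.
    std::vector<SideTable::node_type> moved;
    auto it = table.lower_bound({ row, column });
    while (it != table.end() && it->first.first == row)
    {
        moved.push_back(table.extract(it++));
    }

    // Rekey and reinsert each entry.
    for (auto& node : moved)
    {
        node.key().second += count;
        table.insert(std::move(node));
    }
}
```

Storing the text inline with the row, in column order, makes this bookkeeping disappear: shifting the row's contents shifts the extra text with it.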
Implementation
The buffer is currently stored as a vector (small_vector) of CharRowCell, each of which contains a DbcsAttribute and a wchar_t. Each cell takes 3 bytes (plus padding, if required).
In the common case (all narrow text), this is terribly wasteful.
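A quick sketch of the arithmetic behind "terribly wasteful", assuming a cell is approximated by a `char16_t` plus a one-byte attribute. `LegacyCell` and the two helper functions are hypothetical stand-ins, not the real CharRowCell definition.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical approximation of a cell: one UTF-16 code unit plus a
// one-byte DBCS attribute. Field names are illustrative only.
struct LegacyCell
{
    char16_t ch;      // 2 bytes
    uint8_t dbcsAttr; // 1 byte
};

// char16_t's 2-byte alignment rounds the 3 bytes of payload up to 4.
static_assert(sizeof(LegacyCell) == 4, "3 bytes of data plus 1 byte of padding");

// Bytes for a row of `columns` narrow characters: as cells, versus as a bare
// UTF-16 string (the compressed column-count table is not counted here).
constexpr size_t LegacyRowBytes(size_t columns) { return columns * sizeof(LegacyCell); }
constexpr size_t StringRowBytes(size_t columns) { return columns * sizeof(char16_t); }
```

On that assumption, a 120-column row of narrow text costs 480 bytes as cells versus 240 bytes as a plain string.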
To better support characters that are made up of more than one code unit, we are going to move to a single wchar_t string combined with a column count table. The column count table will be stored compressed by way of til::rle (#8741).
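til::rle lives in #8741; the snippet below is only a hypothetical, minimal run-length encoder to show why the column count table compresses well. `Run`, `Compress` and `At` are illustrative names, not til::rle's API.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// One run: a column count and how many consecutive code units share it.
struct Run
{
    uint8_t value;
    size_t length;
};

std::vector<Run> Compress(const std::vector<uint8_t>& cols)
{
    std::vector<Run> runs;
    for (const auto c : cols)
    {
        if (!runs.empty() && runs.back().value == c)
        {
            ++runs.back().length; // extend the current run
        }
        else
        {
            runs.push_back({ c, 1 }); // start a new run
        }
    }
    return runs;
}

// Random access by walking the runs until the one containing `index`.
uint8_t At(const std::vector<Run>& runs, size_t index)
{
    for (const auto& run : runs)
    {
        if (index < run.length)
        {
            return run.value;
        }
        index -= run.length;
    }
    return 0; // out of range; a real implementation would assert or throw
}
```

An all-narrow row of any width collapses to a single run. The worst case is a row of alternating values, such as back-to-back surrogate pairs, which yields one run per code unit; lookups in this sketch are linear in the number of runs.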
Simple case - all glyphs narrow
CHAR A B C D
UNITS 0041 0042 0043 0044
COLS 1 1 1 1
Simple case - all glyphs wide
CHAR カ タ カ ナ
UNITS 30ab 30bf 30ab 30ca
COLS 2 2 2 2
Surrogate pair case - glyphs narrow
CHAR 🕴 🕴 🕴
UNITS d83d dd74 d83d dd74 d83d dd74
COLS 1 0 1 0 1 0
Surrogate pair case - glyphs wide
CHAR 🥶 🥶 🥶
UNITS d83e dd76 d83e dd76 d83e dd76
COLS 2 0 2 0 2 0
Representative complicated case
CHAR 🥶 A B 🕴
UNITS d83e dd76 0041 0042 d83d dd74
COLS 2 0 1 1 1 0
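The tables above can be produced mechanically: append each glyph's code units to the string, give the first unit the glyph's width and every following unit a 0. This sketch assumes the width is already known (measuring it is a separate problem); `Row` and `Append` are hypothetical names.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical row model: UTF-16 text plus one column count per code unit.
struct Row
{
    std::u16string text;
    std::vector<uint8_t> cols;

    // Appends one glyph. The first code unit carries the glyph's full width;
    // each following unit of the same glyph gets 0 (a continuation).
    void Append(std::u16string_view glyph, uint8_t width)
    {
        for (size_t i = 0; i < glyph.size(); ++i)
        {
            text.push_back(glyph[i]);
            cols.push_back(i == 0 ? width : uint8_t{ 0 });
        }
    }
};
```

For example, appending 🥶 (width 2), A, B and 🕴 (width 1 each) reproduces the representative complicated case: six code units with column counts 2 0 1 1 1 0.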
Representative complicated case (huge character)
[FUTURE WORK]
CHAR ﷽
UNITS fdfd
COLS 12
Representative complicated case (Emoji with skin tone variation)
[FUTURE WORK]
CHAR 👍🏼
UNITS d83d dc4d d83c dffc
COLS 2 0 0 0
A column count of zero indicates a code unit that is a continuation of an existing glyph.
Since there is one column count for each code unit, it is trivial to map column offsets to string indices (and back) by summation.
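A minimal sketch of that summation over the per-code-unit column counts; `ColumnToIndex` and `IndexToColumn` are hypothetical helper names. Both directions are a linear scan here; a real implementation might walk the compressed runs or cache prefix sums instead.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Index of the first code unit of the glyph covering `column`.
// Columns past the end of the row map to cols.size().
size_t ColumnToIndex(const std::vector<uint8_t>& cols, size_t column)
{
    size_t start = 0; // first column of the glyph that begins at index i
    for (size_t i = 0; i < cols.size(); ++i)
    {
        if (cols[i] == 0)
        {
            continue; // continuation unit, not the start of a glyph
        }
        if (column < start + cols[i])
        {
            return i;
        }
        start += cols[i];
    }
    return cols.size();
}

// Column at which the glyph starting at `index` begins: the sum of all
// column counts before it. Meaningful for a glyph's first code unit.
size_t IndexToColumn(const std::vector<uint8_t>& cols, size_t index)
{
    size_t column = 0;
    for (size_t i = 0; i < index && i < cols.size(); ++i)
    {
        column += cols[i];
    }
    return column;
}
```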
Work Log
- Add tests for reflow so that we can rewrite it (Add some tests for TextBuffer::Reflow #8715)
- Hide more of CharRow/AttrRow's implementation details inside Row (ROW: clean up in preparation to hide CharRow & AttrRow #8446)
- (from Michael) til::rle<T, S> - a run-length encoded storage template, which we will use to store column counts
Other issues that might just be fixed by this
- strange behavior with tab[char] +line wrap + backspace in conhost/conpty #8839
- Some Unicode characters cause a mismatch between the cursor position and the displayed width #11756
- Console doesn't handle colored regions when reflowed #32
- Rendering errors in tmux split panes #6987
- Bash on Ubuntu on Windows drawing issues while resizing window #30
- TextBuffer::Reflow performs poorly for large buffers #4968