- % STRINGSEXT(1) Version 2.1.1 | Stringsext Documentation
+ % STRINGSEXT(1) Version 2.2.0 | Stringsext Documentation

<!--
previous versions
@@ -52,6 +52,9 @@ Version: 2.1.0

Date: 2020-02-01
Version: 2.1.0
+
+ Date: 2020-03-17
+ Version: 2.2.0
-->

# NAME
@@ -77,7 +80,7 @@ binary data: It prints all graphic character sequences in *FILE* or

Unlike *GNU strings* **stringsext** can be configured to search for
valid characters not only in ASCII but also in many other input
- encodings, e.g.: utf-8, utf-16be, utf-16le, big5, euc-jp, koi8-r
+ encodings, e.g.: *utf-8, utf-16be, utf-16le, big5, euc-jp, koi8-r*
and many others. **\--list-encodings** shows a list of valid encoding
names based on the WHATWG Encoding Standard. When more than one encoding
is specified, the scan is performed in different threads simultaneously.
@@ -199,6 +202,18 @@ as *GNU strings* replacement.
 next line. The downside with long output lines is that the scanner loses
 precision in locating the findings.

+ **-r**, **\--same-unicode-block**
+
+ : Require all characters in a finding to originate from the same Unicode
+ block. This option helps to reduce false positives, especially when
+ scanning for UTF-16. When set, "`stringsext`" prints only Unicode block
+ homogeneous strings. For example: "`-u All -n 10 -r`" finds a sequence of at
+ least 10 Cyrillic characters in a row or finds at least 10 Greek characters
+ in a row, whereas it ignores strings with randomly mixed Cyrillic and Greek
+ characters. Technically, this option guarantees that all multibyte
+ characters of a finding - encoded as UTF-8 - start with the same leading
+ byte.
+
**-s** *NUM*, **\--counter-offset**=*NUM*

 : Start offset NUM for the input-stream-byte-counter given as decimal or
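The leading-byte property stated for `--same-unicode-block` in the hunk above can be illustrated with a small Python sketch. This is not the `stringsext` implementation; the function names `leading_bytes` and `same_leading_byte` are hypothetical and exist only for this illustration. The sketch takes the sentence literally: it encodes each character as UTF-8 and compares the first byte of every multibyte sequence.

```python
def leading_bytes(text: str) -> set:
    """Collect the UTF-8 leading byte of every multibyte character in text.

    Hypothetical helper for illustration only; not part of stringsext.
    """
    result = set()
    for ch in text:
        encoded = ch.encode("utf-8")
        if len(encoded) > 1:  # single-byte (ASCII) characters are skipped
            result.add(encoded[0])
    return result


def same_leading_byte(text: str) -> bool:
    """Return True if all multibyte characters share one leading byte."""
    return len(leading_bytes(text)) <= 1


if __name__ == "__main__":
    print(same_leading_byte("абвг"))  # True: all Cyrillic, leading byte 0xD0
    print(same_leading_byte("αβγδ"))  # True: all Greek, leading byte 0xCE
    print(same_leading_byte("аβвγ"))  # False: mixes 0xD0 and 0xCE
```

For two-byte sequences, one leading byte covers only 64 code points, so this literal check is narrower than a full Unicode block. How `stringsext` itself groups leading bytes is not shown in this diff.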