Commit 08a67af

document new command-line-option
1 parent 97b2b6a commit 08a67af

2 files changed: 18 additions, 3 deletions

doc/source/stringsext--man.md

Lines changed: 17 additions & 2 deletions
@@ -1,4 +1,4 @@
-% STRINGSEXT(1) Version 2.1.1 | Stringsext Documentation
+% STRINGSEXT(1) Version 2.2.0 | Stringsext Documentation
 
 <!--
 previous versions
@@ -52,6 +52,9 @@ Version: 2.1.0
 
 Date: 2020-02-01
 Version: 2.1.0
+
+Date: 2020-03-17
+Version: 2.2.0
 -->
 
 # NAME
@@ -77,7 +80,7 @@ binary data: It prints all graphic character sequences in *FILE* or
 
 Unlike *GNU strings* **stringsext** can be configured to search for
 valid characters not only in ASCII but also in many other input
-encodings, e.g.: utf-8, utf-16be, utf-16le, big5, euc-jp, koi8-r
+encodings, e.g.: *utf-8, utf-16be, utf-16le, big5, euc-jp, koi8-r*
 and many others. **\--list-encodings** shows a list of valid encoding
 names based on the WHATWG Encoding Standard. When more than one encoding
 is specified, the scan is performed in different threads simultaneously.
@@ -199,6 +202,18 @@ as *GNU strings* replacement.
 next line. The downside with long output lines is, that the scanner loses
 precision in locating the findings.
 
+**-r**, **\--same-unicode-block**
+
+: Require all characters in a finding to originate from the same Unicode
+block. This option helps to reduce false positives, especially when
+scanning for UTF-16. When set, "`stringsext`" prints only Unicode block
+homogenous strings. For example: "`-u All -n 10 -r`" finds a sequence of at
+least 10 Cyrillic characters in a row or finds at least 10 Greek characters
+in a row, whereas it ignores strings with randomly Cyrillic-Greek mixed
+characters. Technically this option guarantees, that all multibyte
+characters of a finding - encoded as UTF-8 - start with the same leading
+byte.
+
 **-s** *NUM*, **\--counter-offset**=*NUM*
 
 : Start offset NUM for the input-stream-byte-counter given as decimal or
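
The new man-page text says that, with **-r**, all multibyte characters of a finding share the same UTF-8 leading byte. The following is a minimal sketch of that sentence read literally; it is not taken from the stringsext sources, and the function names `leading_byte` and `same_leading_byte` are hypothetical. It assumes ASCII (single-byte) characters are not constrained by the rule.

```rust
/// Returns the first byte of the UTF-8 encoding of `c`.
fn leading_byte(c: char) -> u8 {
    let mut buf = [0u8; 4];
    c.encode_utf8(&mut buf).as_bytes()[0]
}

/// Returns true if every multibyte (non-ASCII) character in `s`
/// starts with the same UTF-8 leading byte.
/// Assumption: ASCII characters are ignored by the check.
fn same_leading_byte(s: &str) -> bool {
    let mut leads = s.chars().filter(|c| !c.is_ascii()).map(leading_byte);
    match leads.next() {
        // No multibyte characters at all: trivially homogeneous.
        None => true,
        Some(first) => leads.all(|b| b == first),
    }
}

fn main() {
    // Greek only: all three letters encode with leading byte 0xCE.
    println!("{}", same_leading_byte("αβγ"));
    // Greek followed by Cyrillic: leading bytes 0xCE and 0xD0 differ.
    println!("{}", same_leading_byte("αб"));
}
```

Under this literal reading, two letters from the same formal Unicode block can still differ: "п" (U+043F) encodes with leading byte 0xD0 and "р" (U+0440) with 0xD1. How the tool itself groups such characters is not stated in this diff.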

src/options.rs

Lines changed: 1 addition & 1 deletion
@@ -84,7 +84,7 @@ Options:
 chars_min_default!(),
 ").
 -p FILE, --output=FILE Print not to stdout but in file.
--q NUM, --output-line-len=NUM Output line length in UTF-8 characters (default: ",
+-q NUM, --output-line-len=NUM Output line length in Unicode-codepoints (default: ",
 output_line_char_nb_max_default!(),
 ").
 -r, --same-unicode-block Require finding to be Unicode-block homogen.
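
The usage-string change above switches the unit of **-q** from "UTF-8 characters" to "Unicode-codepoints". As a small illustration of why the unit matters, here is a sketch that contrasts the byte length of a string with its codepoint count; `codepoint_len` is a hypothetical helper, not a function from `src/options.rs`.

```rust
/// Counts Unicode codepoints (Rust `char`s) rather than UTF-8 bytes.
fn codepoint_len(s: &str) -> usize {
    s.chars().count()
}

fn main() {
    let s = "αβγ";
    // Each Greek letter here takes two bytes in UTF-8.
    println!("bytes: {}, codepoints: {}", s.len(), codepoint_len(s));
}
```

For a line made of two-byte characters, a limit expressed in codepoints allows twice as many bytes as the same number expressed in bytes.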
