@@ -15,9 +15,18 @@ search for multi-byte encoded strings in binary data.
1515 :Date: 2017-01-03
1616 :Version: 1.2.0
1717
18+ :Date: 2017-01-04
19+ :Version: 1.2.1
20+
21+ :Date: 2017-01-05
22+ :Version: 1.2.2
23+
24+ :Date: 2017-01-07
25+ :Version: 1.3.0
26+
1827:Author: Jens Getreu
19- :Date: 2017-01-04
20- :Version: 1.2.1
28+ :Date: 2017-01-08
29+ :Version: 1.3.1
2130:Copyright: Apache License, Version 2.0 (for details see COPYING section)
2231:Manual section: 1
2332:Manual group: Forensic Tools
@@ -104,7 +113,7 @@ OPTIONS
104113**-e** *ENC*, **--encoding**\ =\ *ENC*
105114 Set (multiple) input search encodings.
106115
107- *ENC*\ ==\ *ENCNAME*\ [,\ *MIN*\ [,\ *UNICODEBLOCK*\ ]]
116+ *ENC*\ ==\ *ENCNAME*\ [,\ *MIN*\ [,\ *UNICODEBLOCK*\ [,\ *UNICODEBLOCK*\ ]]]
108117
109118 *ENCNAME*
110119 Search for strings encoded in ENCNAME. Encoding names
@@ -119,7 +128,7 @@ OPTIONS
119128 *UNICODEBLOCK*
120129 Restrict the search to characters within *UNICODEBLOCK*. This
121130 can be used to search for a certain script or to reduce false
122- positives when searching for UTF-16 encoded strings. See
131+ positives, especially when searching for UTF-16 encoded strings. See
123132 ``https://en.wikipedia.org/wiki/Unicode_block`` for a list of
124133 scripts and their corresponding Unicode-block-ranges.
125134 *UNICODEBLOCK* has the following syntax:
@@ -136,10 +145,8 @@ OPTIONS
136145 this case a warning specifying the enlarged *UNICODEBLOCK* is
137146 emitted.
138147
139- The following characters do not observe *UNICODEBLOCK*
140- restrictions and are always printed even if they are out of range:
141- ``\t !"#$%&'()*+,-./0123456789:;<=>?``
142- (U+0009, U+0020..U+003F).
148+ When a second optional *UNICODEBLOCK* is given, the total
149+ Unicode code point search range is the union of the first and the second.
143150
144151 See the output of **--help** for the default value of *ENC*.
145152
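The union rule for two *UNICODEBLOCK* ranges can be illustrated with a
minimal Python sketch. It is not taken from the *stringsext* sources: the
helper names ``parse_block`` and ``in_union`` are hypothetical, the accepted
syntax is inferred only from the examples in this manual
(``U+400..U+7FF`` and ``400..07ff``), and the range enlargement mentioned
above is deliberately ignored.

```python
def parse_block(spec):
    """Parse 'U+400..U+7FF' or '400..07ff' into an inclusive (lo, hi) pair.

    Hypothetical helper: the syntax is inferred from the manual's examples.
    """
    def to_int(part):
        part = part.strip().upper()
        if part.startswith("U+"):
            part = part[2:]
        return int(part, 16)

    lo, hi = spec.split("..")
    return to_int(lo), to_int(hi)


def in_union(ch, blocks):
    """Return True if the code point of ch lies in at least one block."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in blocks)


# Two ranges as in `-e UTF-16le,30,U+20..U+3f,U+400..U+07ff`
blocks = [parse_block("U+20..U+3f"), parse_block("400..07ff")]

for ch in ("5", "A", "И"):
    print(repr(ch), in_union(ch, blocks))
```

A character is accepted when it falls into either range, so digits and
punctuation from the first range can appear inside strings written in the
scripts covered by the second range, while ``A`` (U+0041) lies in neither.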
@@ -218,17 +225,19 @@ When used with pipes ``-c r`` is required:
218225 stringsext -e iso-8859-7 -c r -t x someimage.raw | grep "Ιστορία"
219226
220227Reduce the number of false positives when scanning an image file for
221- UTF-16:
228+ UTF-16. In the following example we search for Cyrillic, Arabic and Syriac
229+ strings, which may contain these additional symbols:
230+ ``\t !"#$%&'()*+,-./0123456789:;<=>?``
222231
223232::
224233
225- stringsext -e UTF-16le,20,U+0..U+3FF -e UTF-16le,20,U+400..U+7FF someimage.raw
234+ stringsext -e UTF-16le,30,U+20..U+3f,U+400..U+07ff someimage.raw
226235
227236The same but shorter:
228237
229238::
230239
231- stringsext -e UTF-16le,20,0..3FF -e UTF-16le,20,400..7FF someimage.raw
240+ stringsext -e UTF-16le,30,20..3f,400..07ff someimage.raw
232241
233242Combine Little-Endian and Big-Endian scanning:
234243
@@ -246,6 +255,13 @@ The following settings are designed to produce bit-identical output with
246255 stringsext -e ascii -c i -t x # equals `strings -t x`
247256 stringsext -e ascii -c i -t o # equals `strings -t o`
248257
258+ The following examples perform the same search, but the output format is
259+ slightly different:
260+
261+ ::
262+
263+ stringsext -e UTF-16LE,10,0..7f # equals `strings -n 10 -e l`
264+ stringsext -e UTF-16BE,10,0..7f # equals `strings -n 10 -e b`
249265
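What such a restricted UTF-16LE search does can be sketched in Python. This
is only an illustration under stated assumptions, not the algorithm used by
*stringsext* or by ``strings``: the function name ``utf16le_runs`` is
hypothetical, code units are read at even offsets only, surrogate pairs are
not handled, and the accepted range is assumed to be the printable ASCII
characters U+0020..U+007E plus tab.

```python
def utf16le_runs(data, min_len=10, lo=0x20, hi=0x7E):
    """Collect (offset, text) pairs for runs of at least min_len code units.

    Simplified sketch: even alignment only, no surrogate handling.
    """
    runs, start, chars = [], None, []
    for off in range(0, len(data) - 1, 2):
        # Little-endian: low byte first, high byte second.
        cp = data[off] | (data[off + 1] << 8)
        if lo <= cp <= hi or cp == 0x09:
            if start is None:
                start = off
            chars.append(chr(cp))
        else:
            if len(chars) >= min_len:
                runs.append((start, "".join(chars)))
            start, chars = None, []
    # Flush a run that reaches the end of the input.
    if len(chars) >= min_len:
        runs.append((start, "".join(chars)))
    return runs


sample = (
    b"\xff\xff"
    + "hello world!".encode("utf-16-le")
    + b"\x00\x00"
    + "short".encode("utf-16-le")
)
print(utf16le_runs(sample))
```

Only the first run is reported for the sample input, because ``short`` has
fewer than ten characters.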
250266
251267LIMITATIONS
@@ -289,7 +305,10 @@ will most likely never exceed the WIN\_LEN buffer and therefore will never be
289305split. In such a scenario it is good practice to run Unicode and ASCII
290306scanners in parallel.
291307
292-
308+ When a graphic string has to be cut at the WIN_LEN buffer boundary, *stringsext*
309+ cannot in all cases determine the length of the first piece. In these rare
310+ cases *stringsext* always prints the second piece, even when it is shorter than
311+ **--bytes** would require.
293312
294313
295314