Fuzz fix, quote invalid UTF-8 values #4

judwhite · 2016-11-14T08:24:34Z

Summary:

Fuzz function looked like it was testing the same input twice, I changed that and a reserialization difference was found
The test failed for the value "\xbd\x00" with "reserialized data does not match"; commit message has more details
Fixed by quoting any invalid UTF-8 string and any string containing the literal utf8.RuneError
Added decode checks on the key making the parser more strict on key input. I'd like some help here, the decode benchmark difference below doesn't seem to justify the edge case it covers
Probably not ready to merge just yet

New tests results with existing code:

--- FAIL: TestDecoder_errors (0.00s)
	decode_test.go:145: got: <nil>, want: logfmt syntax error at pos 5 on line 1: invalid key
	decode_test.go:145: got: <nil>, want: logfmt syntax error at pos 2 on line 1: invalid key
--- FAIL: TestEncodeKeyval (0.00s)
	encode_test.go:71: "k", "\xbd": got 'k=�', want 'k="\ufffd"'
	encode_test.go:71: "k", "�\x00": got 'k="�\u0000"', want 'k="\ufffd\u0000"'
	encode_test.go:71: "k", "�": got 'k=�', want 'k="\ufffd"'
	encode_test.go:71: "k", []byte{0xef, 0xbf, 0xbd, 0x0}: got 'k="�\u0000"', want 'k="\ufffd\u0000"'
	encode_test.go:71: "k", []byte{0xef, 0xbf, 0xbd}: got 'k=�', want 'k="\ufffd"'
--- FAIL: TestMarshalKeyvals (0.00s)
	encode_test.go:135: []interface {}{"�"}: got error: <nil>, want error: invalid key
	encode_test.go:138: []interface {}{"�"}: got '�=null', want ''
	encode_test.go:135: []interface {}{"\xbd"}: got error: <nil>, want error: invalid key
	encode_test.go:138: []interface {}{"\xbd"}: got '�=null', want ''

Old benchmarks:

BenchmarkDecodeKeyval-8              500           2454245 ns/op         203.73 MB/s
BenchmarkKRDecode-8                  500           2332233 ns/op         214.39 MB/s
BenchmarkEncodeKeyval-8          2000000               704 ns/op              64 B/op          4 allocs/op

New benchmarks:

BenchmarkDecodeKeyval-8              500           3694369 ns/op         135.34 MB/s
BenchmarkKRDecode-8                 1000           2369236 ns/op         211.04 MB/s
BenchmarkEncodeKeyval-8          2000000               716 ns/op              64 B/op          4 allocs/op

- for parse/write/parse check, change second parse to parse the write of the parsed value instead of the original input encode: - invalidKeyRune: treat utf8.RuneError as invalid rune - add invalidKey, invalidKeyString funcs - checks len(key) == 0, invalidKeyRune, and utf8.Valid - needsQuotedValueRune: if value contains utf8.RuneError quote the value. a literal k=\ufffd encodes to k=\"\\ufffd\" - writeStringValue, writeBytesValue: quote any invalid utf8 string - the above two changes fix the "reserialized data does not match" error found during fuzz testing decode: - reject invalidKey as parse error jsonstring: - remove "&& size == 1" when checking for rune decode error fuzz testing output: - .quoted: "0=\"\xbd\x00\"" - .output: panic: reserialized data does not match: "0=\"\\ufffd\\u0000\"\n" "0=\"�\\u0000\"\n"

coveralls · 2016-11-14T08:51:19Z

Coverage increased (+1.9%) to 83.447% when pulling 52e29b1 on judwhite:feature/fuzz into d432719 on go-logfmt:master.

ChrisHines · 2016-11-15T02:25:20Z

Fantastic catch!

ChrisHines · 2016-11-15T02:30:50Z

encode_test.go

+		{key: "k", value: "\ufffd\x00", want: "k=\"\\ufffd\\u0000\""},
+		{key: "k", value: "\ufffd", want: "k=\"\\ufffd\""},
+		{key: "k", value: []byte("\ufffd\x00"), want: "k=\"\\ufffd\\u0000\""},
+		{key: "k", value: []byte("\ufffd"), want: "k=\"\\ufffd\""},


Improve readability by using back-tick quotes for all of the want values containing quotes and escape sequences.

ChrisHines · 2016-11-15T03:41:04Z

encode.go

+}
+
+func invalidKeyString(key string) bool {
+	return len(key) == 0 || strings.IndexFunc(key, invalidKeyRune) != -1 || !utf8.ValidString(key)


I think calling utf8.ValidString is overkill here. Note that strings.IndexFunc already calls utf8.DecodeRuneInString internally and passes each rune to invalidKeyRune. It would seem that any invalid UTF-8 is already caught by simply adding the r == utf8.RuneError check to invalidKeyRune. This change improves the performance some, but I still measure it as a 25% drop in throughput from before. The rest of the performance drop is simply due to the new calls to invalidKey in the decoder that weren't there before. But we can avoid most of that penalty for most logs if we only check for invalid UTF-8 in keys that contain bytes >= 0x80.

ChrisHines · 2016-11-15T03:42:49Z

encode.go

 	if ok && value == "null" {
 		_, err = io.WriteString(w, `"null"`)
-	} else if strings.IndexFunc(value, needsQuotedValueRune) != -1 {
+	} else if strings.IndexFunc(value, needsQuotedValueRune) != -1 || !utf8.ValidString(value) {


Call to utf8.ValidString not needed.

ChrisHines · 2016-11-15T03:42:54Z

encode.go

 func writeBytesValue(w io.Writer, value []byte) error {
 	var err error
-	if bytes.IndexFunc(value, needsQuotedValueRune) >= 0 {
+	if bytes.IndexFunc(value, needsQuotedValueRune) >= 0 || !utf8.Valid(value) {


Call to utf8.Valid not needed.

ChrisHines · 2016-11-15T05:05:11Z

encode.go

+}
+
+func invalidKey(key []byte) bool {
+	return len(key) == 0 || bytes.IndexFunc(key, invalidKeyRune) != -1 || !utf8.Valid(key)


Call to utf8.Valid not needed.

ChrisHines · 2016-11-15T05:14:11Z

decode.go

+				if invalidKey(dec.key) {
+					dec.syntaxError(invalidKeyError)
+					return false
+				}


We can avoid doing extra work for ASCII only keys.

Initialize multibyte := false outside the loop.

Add case c >= utf8.RuneSelf: multibyte = true at the bottom of the switch.

Change each if invalidKey(dec.key) to if multibyte && bytes.IndexRune(dec.key, utf8.RuneError) != -1

ChrisHines · 2016-11-15T05:49:13Z

@judwhite Can you re-target this PR to the develop branch? This project is using Git flow now, so we shouldn't update master until we make a new release.

coveralls · 2016-11-15T10:38:51Z

Coverage increased (+2.5%) to 84.014% when pulling a9a71ce on judwhite:feature/fuzz into d432719 on go-logfmt:develop.

judwhite · 2016-11-15T10:46:30Z

These benchmarks were taken from a different computer.

First PR iteration benchmarks (outdated; for comparison):

BenchmarkDecodeKeyval-8              500           3828872 ns/op         130.59 MB/s
BenchmarkKRDecode-8                  500           2522546 ns/op         198.21 MB/s
BenchmarkEncodeKeyval-8          2000000               745 ns/op              64 B/op          4 allocs/op

Current develop benchmarks:

BenchmarkDecodeKeyval-8              500           2598951 ns/op         192.39 MB/s
BenchmarkKRDecode-8                  500           2491137 ns/op         200.71 MB/s
BenchmarkEncodeKeyval-8          2000000               735 ns/op              64 B/op          4 allocs/op

Feedback applied benchmarks:

BenchmarkDecodeKeyval-8              500           2802288 ns/op         178.43 MB/s
BenchmarkKRDecode-8                  500           2506353 ns/op         199.49 MB/s
BenchmarkEncodeKeyval-8          2000000               734 ns/op              64 B/op          4 allocs/op

judwhite added 3 commits November 14, 2016 02:05

golint cleanup

9b7959f

add failing tests

282c134

ChrisHines requested changes Nov 15, 2016

View reviewed changes

judwhite changed the base branch from master to develop November 15, 2016 05:57

apply review feedback

a9a71ce

ChrisHines approved these changes Nov 15, 2016

View reviewed changes

ChrisHines merged commit 390ab79 into go-logfmt:develop Nov 15, 2016

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Fuzz fix, quote invalid UTF-8 values #4

Fuzz fix, quote invalid UTF-8 values #4

Uh oh!

judwhite commented Nov 14, 2016

Uh oh!

coveralls commented Nov 14, 2016 •

edited

Loading

Uh oh!

ChrisHines commented Nov 15, 2016

Uh oh!

ChrisHines Nov 15, 2016

Uh oh!

ChrisHines Nov 15, 2016 •

edited

Loading

Uh oh!

ChrisHines Nov 15, 2016

Uh oh!

ChrisHines Nov 15, 2016

Uh oh!

ChrisHines Nov 15, 2016

Uh oh!

ChrisHines Nov 15, 2016

Uh oh!

ChrisHines commented Nov 15, 2016

Uh oh!

coveralls commented Nov 15, 2016 •

edited

Loading

Uh oh!

judwhite commented Nov 15, 2016

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Fuzz fix, quote invalid UTF-8 values #4

Fuzz fix, quote invalid UTF-8 values #4

Uh oh!

Conversation

judwhite commented Nov 14, 2016

Uh oh!

coveralls commented Nov 14, 2016 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

ChrisHines commented Nov 15, 2016

Uh oh!

ChrisHines Nov 15, 2016

Choose a reason for hiding this comment

Uh oh!

ChrisHines Nov 15, 2016 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

ChrisHines Nov 15, 2016

Choose a reason for hiding this comment

Uh oh!

ChrisHines Nov 15, 2016

Choose a reason for hiding this comment

Uh oh!

ChrisHines Nov 15, 2016

Choose a reason for hiding this comment

Uh oh!

ChrisHines Nov 15, 2016

Choose a reason for hiding this comment

Uh oh!

ChrisHines commented Nov 15, 2016

Uh oh!

coveralls commented Nov 15, 2016 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

judwhite commented Nov 15, 2016

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

coveralls commented Nov 14, 2016 •

edited

Loading

ChrisHines Nov 15, 2016 •

edited

Loading

coveralls commented Nov 15, 2016 •

edited

Loading