Skip to content

Conversation

@judwhite
Copy link
Contributor

Summary:

  • Fuzz function looked like it was testing the same input twice, I changed that and a reserialization difference was found
  • The test failed for the value "\xbd\x00" with "reserialized data does not match"; commit message has more details
  • Fixed by quoting any invalid UTF-8 string and any string containing the literal utf8.RuneError
  • Added decode checks on the key making the parser more strict on key input. I'd like some help here, the decode benchmark difference below doesn't seem to justify the edge case it covers
  • Probably not ready to merge just yet

New tests results with existing code:

--- FAIL: TestDecoder_errors (0.00s)
	decode_test.go:145: got: <nil>, want: logfmt syntax error at pos 5 on line 1: invalid key
	decode_test.go:145: got: <nil>, want: logfmt syntax error at pos 2 on line 1: invalid key
--- FAIL: TestEncodeKeyval (0.00s)
	encode_test.go:71: "k", "\xbd": got 'k=�', want 'k="\ufffd"'
	encode_test.go:71: "k", "�\x00": got 'k="�\u0000"', want 'k="\ufffd\u0000"'
	encode_test.go:71: "k", "�": got 'k=�', want 'k="\ufffd"'
	encode_test.go:71: "k", []byte{0xef, 0xbf, 0xbd, 0x0}: got 'k="�\u0000"', want 'k="\ufffd\u0000"'
	encode_test.go:71: "k", []byte{0xef, 0xbf, 0xbd}: got 'k=�', want 'k="\ufffd"'
--- FAIL: TestMarshalKeyvals (0.00s)
	encode_test.go:135: []interface {}{"�"}: got error: <nil>, want error: invalid key
	encode_test.go:138: []interface {}{"�"}: got '�=null', want ''
	encode_test.go:135: []interface {}{"\xbd"}: got error: <nil>, want error: invalid key
	encode_test.go:138: []interface {}{"\xbd"}: got '�=null', want ''

Old benchmarks:

BenchmarkDecodeKeyval-8              500           2454245 ns/op         203.73 MB/s
BenchmarkKRDecode-8                  500           2332233 ns/op         214.39 MB/s
BenchmarkEncodeKeyval-8          2000000               704 ns/op              64 B/op          4 allocs/op

New benchmarks:

BenchmarkDecodeKeyval-8              500           3694369 ns/op         135.34 MB/s
BenchmarkKRDecode-8                 1000           2369236 ns/op         211.04 MB/s
BenchmarkEncodeKeyval-8          2000000               716 ns/op              64 B/op          4 allocs/op

- for parse/write/parse check, change second parse to parse the write
  of the parsed value instead of the original input

encode:
- invalidKeyRune: treat utf8.RuneError as invalid rune
- add invalidKey, invalidKeyString funcs
  - checks len(key) == 0, invalidKeyRune, and utf8.Valid
- needsQuotedValueRune: if value contains utf8.RuneError quote the
  value. a literal k=\ufffd encodes to k=\"\\ufffd\"
- writeStringValue, writeBytesValue: quote any invalid utf8 string
- the above two changes fix the "reserialized data does not match"
  error found during fuzz testing

decode:
- reject invalidKey as parse error

jsonstring:
- remove "&& size == 1" when checking for rune decode error

fuzz testing output:
- .quoted:
  "0=\"\xbd\x00\""
- .output:
  panic: reserialized data does not match:
  "0=\"\\ufffd\\u0000\"\n"
  "0=\"�\\u0000\"\n"
@coveralls
Copy link

coveralls commented Nov 14, 2016

Coverage Status

Coverage increased (+1.9%) to 83.447% when pulling 52e29b1 on judwhite:feature/fuzz into d432719 on go-logfmt:master.

@ChrisHines
Copy link
Member

Fantastic catch!

encode_test.go Outdated
{key: "k", value: "\ufffd\x00", want: "k=\"\\ufffd\\u0000\""},
{key: "k", value: "\ufffd", want: "k=\"\\ufffd\""},
{key: "k", value: []byte("\ufffd\x00"), want: "k=\"\\ufffd\\u0000\""},
{key: "k", value: []byte("\ufffd"), want: "k=\"\\ufffd\""},
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Improve readability by using back-tick quotes for all of the want values containing quotes and escape sequences.

encode.go Outdated
}

func invalidKeyString(key string) bool {
return len(key) == 0 || strings.IndexFunc(key, invalidKeyRune) != -1 || !utf8.ValidString(key)
Copy link
Member

@ChrisHines ChrisHines Nov 15, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think calling utf8.ValidString is overkill here. Note that strings.IndexFunc already calls utf8.DecodeRuneInString internally and passes each rune to invalidKeyRune. It would seem that any invalid UTF-8 is already caught by simply adding the r == utf8.RuneError check to invalidKeyRune. This change improves the performance some, but I still measure it as a 25% drop in throughput from before. The rest of the performance drop is simply due to the new calls to invalidKey in the decoder that weren't there before. But we can avoid most of that penalty for most logs if we only check for invalid UTF-8 in keys that contain bytes >= 0x80.

encode.go Outdated
if ok && value == "null" {
_, err = io.WriteString(w, `"null"`)
} else if strings.IndexFunc(value, needsQuotedValueRune) != -1 {
} else if strings.IndexFunc(value, needsQuotedValueRune) != -1 || !utf8.ValidString(value) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Call to utf8.ValidString not needed.

encode.go Outdated
func writeBytesValue(w io.Writer, value []byte) error {
var err error
if bytes.IndexFunc(value, needsQuotedValueRune) >= 0 {
if bytes.IndexFunc(value, needsQuotedValueRune) >= 0 || !utf8.Valid(value) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Call to utf8.Valid not needed.

encode.go Outdated
}

func invalidKey(key []byte) bool {
return len(key) == 0 || bytes.IndexFunc(key, invalidKeyRune) != -1 || !utf8.Valid(key)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Call to utf8.Valid not needed.

if invalidKey(dec.key) {
dec.syntaxError(invalidKeyError)
return false
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can avoid doing extra work for ASCII only keys.

  1. Initialize multibyte := false outside the loop.
  2. Add case c >= utf8.RuneSelf: multibyte = true at the bottom of the switch.
  3. Change each if invalidKey(dec.key) to if multibyte && bytes.IndexRune(dec.key, utf8.RuneError) != -1

@ChrisHines
Copy link
Member

@judwhite Can you re-target this PR to the develop branch? This project is using Git flow now, so we shouldn't update master until we make a new release.

@judwhite judwhite changed the base branch from master to develop November 15, 2016 05:57
@coveralls
Copy link

coveralls commented Nov 15, 2016

Coverage Status

Coverage increased (+2.5%) to 84.014% when pulling a9a71ce on judwhite:feature/fuzz into d432719 on go-logfmt:develop.

@judwhite
Copy link
Contributor Author

These benchmarks were taken from a different computer.

First PR iteration benchmarks (outdated; for comparison):

BenchmarkDecodeKeyval-8              500           3828872 ns/op         130.59 MB/s
BenchmarkKRDecode-8                  500           2522546 ns/op         198.21 MB/s
BenchmarkEncodeKeyval-8          2000000               745 ns/op              64 B/op          4 allocs/op

Current develop benchmarks:

BenchmarkDecodeKeyval-8              500           2598951 ns/op         192.39 MB/s
BenchmarkKRDecode-8                  500           2491137 ns/op         200.71 MB/s
BenchmarkEncodeKeyval-8          2000000               735 ns/op              64 B/op          4 allocs/op

Feedback applied benchmarks:

BenchmarkDecodeKeyval-8              500           2802288 ns/op         178.43 MB/s
BenchmarkKRDecode-8                  500           2506353 ns/op         199.49 MB/s
BenchmarkEncodeKeyval-8          2000000               734 ns/op              64 B/op          4 allocs/op

@ChrisHines ChrisHines merged commit 390ab79 into go-logfmt:develop Nov 15, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants