Skip to content

Limit VNG files to a single segment#4862

Merged
nwt merged 1 commit intomainfrom
single-segment
Nov 6, 2023
Merged

Limit VNG files to a single segment#4862
nwt merged 1 commit intomainfrom
single-segment

Conversation

@nwt
Copy link
Member

@nwt nwt commented Nov 6, 2023

The VNG dictionary representation does not work for multiple segments if the index of a dictionary entry changes after a segment containing that entry is written. Work around this by limiting VNG files to a single segment.

(We'll fix this permanently either by removing segments entirely or by adding per-segment dictionaries.)

Closes #4839

The VNG dictionary representation does not work for multiple segments if
the index of a dictionary entry changes after a segment containing that
entry is written.  Work around this by limiting VNG files to a single
segment.

We'll fix this permanently either by removing segments entirely or by
adding per-segment dictionaries.
@nwt nwt requested a review from a team November 6, 2023 22:35
@jamii jamii mentioned this pull request Nov 6, 2023
@jamii
Copy link
Contributor

jamii commented Nov 6, 2023

This doesn't seem to prevent dict bugs. I'm still seeing multiple calls to makeDictVector:

> go test -run=FuzzVngRoundtripGen/17d3114a3be63cb1
goroutine 7 [running]:
runtime/debug.Stack()
	/nix/store/m3mi5km1zdmaqdb33byirlixigzh3f4l-go-1.20.7/share/go/src/runtime/debug/stack.go:24 +0x65
runtime/debug.PrintStack()
	/nix/store/m3mi5km1zdmaqdb33byirlixigzh3f4l-go-1.20.7/share/go/src/runtime/debug/stack.go:16 +0x19
github.com/brimdata/zed/vng/vector.(*PrimitiveWriter).makeDict(0xc000043000)
	/home/jamie/zed/vng/vector/primitive.go:153 +0x36
github.com/brimdata/zed/vng/vector.(*PrimitiveWriter).makeDictVector(0xc000043000)
	/home/jamie/zed/vng/vector/primitive.go:87 +0x36
github.com/brimdata/zed/vng/vector.(*PrimitiveWriter).Flush(0xc000043000, 0x8?)
	/home/jamie/zed/vng/vector/primitive.go:76 +0x29
github.com/brimdata/zed/vng/vector.(*NullsWriter).Flush(0xc00010d438?, 0xf5?)
	/home/jamie/zed/vng/vector/nulls.go:66 +0x98
github.com/brimdata/zed/vng.(*Writer).flush(0xc000042f80, 0x71?)
	/home/jamie/zed/vng/writer.go:105 +0x6e
github.com/brimdata/zed/vng.(*Writer).Write(0xc000042f80, 0xc000015580)
	/home/jamie/zed/vng/writer.go:98 +0x2b5
github.com/brimdata/zed/zio.CopyWithContext({0x8d3988, 0xc000018118}, {0x8d0ec0, 0xc000042f80}, {0x8d10e0, 0xc000010420})
	/home/jamie/zed/zio/zio.go:158 +0x7f
github.com/brimdata/zed/zio.Copy(...)
	/home/jamie/zed/zio/zio.go:146
github.com/brimdata/zed/vng_test.roundtrip(0xc0001e3860, {0xc000015580, 0x2, 0x2}, {0xc0000a46d8?, 0xc0000a4720?})
	/home/jamie/zed/vng/vng_test.go:89 +0x165
github.com/brimdata/zed/vng_test.FuzzVngRoundtripGen.func1(0x0?, {0xc000019700?, 0x0?, 0x468859?})
	/home/jamie/zed/vng/vng_test.go:57 +0x2a6
reflect.Value.call({0x755400?, 0x7f6010?, 0x13?}, {0x7bd05c, 0x4}, {0xc000290240, 0x2, 0x2?})
	/nix/store/m3mi5km1zdmaqdb33byirlixigzh3f4l-go-1.20.7/share/go/src/reflect/value.go:586 +0xb0b
reflect.Value.Call({0x755400?, 0x7f6010?, 0xa28e10?}, {0xc000290240?, 0x7bc300?, 0xc000019728?})
	/nix/store/m3mi5km1zdmaqdb33byirlixigzh3f4l-go-1.20.7/share/go/src/reflect/value.go:370 +0xbc
testing.(*F).Fuzz.func1.1(0x0?)
	/nix/store/m3mi5km1zdmaqdb33byirlixigzh3f4l-go-1.20.7/share/go/src/testing/fuzz.go:335 +0x3f3
testing.tRunner(0xc0001e3860, 0xc000250630)
	/nix/store/m3mi5km1zdmaqdb33byirlixigzh3f4l-go-1.20.7/share/go/src/testing/testing.go:1576 +0x10b
created by testing.(*F).Fuzz.func1
	/nix/store/m3mi5km1zdmaqdb33byirlixigzh3f4l-go-1.20.7/share/go/src/testing/fuzz.go:322 +0x5b9
goroutine 7 [running]:
runtime/debug.Stack()
	/nix/store/m3mi5km1zdmaqdb33byirlixigzh3f4l-go-1.20.7/share/go/src/runtime/debug/stack.go:24 +0x65
runtime/debug.PrintStack()
	/nix/store/m3mi5km1zdmaqdb33byirlixigzh3f4l-go-1.20.7/share/go/src/runtime/debug/stack.go:16 +0x19
github.com/brimdata/zed/vng/vector.(*PrimitiveWriter).makeDict(0xc000043000)
	/home/jamie/zed/vng/vector/primitive.go:153 +0x36
github.com/brimdata/zed/vng/vector.(*PrimitiveWriter).makeDictVector(0xc000043000)
	/home/jamie/zed/vng/vector/primitive.go:87 +0x36
github.com/brimdata/zed/vng/vector.(*PrimitiveWriter).Flush(0xc000043000, 0x8?)
	/home/jamie/zed/vng/vector/primitive.go:76 +0x29
github.com/brimdata/zed/vng/vector.(*NullsWriter).Flush(0xc00010d438?, 0xf5?)
	/home/jamie/zed/vng/vector/nulls.go:66 +0x98
github.com/brimdata/zed/vng.(*Writer).flush(0xc000042f80, 0x81?)
	/home/jamie/zed/vng/writer.go:105 +0x6e
github.com/brimdata/zed/vng.(*Writer).Write(0xc000042f80, 0xc0000155a0)
	/home/jamie/zed/vng/writer.go:98 +0x2b5
github.com/brimdata/zed/zio.CopyWithContext({0x8d3988, 0xc000018118}, {0x8d0ec0, 0xc000042f80}, {0x8d10e0, 0xc000010420})
	/home/jamie/zed/zio/zio.go:158 +0x7f
github.com/brimdata/zed/zio.Copy(...)
	/home/jamie/zed/zio/zio.go:146
github.com/brimdata/zed/vng_test.roundtrip(0xc0001e3860, {0xc000015580, 0x2, 0x2}, {0xc0000a46d8?, 0xc0000a4720?})
	/home/jamie/zed/vng/vng_test.go:89 +0x165
github.com/brimdata/zed/vng_test.FuzzVngRoundtripGen.func1(0x0?, {0xc000019700?, 0x0?, 0x468859?})
	/home/jamie/zed/vng/vng_test.go:57 +0x2a6
reflect.Value.call({0x755400?, 0x7f6010?, 0x13?}, {0x7bd05c, 0x4}, {0xc000290240, 0x2, 0x2?})
	/nix/store/m3mi5km1zdmaqdb33byirlixigzh3f4l-go-1.20.7/share/go/src/reflect/value.go:586 +0xb0b
reflect.Value.Call({0x755400?, 0x7f6010?, 0xa28e10?}, {0xc000290240?, 0x7bc300?, 0xc000019728?})
	/nix/store/m3mi5km1zdmaqdb33byirlixigzh3f4l-go-1.20.7/share/go/src/reflect/value.go:370 +0xbc
testing.(*F).Fuzz.func1.1(0x0?)
	/nix/store/m3mi5km1zdmaqdb33byirlixigzh3f4l-go-1.20.7/share/go/src/testing/fuzz.go:335 +0x3f3
testing.tRunner(0xc0001e3860, 0xc000250630)
	/nix/store/m3mi5km1zdmaqdb33byirlixigzh3f4l-go-1.20.7/share/go/src/testing/testing.go:1576 +0x10b
created by testing.(*F).Fuzz.func1
	/nix/store/m3mi5km1zdmaqdb33byirlixigzh3f4l-go-1.20.7/share/go/src/testing/fuzz.go:322 +0x5b9
goroutine 7 [running]:
runtime/debug.Stack()
	/nix/store/m3mi5km1zdmaqdb33byirlixigzh3f4l-go-1.20.7/share/go/src/runtime/debug/stack.go:24 +0x65
runtime/debug.PrintStack()
	/nix/store/m3mi5km1zdmaqdb33byirlixigzh3f4l-go-1.20.7/share/go/src/runtime/debug/stack.go:16 +0x19
github.com/brimdata/zed/vng/vector.(*PrimitiveWriter).makeDict(0xc000043000)
	/home/jamie/zed/vng/vector/primitive.go:153 +0x36
github.com/brimdata/zed/vng/vector.(*PrimitiveWriter).makeDictVector(0xc000043000)
	/home/jamie/zed/vng/vector/primitive.go:87 +0x36
github.com/brimdata/zed/vng/vector.(*PrimitiveWriter).Flush(0xc000043000, 0xb0?)
	/home/jamie/zed/vng/vector/primitive.go:76 +0x29
github.com/brimdata/zed/vng/vector.(*NullsWriter).Flush(0xc0000197a0?, 0x0?)
	/home/jamie/zed/vng/vector/nulls.go:66 +0x98
github.com/brimdata/zed/vng.(*Writer).flush(0xc000042f80, 0x0?)
	/home/jamie/zed/vng/writer.go:105 +0x6e
github.com/brimdata/zed/vng.(*Writer).finalize(0xc000042f80)
	/home/jamie/zed/vng/writer.go:113 +0x3b
github.com/brimdata/zed/vng.(*Writer).Close(0xc000042f80)
	/home/jamie/zed/vng/writer.go:58 +0x1e
github.com/brimdata/zed/vng_test.roundtrip(0xc0001e3860, {0xc000015580, 0x2, 0x2}, {0xc0000a46d8?, 0xc0000a4720?})
	/home/jamie/zed/vng/vng_test.go:90 +0x194
github.com/brimdata/zed/vng_test.FuzzVngRoundtripGen.func1(0x0?, {0xc000019700?, 0x0?, 0x468859?})
	/home/jamie/zed/vng/vng_test.go:57 +0x2a6
reflect.Value.call({0x755400?, 0x7f6010?, 0x13?}, {0x7bd05c, 0x4}, {0xc000290240, 0x2, 0x2?})
	/nix/store/m3mi5km1zdmaqdb33byirlixigzh3f4l-go-1.20.7/share/go/src/reflect/value.go:586 +0xb0b
reflect.Value.Call({0x755400?, 0x7f6010?, 0xa28e10?}, {0xc000290240?, 0x7bc300?, 0xc000019728?})
	/nix/store/m3mi5km1zdmaqdb33byirlixigzh3f4l-go-1.20.7/share/go/src/reflect/value.go:370 +0xbc
testing.(*F).Fuzz.func1.1(0x0?)
	/nix/store/m3mi5km1zdmaqdb33byirlixigzh3f4l-go-1.20.7/share/go/src/testing/fuzz.go:335 +0x3f3
testing.tRunner(0xc0001e3860, 0xc000250630)
	/nix/store/m3mi5km1zdmaqdb33byirlixigzh3f4l-go-1.20.7/share/go/src/testing/testing.go:1576 +0x10b
created by testing.(*F).Fuzz.func1
	/nix/store/m3mi5km1zdmaqdb33byirlixigzh3f4l-go-1.20.7/share/go/src/testing/fuzz.go:322 +0x5b9
goroutine 7 [running]:
runtime/debug.Stack()
	/nix/store/m3mi5km1zdmaqdb33byirlixigzh3f4l-go-1.20.7/share/go/src/runtime/debug/stack.go:24 +0x65
runtime/debug.PrintStack()
	/nix/store/m3mi5km1zdmaqdb33byirlixigzh3f4l-go-1.20.7/share/go/src/runtime/debug/stack.go:16 +0x19
github.com/brimdata/zed/vng/vector.(*PrimitiveWriter).makeDict(0xc000043000)
	/home/jamie/zed/vng/vector/primitive.go:153 +0x36
github.com/brimdata/zed/vng/vector.(*PrimitiveWriter).Metadata(0xc000043000)
	/home/jamie/zed/vng/vector/primitive.go:138 +0xa5
github.com/brimdata/zed/vng/vector.(*NullsWriter).Metadata(0xc000288d20)
	/home/jamie/zed/vng/vector/nulls.go:70 +0x2d
github.com/brimdata/zed/vng.(*Writer).finalize(0xc000042f80)
	/home/jamie/zed/vng/writer.go:134 +0x362
github.com/brimdata/zed/vng.(*Writer).Close(0xc000042f80)
	/home/jamie/zed/vng/writer.go:58 +0x1e
github.com/brimdata/zed/vng_test.roundtrip(0xc0001e3860, {0xc000015580, 0x2, 0x2}, {0xc0000a46d8?, 0xc0000a4720?})
	/home/jamie/zed/vng/vng_test.go:90 +0x194
github.com/brimdata/zed/vng_test.FuzzVngRoundtripGen.func1(0x0?, {0xc000019700?, 0x0?, 0x468859?})
	/home/jamie/zed/vng/vng_test.go:57 +0x2a6
reflect.Value.call({0x755400?, 0x7f6010?, 0x13?}, {0x7bd05c, 0x4}, {0xc000290240, 0x2, 0x2?})
	/nix/store/m3mi5km1zdmaqdb33byirlixigzh3f4l-go-1.20.7/share/go/src/reflect/value.go:586 +0xb0b
reflect.Value.Call({0x755400?, 0x7f6010?, 0xa28e10?}, {0xc000290240?, 0x7bc300?, 0xc000019728?})
	/nix/store/m3mi5km1zdmaqdb33byirlixigzh3f4l-go-1.20.7/share/go/src/reflect/value.go:370 +0xbc
testing.(*F).Fuzz.func1.1(0x0?)
	/nix/store/m3mi5km1zdmaqdb33byirlixigzh3f4l-go-1.20.7/share/go/src/testing/fuzz.go:335 +0x3f3
testing.tRunner(0xc0001e3860, 0xc000250630)
	/nix/store/m3mi5km1zdmaqdb33byirlixigzh3f4l-go-1.20.7/share/go/src/testing/testing.go:1576 +0x10b
created by testing.(*F).Fuzz.func1
	/nix/store/m3mi5km1zdmaqdb33byirlixigzh3f4l-go-1.20.7/share/go/src/testing/fuzz.go:322 +0x5b9
--- FAIL: FuzzVngRoundtripGen (0.00s)
    --- FAIL: FuzzVngRoundtripGen/17d3114a3be63cb1 (0.00s)
        vng_test.go:100: comparing: len(in)=2 vs len(out)=2
        vng_test.go:109: comparing: in[0]=<float16> vs out[0]=<uint8>
        vng_test.go:114: values have different zng bytes: [14] vs [0]
        vng_test.go:109: comparing: in[1]=<uint8> vs out[1]=<uint8>
FAIL
exit status 1
FAIL	github.com/brimdata/zed/vng	0.007s

But I don't see any errors if I use the default skew/column thresholds, which suggests that they're still having some effect on the output?

@jamii
Copy link
Contributor

jamii commented Nov 6, 2023

Oh I see, using NewWriterWithOpts will potentially still trigger

If I comment out https://github.com/brimdata/zed/blob/105b3c408e101a86ab148e103dc4ac700fe6793e//vng/writer.go#L97-L100 I don't get the fuzz failures, but I don't see how it's reachable...

@jamii
Copy link
Contributor

jamii commented Nov 6, 2023

Sorry, ignore all that - I merged it badly with the fuzzer branch.

Copy link
Contributor

@jamii jamii left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems legit :)

@nwt nwt merged commit fd4834d into main Nov 6, 2023
@nwt nwt deleted the single-segment branch November 6, 2023 23:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

VNG dict vectors don't always roundtrip succesfully

2 participants