Skip to content

Comments

use pubsub instead of filenotify to follow json logs#19498

Merged
LK4D4 merged 2 commits intomoby:masterfrom
mountkin:refactor-jsonfilelog-reader
Feb 16, 2016
Merged

use pubsub instead of filenotify to follow json logs#19498
LK4D4 merged 2 commits intomoby:masterfrom
mountkin:refactor-jsonfilelog-reader

Conversation

@mountkin
Copy link
Contributor

inotify event is trigged immediately there's data written to disk.
But at the time that the inotify event is received, the json line might
not fully saved to disk. If the json decoder tries to decode in such
case, an io.UnexpectedEOF will be trigged.
We used to retry for several times to mitigate the io.UnexpectedEOF error.
But there are still flaky tests caused by the partial log entries.

The daemon knows exactly when there are new log entries emitted. We can
use the pubsub package to notify all the log readers instead of inotify.

BTW: I can't run the test locally because the testing infrastructure seems to be broken on my laptop.

---> Making bundle: test-integration-cli (in bundles/1.10.0-dev/test-integration-cli)
---> Making bundle: .integration-daemon-start (in bundles/1.10.0-dev/test-integration-cli)
+++ /etc/init.d/apparmor start
 * Starting AppArmor profiles
Skipping profile in /etc/apparmor.d/disable: usr.sbin.rsyslogd
   ...done.
INFO: Waiting for daemon to start...
+++ exec docker daemon --debug --host unix:///go/src/github.com/docker/docker/bundles/1.10.0-dev/test-integration-cli/docker.sock --storage-driver vfs --pidfile bundles/1.10.0-dev/test-integration-cli/docker.pid --userland-proxy=true
.
---> Making bundle: .integration-daemon-setup (in bundles/1.10.0-dev/test-integration-cli)
---> Making bundle: .detect-daemon-osarch (in bundles/1.10.0-dev/test-integration-cli)
---> Making bundle: .ensure-emptyfs (in bundles/1.10.0-dev/test-integration-cli)
++++ tar -cC bundles/1.10.0-dev/test-integration-cli/emptyfs .
++++ docker load
---> Making bundle: .ensure-frozen-images (in bundles/1.10.0-dev/test-integration-cli)
++++ tar -cC /docker-frozen-images .
++++ docker load
Error response from daemon: ApplyLayer exit status 1 stdout:  stderr: unexpected EOF
+++ bundle .integration-daemon-stop
+++ local bundle=.integration-daemon-stop
/go/src/github.com/docker/docker/hack/make.sh: line 257: local: can only be used in a function

Signed-off-by: Shijiang Wei [email protected]

@LK4D4
Copy link
Contributor

LK4D4 commented Jan 20, 2016

@mountkin One logs test failed

@mountkin mountkin force-pushed the refactor-jsonfilelog-reader branch from 16c3c51 to 2ac8db4 Compare January 21, 2016 17:02
@GordonTheTurtle GordonTheTurtle added the dco/no Automatically set by a bot when one of the commits lacks proper signature label Jan 21, 2016
@mountkin mountkin force-pushed the refactor-jsonfilelog-reader branch from 2ac8db4 to 37cb26a Compare January 21, 2016 17:03
@GordonTheTurtle GordonTheTurtle removed the dco/no Automatically set by a bot when one of the commits lacks proper signature label Jan 21, 2016
@mountkin mountkin force-pushed the refactor-jsonfilelog-reader branch 2 times, most recently from d0b5a3f to 7835599 Compare January 22, 2016 15:06
@mountkin
Copy link
Contributor Author

@LK4D4 test passed (I rebased and force pushed so the jenkins jobs started again 🏊 Let's see the final result)

@cpuguy83
Copy link
Member

I like the general simplification, but the issue is now for every write we have to notify as well.
I think the flakey tests are generally from a race between stopping the log writer (on container stop) and getting all that out of the buffer in time.

@mountkin
Copy link
Contributor Author

@cpuguy83 I think the publish is much cheaper than inotify because if no one is running 'docker logs -f' the 'Publish' function does nothing more than locking and unlocking a mutex. And the 'Publish' is only called when a line is fully written to the disk (while inotify might be triggered many times when writing a line, see my debug log in another PR #17606).
If someone is running, the subscriber is also very lightweight and it never blocks (merely a atomic.AddUint64).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would suggest avoiding timeout. Timers is very heavy.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@LK4D4 addressed. Thanks for pointing out this.

@mountkin mountkin force-pushed the refactor-jsonfilelog-reader branch 2 times, most recently from 6635882 to 3f1cb58 Compare January 28, 2016 15:04
@LK4D4
Copy link
Contributor

LK4D4 commented Jan 28, 2016

@cpuguy83 What's your thoughts?

@cpuguy83
Copy link
Member

The extra call for every write even when there may not be a reader is a little iffy.
The thing with inotify is, it's only consuming resources when there is a reader.

BenchmarkPubSub-8   10000000           175 ns/op         260 B/op          1 allocs/op
BenchmarkNoPubSub-8 30000000            82.1 ns/op       163 B/op          0 allocs/op

This is comparing just writing with the pubsub call and writing without.

@mountkin
Copy link
Contributor Author

@cpuguy83 the current version of 'Publish' function can be optimized a little. If no subscribers the waitgroup allocation can be omitted.

@mountkin mountkin force-pushed the refactor-jsonfilelog-reader branch from 319b782 to 1106c13 Compare January 29, 2016 05:29
@mountkin
Copy link
Contributor Author

@cpuguy83 Here is the result of my benchmark:
go test -v -benchtime 10s -test.bench BenchmarkJSONFileLogger -test.benchmem -test.run=BenchmarkJSONFileLogger

filenotify:
BenchmarkJSONFileLogger-8 200000 60936 ns/op 59.57 MB/s 960 B/op 30 allocs/op

before mountkin@1106c13:
BenchmarkJSONFileLogger-8 200000 68881 ns/op 52.70 MB/s 1440 B/op 60 allocs/op

after mountkin@1106c13:
BenchmarkJSONFileLogger-8 200000 64656 ns/op 56.14 MB/s 960 B/op 30 allocs/op

Is the overhead after pubsub optimization acceptable?

@thaJeztah
Copy link
Member

ping @cpuguy83 @LK4D4 PTAL

@LK4D4
Copy link
Contributor

LK4D4 commented Feb 4, 2016

looks decent
@cpuguy83 WDYT?

@cpuguy83
Copy link
Member

cpuguy83 commented Feb 6, 2016

Thanks @mountkin!

Yep, I'm good with those results, but I think we need some simplifications in the code.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we change this name to writeNotify

@mountkin mountkin force-pushed the refactor-jsonfilelog-reader branch from 1106c13 to 988de9c Compare February 6, 2016 16:42
@mountkin mountkin force-pushed the refactor-jsonfilelog-reader branch from 988de9c to ffe1174 Compare February 6, 2016 16:44
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like we should read to the end of the current file on rotate.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cpuguy83 when this channel is readable the file has already been moved away. So reading the current file leads to a "bad file descriptor" error.
I've no idea how to handle this case correctly.
Moreover, it seems that following rotated log never worked correctly before.

@thaJeztah
Copy link
Member

ping @mountkin can you have a look at @cpuguy83's comment?

@mountkin
Copy link
Contributor Author

@thaJeztah sorry for the late response. I'm on holiday these days. I'll look into the failed tests tomorrow.

@thaJeztah
Copy link
Member

Oh no worries, enjoy your Holiday!

@mountkin mountkin force-pushed the refactor-jsonfilelog-reader branch from ffe1174 to 445d7e5 Compare February 15, 2016 09:23
inotify event is trigged immediately there's data written to disk.
But at the time that the inotify event is received, the json line might
not fully saved to disk. If the json decoder tries to decode in such
case, an io.UnexpectedEOF will be trigged.
We used to retry for several times to mitigate the io.UnexpectedEOF error.
But there are still flaky tests caused by the partial log entries.

The daemon knows exactly when there are new log entries emitted. We can
use the pubsub package to notify all the log readers instead of inotify.

Signed-off-by: Shijiang Wei <[email protected]>

try to fix broken test. will squash once tests pass

Signed-off-by: Shijiang Wei <[email protected]>
@mountkin
Copy link
Contributor Author

Updated.
@cpuguy83 PTAL this PR and #20327

@mountkin
Copy link
Contributor Author

BTW, it seems that the filenotify package is no longer needed when this PR is merged. Can I remove the package in another commit?

@cpuguy83
Copy link
Member

@mountkin Let's save it for another PR I think, b/c it's going to make this PR difficult to look at, even if in different commits.

writer: writer,
readers: make(map[*logger.LogWatcher]struct{}),
extra: extra,
writeNotifier: pubsub.NewPublisher(0, 10),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably only need 1 here instead of 10, but this should be fine. We'll just end up reading an EOF if there are extra notifications.

@cpuguy83
Copy link
Member

1 small nit, but I'm fine with it.

LGTM

@LK4D4
Copy link
Contributor

LK4D4 commented Feb 16, 2016

LGTM

LK4D4 added a commit that referenced this pull request Feb 16, 2016
use pubsub instead of filenotify to follow json logs
@LK4D4 LK4D4 merged commit 8545fdc into moby:master Feb 16, 2016
@thaJeztah
Copy link
Member

Based on #20327 (comment), this is a regression in 1.10.x, so I'm adding this to the 1.10.2 milestone

@thaJeztah thaJeztah added this to the 1.10.2 milestone Feb 17, 2016
@anusha-ragunathan
Copy link
Contributor

@tiborvass tiborvass removed this from the 1.10.2 milestone Feb 19, 2016
@tiborvass
Copy link
Contributor

I'm removing this from 1.10.2 as this changes a lot and we'd rather not add more risk than necessary to 1.10.2

@cpuguy83
Copy link
Member

Yeah, this change is definitely slower :(
The existing benchmark is only really testing the marshalling and not really the full write->read flow

benchmark                 old ns/op     new ns/op     delta
BenchmarkJSONLogger-8     2162          2938          +35.89%

benchmark                 old allocs     new allocs     delta
BenchmarkJSONLogger-8     7              9              +28.57%

benchmark                 old bytes     new bytes     delta
BenchmarkJSONLogger-8     245           336           +37.14%

@cpuguy83
Copy link
Member

And that's with the file poller too, not inotify.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants