Conversation


@giorio94 giorio94 commented Nov 8, 2024

The blamed commit introduced a check to prevent sending a delete event when the parent directory is also being watched. However, it accesses the watches.path map without synchronization, possibly leading to a data race if the same map is accessed in parallel (e.g., by an Add() operation). Let's fix this by grabbing the associated lock before reading from the map.
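
For illustration, here is a partial sketch of that locking pattern. The watch/watches layout and the byPath helper below are hypothetical, only loosely modeled on the trace; the actual backend_inotify.go code may differ:

type watch struct {
	wd uint32 // inotify watch descriptor; may be updated by a later Add()
}

type watches struct {
	mu   sync.RWMutex
	wd   map[uint32]*watch // wd → watch
	path map[string]*watch // pathname → watch
}

// byPath looks a path up under the read lock, so it is safe to call from the
// readEvents goroutine while a concurrent Add() mutates the map.
func (w *watches) byPath(path string) (*watch, bool) {
	w.mu.RLock()
	defer w.mu.RUnlock()
	ww, ok := w.path[path]
	return ww, ok
}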

Excerpt of the data race trace:

Write at 0x00c0004d8cf0 by goroutine 33:
  runtime.mapassign_faststr()
      golang.org/[email protected]/src/runtime/map_faststr.go:223 +0x0
  github.com/fsnotify/fsnotify.(*watches).updatePath()
      github.com/fsnotify/fsnotify/backend_inotify.go:163 +0x2b8
  github.com/fsnotify/fsnotify.(*inotify).register()
      github.com/fsnotify/fsnotify/backend_inotify.go:335 +0x1bd
  github.com/fsnotify/fsnotify.(*inotify).add()
      github.com/fsnotify/fsnotify/backend_inotify.go:331 +0xee
  github.com/fsnotify/fsnotify.(*inotify).AddWith()
      github.com/fsnotify/fsnotify/backend_inotify.go:296 +0x405
  github.com/fsnotify/fsnotify.(*inotify).Add()
      github.com/fsnotify/fsnotify/backend_inotify.go:253 +0x44
  github.com/fsnotify/fsnotify.(*Watcher).Add()
      github.com/fsnotify/fsnotify/fsnotify.go:313 +0x5ce

Previous read at 0x00c0004d8cf0 by goroutine 32:
  runtime.mapaccess2_faststr()
      golang.org/[email protected]/src/runtime/map_faststr.go:117 +0x0
  github.com/fsnotify/fsnotify.(*inotify).readEvents()
      github.com/fsnotify/fsnotify/backend_inotify.go:534 +0xd04
  github.com/fsnotify/fsnotify.newBufferedBackend.gowrap1()
      github.com/fsnotify/fsnotify/backend_inotify.go:195 +0x33

Fixes: e75779f ("inotify: don't send event for IN_DELETE_SELF when also watching the parent (#620)")


Additionally, the processing of IN_DELETE_SELF events is affected by a second possible data race with parallel Add() operations: it accesses the wd field of the current watch without synchronization, even though that field may be mutated concurrently when a new watch is added. Let's fix this by making sure that the field is only accessed inside the remove function, after the lock has been grabbed.
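
Again purely as an illustration, a rough sketch of keeping that access under the lock, reusing the hypothetical names from the sketch above (removeByPath is not the library's actual API):

// removeByPath takes the path rather than a previously loaded watch.wd, and
// reads the descriptor only while the lock is held, so a concurrent Add()
// updating watch.wd cannot race with the removal.
func (w *watches) removeByPath(path string) {
	w.mu.Lock()
	defer w.mu.Unlock()
	if ww, ok := w.path[path]; ok {
		delete(w.wd, ww.wd) // ww.wd is only read here, under the lock
		delete(w.path, path)
	}
}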

Excerpt of the data race trace:

Write at 0x00c00007e600 by goroutine 10:
  github.com/fsnotify/fsnotify.(*inotify).add.(*inotify).register.func1()
      github.com/fsnotify/other/fsnotify/backend_inotify.go:354 +0x1f4
  github.com/fsnotify/fsnotify.(*watches).updatePath()
      github.com/fsnotify/other/fsnotify/backend_inotify.go:157 +0x18c
  github.com/fsnotify/fsnotify.(*inotify).register()
      github.com/fsnotify/other/fsnotify/backend_inotify.go:335 +0x1bd
  github.com/fsnotify/fsnotify.(*inotify).add()
      github.com/fsnotify/other/fsnotify/backend_inotify.go:331 +0xee
  github.com/fsnotify/fsnotify.(*inotify).AddWith()
      github.com/fsnotify/other/fsnotify/backend_inotify.go:296 +0x405
  github.com/fsnotify/fsnotify.(*inotify).Add()
      github.com/fsnotify/other/fsnotify/backend_inotify.go:253 +0x44
  github.com/fsnotify/fsnotify.(*Watcher).Add()
      github.com/fsnotify/other/fsnotify/fsnotify.go:313 +0x146
  github.com/fsnotify/fsnotify.addWatch()
      github.com/fsnotify/other/fsnotify/helpers_test.go:73 +0xd1
  github.com/fsnotify/fsnotify.TestInDeleteSelfRace()
      github.com/fsnotify/other/fsnotify/backend_inotify_test.go:191 +0xa5b
  testing.tRunner()
      /usr/lib/go-1.22/src/testing/testing.go:1689 +0x21e
  testing.(*T).Run.gowrap1()
      /usr/lib/go-1.22/src/testing/testing.go:1742 +0x44

Previous read at 0x00c00007e600 by goroutine 11:
  github.com/fsnotify/fsnotify.(*inotify).readEvents()
      github.com/fsnotify/other/fsnotify/backend_inotify.go:511 +0xa38
  github.com/fsnotify/fsnotify.newBufferedBackend.gowrap1()
      github.com/fsnotify/other/fsnotify/backend_inotify.go:195 +0x33

@giorio94 giorio94 marked this pull request as ready for review November 8, 2024 16:40

giorio94 commented Nov 8, 2024

The FreeBSD failure looks unrelated AFAICT:

--- FAIL: TestRemoveRace (0.64s)
    helpers_test.go:435: bad file descriptor


arp242 commented Nov 11, 2024

It would be nice if this had a test, so it won't be regressed in the future.


gqcn commented Nov 12, 2024

Same issue here; hope it gets resolved ASAP.

@giorio94
Author

It would be nice if this had a test, so it won't be regressed in the future.

Hmm, makes sense in principle to me. My main doubt here was that, being a race condition, it is a bit tricky to trigger in a reproducible way, and the test may end up being rather specific to the current implementation.

That said, the simplest test I could get to consistently reproduce the race, at least on my machine, is something along these lines (not exactly minimal):

func TestInDeleteSelfRace(t *testing.T) {
	t.Parallel()

	var (
		wg    sync.WaitGroup
		stop  = make(chan struct{})
		tries = 25
	)

	tmp := t.TempDir()
	touch(t, tmp, "other", noWait)

	w := newCollector(t, tmp)
	touch(t, tmp, "target", noWait)

	wg.Add(1)
	defer wg.Wait()

	go func() {
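		// Keep re-adding a watch on the sibling file, so that Add() keeps racing
		// with the event-processing goroutine touching the same watch bookkeeping.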
		defer wg.Done()
		for {
			select {
			case <-stop:
				return
			default:
				addWatch(t, w.w, tmp, "other")
			}
		}
	}()

	for {
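		// On every Create of the target delete it again, and on every Remove
		// re-create and re-watch it; once the target itself is watched, each
		// deletion exercises the IN_DELETE_SELF handling while the goroutine
		// above keeps calling Add().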
		select {
		case err, ok := <-w.w.Errors:
			if !ok {
				return
			}

			t.Error(err)

		case ev, ok := <-w.w.Events:
			if !ok {
				return
			}

			if ev.Has(Create) {
				rm(t, tmp, "target", noWait)
			}

			if ev.Has(Remove) {
				touch(t, tmp, "target", noWait)
				addWatch(t, w.w, tmp, "target")
				tries--
			}

			if tries == 0 {
				close(stop)
				if err := w.w.Close(); err != nil {
					t.Error(err)
				}
			}
		}
	}
}

Would something like that make sense to you? If so, I'll add it to the PR.


arp242 commented Nov 14, 2024

Usually writing a test isn't too hard: just spawn a bunch of goroutines and run go test -race. As long as you have sufficient goroutines it should be a 100% repro rate.

As long as that test reproduces it reliably (before your change) it's fine.

In preparation for the subsequent commits, let's add a unit test showing
the possibility of race conditions when Add() operations are performed
in parallel with the reception of an IN_DELETE_SELF event.

`go test . --run '^TestInDeleteSelfRace$' --race` currently fails with:

==================
WARNING: DATA RACE
Write at 0x00c0000b6720 by goroutine 9:
  runtime.mapassign_faststr()
      /usr/lib/go-1.22/src/runtime/map_faststr.go:203 +0x0
  github.com/fsnotify/fsnotify.(*watches).updatePath()
      github.com/fsnotify/fsnotify/backend_inotify.go:163 +0x2b8
  github.com/fsnotify/fsnotify.(*inotify).register()
      github.com/fsnotify/fsnotify/backend_inotify.go:335 +0x1bd
  github.com/fsnotify/fsnotify.(*inotify).add()
      github.com/fsnotify/fsnotify/backend_inotify.go:331 +0xee
  github.com/fsnotify/fsnotify.(*inotify).AddWith()
      github.com/fsnotify/fsnotify/backend_inotify.go:296 +0x405
  github.com/fsnotify/fsnotify.(*inotify).Add()
      github.com/fsnotify/fsnotify/backend_inotify.go:253 +0x44
  github.com/fsnotify/fsnotify.(*Watcher).Add()
      github.com/fsnotify/fsnotify/fsnotify.go:313 +0x146
  github.com/fsnotify/fsnotify.addWatch()
      github.com/fsnotify/fsnotify/helpers_test.go:73 +0xd1
  github.com/fsnotify/fsnotify.TestInDeleteSelfRace.func1()
      github.com/fsnotify/fsnotify/backend_inotify_test.go:166 +0x111

Previous read at 0x00c0000b6720 by goroutine 8:
  runtime.mapaccess2_faststr()
      /usr/lib/go-1.22/src/runtime/map_faststr.go:108 +0x0
  github.com/fsnotify/fsnotify.(*inotify).readEvents()
      github.com/fsnotify/fsnotify/backend_inotify.go:534 +0xd04
  github.com/fsnotify/fsnotify.newBufferedBackend.gowrap1()
      github.com/fsnotify/fsnotify/backend_inotify.go:195 +0x33

Goroutine 9 (running) created at:
  github.com/fsnotify/fsnotify.TestInDeleteSelfRace()
      github.com/fsnotify/fsnotify/backend_inotify_test.go:159 +0x5d3
  testing.tRunner()
      /usr/lib/go-1.22/src/testing/testing.go:1689 +0x21e
  testing.(*T).Run.gowrap1()
      /usr/lib/go-1.22/src/testing/testing.go:1742 +0x44

Goroutine 8 (running) created at:
  github.com/fsnotify/fsnotify.newBufferedBackend()
      github.com/fsnotify/fsnotify/backend_inotify.go:195 +0x3c6
  github.com/fsnotify/fsnotify.newBackend()
      github.com/fsnotify/fsnotify/backend_inotify.go:174 +0x51
  github.com/fsnotify/fsnotify.NewWatcher()
      github.com/fsnotify/fsnotify/fsnotify.go:253 +0x42
  github.com/fsnotify/fsnotify.newWatcher()
      github.com/fsnotify/fsnotify/helpers_test.go:53 +0x84
  github.com/fsnotify/fsnotify.newCollector()
      github.com/fsnotify/fsnotify/helpers_test.go:382 +0x1de
  github.com/fsnotify/fsnotify.TestInDeleteSelfRace()
      github.com/fsnotify/fsnotify/backend_inotify_test.go:155 +0x18d
  testing.tRunner()
      /usr/lib/go-1.22/src/testing/testing.go:1689 +0x21e
  testing.(*T).Run.gowrap1()
      /usr/lib/go-1.22/src/testing/testing.go:1742 +0x44
==================
--- FAIL: TestInDeleteSelfRace (0.02s)
    testing.go:1398: race detected during execution of test
FAIL
FAIL    github.com/fsnotify/fsnotify    0.078s
FAIL
@giorio94 giorio94 force-pushed the pr/giorio94/fix-data-race branch from 3344a8e to 7c91a4e Compare November 14, 2024 14:27
@giorio94 giorio94 changed the title from "inotify: fix potential data race on IN_DELETE_SELF event" to "inotify: fix potential data races on IN_DELETE_SELF event" Nov 14, 2024
@giorio94
Author

Pushed the addition of the test. The last commit also fixes a second data race that the test revealed.

@giorio94
Author

@arp242 Gentle ping


ugexe commented Feb 20, 2025

We're seeing this same issue and would like to see this PR move forward.

arp242 added a commit that referenced this pull request Mar 20, 2025
It would access "watch.wd", but that's racy. I redid quite a bit of the
inotify backend to add some new features, so regression from one of
those changes (and/or would work "by accident" before).

Fixes #653
Fixes #664
Fixes #672
arp242 added a commit that referenced this pull request Mar 21, 2025
It would access "watch.wd", but that's racy. I redid quite a bit of the
inotify backend to add some new features, so regression from one of
those changes (and/or would work "by accident" before).

Fixes #653
Fixes #664
Fixes #672
arp242 added a commit that referenced this pull request Mar 28, 2025
It would access "watch.wd", but that's racy. I redid quite a bit of the
inotify backend to add some new features, so regression from one of
those changes (and/or would work "by accident" before).

Fixes #653
Fixes #664
Fixes #672
@arp242 arp242 closed this in #678 Mar 31, 2025
@arp242 arp242 closed this in 9a268ac Mar 31, 2025
jakubno pushed a commit to e2b-dev/fsnotify that referenced this pull request Jul 31, 2025
It would access "watch.wd", but that's racy. I redid quite a bit of the
inotify backend to add some new features, so regression from one of
those changes (and/or would work "by accident" before).

Fixes fsnotify#653
Fixes fsnotify#664
Fixes fsnotify#672