Skip to content

Conversation

@jjlilley
Copy link

@jjlilley jjlilley commented Oct 28, 2019

Stack from ghstack:

FileStore isn't thread-safe. We've observed a few FB unittests
already using this class in an unsafe manner.

This change enforces at most a single concurrent use of
the various file options, from this specific Store instance.
This protects the cache_, pos_, and the relative integrity
of the operations.

An alternative would be simply to explicitly document this
class as non-thread-safe, though perhaps not everybody will
read the warning.

Differential Revision: D18187821

FileStore wasn't thread-safe. Maybe it wasn't necessarily intended
as such, but it's not documented as such, and we already have
unittests that use this store from multiple threads, and
occasionally crash as a result.

We can address this by simply throwing a small-scope std::mutex
around accesses to the std::unordered_map<> cache_.

Differential Revision: [D18187821](https://our.internmc.facebook.com/intern/diff/D18187821/)

[ghstack-poisoned]
…sses."

FileStore wasn't thread-safe. Maybe it wasn't necessarily intended
as such, but it's not documented as such, and we already have
unittests that use this store from multiple threads, and
occasionally crash as a result.

We can address this by simply throwing a small-scope std::mutex
around accesses to the std::unordered_map<> cache_.

Differential Revision: [D18187821](https://our.internmc.facebook.com/intern/diff/D18187821/)

[ghstack-poisoned]
…sses."

FileStore wasn't thread-safe. Maybe it wasn't necessarily intended
as such, but it's not documented as such, and we already have
unittests that use this store from multiple threads, and
occasionally crash as a result.

We can address this by simply throwing a small-scope std::mutex
around accesses to the std::unordered_map<> cache_.

Differential Revision: [D18187821](https://our.internmc.facebook.com/intern/diff/D18187821/)

[ghstack-poisoned]
jjlilley pushed a commit that referenced this pull request Oct 28, 2019
Pull Request resolved: #28812

FileStore wasn't thread-safe. Maybe it wasn't necessarily intended
as such, but it's not documented as such, and we already have
unittests that use this store from multiple threads, and
occasionally crash as a result.

We can address this by simply throwing a small-scope std::mutex
around accesses to the std::unordered_map<> cache_.
ghstack-source-id: 92787892

Differential Revision: [D18187821](https://our.internmc.facebook.com/intern/diff/D18187821/)
Copy link
Contributor

@pietern pietern left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where is this used from multiple threads?

If we must make it thread safe I think there are more subtle things going on. The store keeps a file handle and an offset within that file that is already processed. If multiple threads try and refresh the store with the latest values (everything from the current offset to the end of the file), then that also must happen while holding a lock, or the threads will clobber the offset (pos_).

file,
pos_,
[this](const std::string& key, std::vector<uint8_t>&& value) {
insertCache(key, std::move(value));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Superfluous std::move?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fwiw, this wasn't superfluous - the compiler gets cranky without the extra std::move sprinkled in.

(Though this isn't needed in the new approach)

@pietern
Copy link
Contributor

pietern commented Oct 29, 2019

Where are those unittests that use it from multiple threads?

@jjlilley
Copy link
Author

Sorry, I should have been more clear; I meant to tidy a few things with this PE EOD yesterday.

I discovered yesterday that many of the [fb-only] ThriftRpcAgent tests are using FileStore in an unsafe manner (i.e. creating one of these, and handing to be used across two different threads, see the CommunicationManagerTest).

I forgot about pos_ here, the segfault I saw when running the tests in a loop with a bad unordered_map access.

LMK if you think it makes sense to write up a separate class if this interferes too much - though my concern is that even if we documen

@jjlilley
Copy link
Author

(my paste got cut off)
... even if we document the class to be unsafe, this won't be the last accidental unsafe use case.

…sses."

FileStore wasn't thread-safe. Maybe it wasn't necessarily intended
as such, but it's not documented as such, and we already have
unittests that use this store from multiple threads, and
occasionally crash as a result.

We can address this by simply throwing a small-scope std::mutex
around accesses to the std::unordered_map<> cache_.

Differential Revision: [D18187821](https://our.internmc.facebook.com/intern/diff/D18187821/)

[ghstack-poisoned]
@jjlilley
Copy link
Author

Peter, I changed the locking approach to effective a semaphore around file operations.
LMK what you think of this approach.

The cv/mutex cost should be quite low in the context of file ops/file locking.

If you really don't think it's warranted in this class, I'm sort of ok with us just documenting the class as one dangerous to use with threads, though this might lead to misuse, or spraying coarser-grained locks (that are held across file syscalls) at the problem.

@pietern
Copy link
Contributor

pietern commented Oct 29, 2019

@jjlilley I took a look at the test you're referring to and it's unintended use. If there is some multi threaded test, you should be able to use an in process "store", which is nothing more than a map and a mutex (see for example https://github.com/facebookincubator/gloo/blob/master/gloo/rendezvous/hash_store.cc). No need to hit a file on disk in that case. I don't see a valid use case where you have both multiple processes and multiple threads that need access to the same store. Even if it were, I would instead give every thread its own store instance instead of reusing the same one.

Making this one thread safe doesn't hurt, though, but it shouldn't be a proxy for folks to assume everything is thread-safe. Also, using process groups must always be done from a single thread, because the calls must be issued in a deterministic order across all processes.

We can go forward with this change, but I recommend removing the RAII class and instead using a std::unique_lock<std::mutex> directly for simplicity (I don't see how this approach is different).

…sses."


FileStore isn't thread-safe. We've observed a few FB unittests
already using this class in an unsafe manner.

This change enforces at most a single concurrent use of
the various file options, from this specific Store instance.
This protects the cache_, pos_, and the relative integrity
of the operations.

An alternative would be simply to explicitly document this
class as non-thread-safe, though perhaps not everybody will
read the warning.

Differential Revision: [D18187821](https://our.internmc.facebook.com/intern/diff/D18187821/)

[ghstack-poisoned]
…sses."


FileStore isn't thread-safe. We've observed a few FB unittests
already using this class in an unsafe manner.

This change enforces at most a single concurrent use of
the various file options, from this specific Store instance.
This protects the cache_, pos_, and the relative integrity
of the operations.

An alternative would be simply to explicitly document this
class as non-thread-safe, though perhaps not everybody will
read the warning.

Differential Revision: [D18187821](https://our.internmc.facebook.com/intern/diff/D18187821/)

[ghstack-poisoned]
@jjlilley
Copy link
Author

jjlilley commented Oct 29, 2019

Thanks for your response, Pieter.

  • Since the usage really is unintended, I did send out a diff to fix the usage downstream, i.e. D18209221
    If there are unintended usage patterns checked in, they will be copied-and-pasted into new code as well.
  • Sorry about the silly condvar approach; I started coding up a more elaborate use-case (with interrupt), that ended up being overkill, and I didn't scale back all the way.
  • I did just upload a version with std::unique_lock<>, if we want this to be thread-safe.
    Feel free to push back if you don't want it (in which case we just check in a comment that this class isn't thread-safe), though the lock should be fairly cheap compared with the file costs here, and this might protect against some future failures.

…sses."


FileStore isn't thread-safe. We've observed a few FB unittests
already using this class in an unsafe manner.

This change enforces at most a single concurrent use of
the various file options, from this specific Store instance.
This protects the cache_, pos_, and the relative integrity
of the operations.

An alternative would be simply to explicitly document this
class as non-thread-safe, though perhaps not everybody will
read the warning.

Differential Revision: [D18187821](https://our.internmc.facebook.com/intern/diff/D18187821/)

[ghstack-poisoned]
…sses."


FileStore isn't thread-safe. We've observed a few FB unittests
already using this class in an unsafe manner.

This change enforces at most a single concurrent use of
the various file options, from this specific Store instance.
This protects the cache_, pos_, and the relative integrity
of the operations.

An alternative would be simply to explicitly document this
class as non-thread-safe, though perhaps not everybody will
read the warning.

Differential Revision: [D18187821](https://our.internmc.facebook.com/intern/diff/D18187821/)

[ghstack-poisoned]
jjlilley pushed a commit that referenced this pull request Oct 29, 2019
Pull Request resolved: #28812

FileStore isn't thread-safe. We've observed a few FB unittests
already using this class in an unsafe manner.

This change enforces at most a single concurrent use of
the various file options, from this specific Store instance.
This protects the cache_, pos_, and the relative integrity
of the operations.

An alternative would be simply to explicitly document this
class as non-thread-safe, though perhaps not everybody will
read the warning.

ghstack-source-id: 92874098

Differential Revision: [D18187821](https://our.internmc.facebook.com/intern/diff/D18187821/)
@jjlilley jjlilley requested a review from pietern October 30, 2019 03:34
Copy link
Contributor

@pietern pietern left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Excellent, thanks for the update @jjlilley!

In this diff I see the mutex is locked at the same time as the file is opened. We could make this even simpler by locking the mutex from the file constructor, e.g. by passing the mutex as ref.

@jjlilley
Copy link
Author

Thanks for the review.
The one issue with putting in the File constructor is that it really needs to be propagated/moved to the resulting Lock, since we want the early unlock for the FileStore::get() call.

Maybe I'll submit as-is, and will send a quick [optional] followup? It's not clear 100% to me from squinting at the code, which seems more elegant, because of the needed interaction with Lock.

@facebook-github-bot
Copy link
Contributor

This pull request has been merged in 331e09e.

@pietern
Copy link
Contributor

pietern commented Oct 31, 2019

You're right, it would make it more convoluted.

@facebook-github-bot facebook-github-bot deleted the gh/jjlilley/15/head branch November 3, 2019 15:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants