Skip to content

Add mutation support for StorageMemory#15127

Merged
alexey-milovidov merged 17 commits intoClickHouse:masterfrom
ucasfl:add-mutation-for-storagememory
Nov 26, 2020
Merged

Add mutation support for StorageMemory#15127
alexey-milovidov merged 17 commits intoClickHouse:masterfrom
ucasfl:add-mutation-for-storagememory

Conversation

@ucasfl
Copy link
Copy Markdown
Collaborator

@ucasfl ucasfl commented Sep 22, 2020

I hereby agree to the terms of the CLA available at: https://yandex.ru/legal/cla/?lang=en

Changelog category (leave one):

  • Improvement

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):

fix #9117

Detailed description / Documentation draft:

...

By adding documentation, you'll allow users to try your new feature immediately, not when someone else will have time to document it later. Documentation is necessary for all features that affect user experience in any way. You can add brief documentation draft above, or add documentation right into your patch as Markdown files in docs folder.

If you are doing this for the first time, it's recommended to read the lightweight Contributing to ClickHouse Documentation guide first.

Information about CI checks: https://clickhouse.tech/docs/en/development/continuous-integration/

@robot-clickhouse robot-clickhouse added the pr-improvement Pull request with some product improvements label Sep 22, 2020
@ucasfl ucasfl force-pushed the add-mutation-for-storagememory branch from 322d4dc to ec683fe Compare September 22, 2020 10:42
fix

fix
@ucasfl ucasfl force-pushed the add-mutation-for-storagememory branch from ec683fe to c0b5f84 Compare September 22, 2020 12:10
@Akazz Akazz self-assigned this Sep 22, 2020
@Akazz
Copy link
Copy Markdown
Contributor

Akazz commented Sep 29, 2020

Hey there, @ucasfl!
Currently StorageMemory uses the following mechanisms for resolving possible conflicts between concurrent operations:

  • SELECTs vs DROP/TRUNCATEs are resolved via table-level locks (lockForShare() vs lockExclusively())
  • SELECTs vs INSERTs and INSERTs vs DROP/TRUNCATEs are resolved via internal mutex

Implementing mutate() method would be very neat and we really appreciate your effort, but what you currently suggest in this PR is just not enough. And that is due to the fact that the current code cannot be easily extended to support efficient resolving possible conflicts of type MUTATEs vs <anything else>. One would either need to copy the whole list of blocks (in other words to implement some sort of versioned storage), or to put in some blocking synchronization with exclusiveLock(), and the latter is much less desirable.

@alexey-milovidov
Copy link
Copy Markdown
Member

We have discussed it and the variant with copying the list of blocks to readers looks Ok.

@ucasfl
Copy link
Copy Markdown
Collaborator Author

ucasfl commented Oct 1, 2020

If implement multiversion storage, do we still need table-level locks for storagememory? It doesn't seem to be needed anymore?@alexey-milovidov

@alexey-milovidov
Copy link
Copy Markdown
Member

If implement multiversion storage, do we still need table-level locks for storagememory? It doesn't seem to be needed anymore?

Yes, there will be no table-level locks inside StorageMemory.
We only need a mutex, under which we will copy the list of blocks.

@ucasfl
Copy link
Copy Markdown
Collaborator Author

ucasfl commented Oct 4, 2020

Hi, I have added MultiVersion storage in StorageMemory, you can take a look at it if you have time. @Akazz @alexey-milovidov

Copy link
Copy Markdown
Contributor

@Akazz Akazz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There seems to exist a race condition when concurrent INSERTs each try to add new blocks to their own copy of data.

Comment on lines +111 to +113
auto new_data = std::make_unique<BlocksList>(*(storage.data.get()));
new_data->push_back(block);
storage.data.set(std::move(new_data));
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is going to be a problem due to a race condition here. Because we do not employ any CAS technique here, data can be lost when concurrent INSERTs each add a block to its own copy of data.

Copy link
Copy Markdown
Collaborator Author

@ucasfl ucasfl Oct 7, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, in order to solve this problem, we still need a table-level lock? @Akazz

@ucasfl
Copy link
Copy Markdown
Collaborator Author

ucasfl commented Oct 8, 2020

If implement multiversion storage, do we still need table-level locks for storagememory? It doesn't seem to be needed anymore?

Yes, there will be no table-level locks inside StorageMemory.
We only need a mutex, under which we will copy the list of blocks.

It seems that we still need a table-level lock to resolve concurrent mutation/insert. @alexey-milovidov

fix
@ucasfl ucasfl force-pushed the add-mutation-for-storagememory branch from f7dab0b to 677787f Compare October 8, 2020 13:37
void StorageMemory::truncate(
const ASTPtr &, const StorageMetadataPtr &, const Context &, TableExclusiveLockHolder &)
{
std::lock_guard lock(mutex);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like mutex is not needed anymore?

@@ -102,7 +106,9 @@ class MemoryBlockOutputStream : public IBlockOutputStream
metadata_snapshot->check(block, true);
{
std::lock_guard lock(storage.mutex);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here the mutex also looks unneeded.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need it to resolve concurrent insert, mutation, drop...

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure... Concurrent INSERT and mutation is Ok: mutation will mutate previously inserted data and concurrently inserted data will be left unmutated.

For DROP there is table-level lock: IStorage::lockExclusively.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's write a test for race-condition (you can find similar .sh tests).

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We already have race-condition test 01454_storagememory_data_race_challenge.sh.

@@ -201,17 +208,83 @@ BlockOutputStreamPtr StorageMemory::write(const ASTPtr & /*query*/, const Storag
void StorageMemory::drop()
{
std::lock_guard lock(mutex);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here mutex also looks unneeded.

auto new_data = std::make_unique<BlocksList>(*(data.get()));
auto data_it = new_data->begin();
auto out_it = out.begin();
while (data_it != new_data->end() && out_it != out.end())
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Check that data has the same number of blocks?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Clarify in comments that deletion of whole blocks cannot happen here.

rows += buffer.rows();
bytes += buffer.bytes();
}
data.set(std::make_unique<BlocksList>(out));
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's move calculation of rows/bytes and assignment out of the branch to avoid code duplication.

void StorageMemory::mutate(const MutationCommands & commands, const Context & context)
{
std::lock_guard lock(mutex);
auto metadata_snapshot = getInMemoryMetadataPtr();
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add a test checking that ALTER ADD/REMOVE column is not supported right now (that is Ok).

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok.

Copy link
Copy Markdown
Member

@alexey-milovidov alexey-milovidov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, only a few changes remain.

@alexey-milovidov
Copy link
Copy Markdown
Member

Alright.

@alexey-milovidov alexey-milovidov merged commit 1cd09fa into ClickHouse:master Nov 26, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

pr-improvement Pull request with some product improvements

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Support Mutations in storage Memory.

4 participants