Skip to content

MaterializedPostgreSQL: allow dynamically adding/deleting tables, altering settings#28301

Merged
alesapin merged 18 commits intoClickHouse:masterfrom
kssenii:materialized-postgresql
Sep 20, 2021
Merged

MaterializedPostgreSQL: allow dynamically adding/deleting tables, altering settings#28301
alesapin merged 18 commits intoClickHouse:masterfrom
kssenii:materialized-postgresql

Conversation

@kssenii
Copy link
Copy Markdown
Member

@kssenii kssenii commented Aug 28, 2021

I hereby agree to the terms of the CLA available at: https://yandex.ru/legal/cla/?lang=en

Changelog category (leave one):

  • New Feature

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):
Support adding / deleting tables to replication from PostgreSQL dynamically in database engine MaterializedPostgreSQL. Support alter for database settings. Closes #27573.

@robot-clickhouse robot-clickhouse added doc-alert pr-feature Pull request with new product feature labels Aug 28, 2021
@alesapin alesapin self-assigned this Sep 2, 2021
Copy link
Copy Markdown
Member

@alesapin alesapin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need to fix potential race conditions.

String path_to_table_symlinks;
String path_to_metadata_symlink;
const UUID db_uuid;
ASTPtr storage_def;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why database atomic now have storage? What is it?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Forgot to delete..

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, no, it is already deleted..


virtual void modifySettingsMetadata(const SettingsChanges &, ContextPtr)
{
throw Exception(ErrorCodes::NOT_IMPLEMENTED, "Database engine {} does not support settings", getEngineName());
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't support settings alter?

throw Exception(ErrorCodes::NOT_IMPLEMENTED, "Database engine {} does not support settings", getEngineName());
}

virtual void tryApplySettings(const SettingsChanges &, ContextPtr)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the difference between this method and previous? how modifySettingsMetadata differs from tryApplySettings? BTW, tryXXX usually returns bool.

const String & metadata_path_,
UUID uuid_,
const ASTStorage * database_engine_define_,
ASTPtr storage_def_,
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

database_engine_define_ is much more clear, than storage_def.

}


void DatabaseMaterializedPostgreSQL::tryApplySettings(const SettingsChanges & settings_changes, ContextPtr local_context)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

local_context -> query_context?

}


String DatabaseMaterializedPostgreSQL::getTablesList(const String & except) const
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unclear method interface. Method return string, not list. What is except -- it's also comma-separated string?

if (!alter_commands.empty())
{
database->checkAlterIsPossible(alter_commands, getContext());
alter_commands.apply(database, getContext());
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's better to leave AlterCommands as lightweight class which operates on AST's or some simple structures like StorageInMemoryMetadata. Taking IDatabase as argument is not the best solution.

{
if (command.type == AlterCommand::MODIFY_DATABASE_SETTING)
{
database->tryApplySettings(command.settings_changes, context);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if database will be dropped here?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed the order of writing settings changes to disk and writing to postgresql replication flow.
Then if we crash in between, it will be Ok for this database engine because it always checks metadata on startup and makes according modifications. Though cannot say the same for other engines if they will also support settings changes..

: log(&Poco::Logger::get("PostgreSQLReaplicaConsumer"))
Storages storages_,
const String & name_for_logger)
: log(&Poco::Logger::get("PostgreSQLReplicaConsumer("+ name_for_logger +")"))
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
: log(&Poco::Logger::get("PostgreSQLReplicaConsumer("+ name_for_logger +")"))
: log(&Poco::Logger::get("PostgreSQLReplicaConsumer(" + name_for_logger + ")"))

Copy link
Copy Markdown
Member

@alesapin alesapin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alter part is almost ok (expect overwrite), locking also ok. Barely understand postgres replication part, but looks like we create too many connections. Maybe we can have one shared connection for handler?

#include <Databases/DatabaseOnDisk.h>
#include <IO/ReadHelpers.h>
#include <IO/WriteHelpers.h>
#include <IO/WriteBufferFromFile.h>
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it required?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

if (!query_context->isInternalQuery())
throw Exception(ErrorCodes::QUERY_NOT_ALLOWED, "Changing setting `{}` is not allowed", change.name);

DatabaseOnDisk::modifySettingsMetadata(settings_changes, query_context);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if we changed 10 settings at one query? We will override whole metadata on disk 10 times?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

{
/// If there is query context then we need to detach materialized storage.
/// If there is no query context then we need to detach internal storage from atomic database.
if (CurrentThread::isInitialized() && CurrentThread::get().getQueryContext())
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe move this to a separate method?)

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, I think it is better here. It is rather compact.

try
{
nested_storages[table_name] = loadFromSnapshot(snapshot_name, table_name, storage->as <StorageMaterializedPostgreSQL>());
nested_storages[table_name] = loadFromSnapshot(tmp_connection, snapshot_name, table_name, storage->as <StorageMaterializedPostgreSQL>());
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
nested_storages[table_name] = loadFromSnapshot(tmp_connection, snapshot_name, table_name, storage->as <StorageMaterializedPostgreSQL>());
nested_storages[table_name] = loadFromSnapshot(tmp_connection, snapshot_name, table_name, storage->as<StorageMaterializedPostgreSQL>());

StoragePtr PostgreSQLReplicationHandler::loadFromSnapshot(String & snapshot_name, const String & table_name,
ASTPtr PostgreSQLReplicationHandler::getCreateNestedTableQuery(StorageMaterializedPostgreSQL * storage, const String & table_name)
{
postgres::Connection connection(connection_info);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see pqlib (and pqxx) for the first time. All connections will be correctly closed in destructors?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW, is it ok to create new connection each time?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see pqlib (and pqxx) for the first time. All connections will be correctly closed in destructors?

yes, they will be closed.

BTW, is it ok to create new connection each time?

Yes, quite bad.
I cannot make one global connection for all methods in this replication handler becuase it is dangerous as they can be called concurrently.
A solution could be to add a connection pool here. But I am not sure it is for the better because all these connections are opened only while initial tables dump or when some non ordinary rare actions are made.
Overall in normal lifestyle of this engine (after initial synchronization is completed) there will be only 1 active connection at a time.

auto * materialized_storage = storage->as <StorageMaterializedPostgreSQL>();
try
{
/// FIXME: Looks like it is possible we might get here if there is no nested storage or at least nested storage id field might be empty.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very scary.

Copy link
Copy Markdown
Member Author

@kssenii kssenii Sep 13, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not very scary because "nested storage id might be empty" means not that there is no StorageID in some storage, but that a field nested_table_id in MaterializedPostgreSQL storage is std::nullopt.
Is is set in 2 cases:

  1. At constructor -- if MaterializedPostgreSQL storage is created with a ready fully made nested storage passed into its constrcutor.
  2. in storage->prepare() call -- right after materialized storage has created and saved nested storage inside itself.

I'll add this to a comment.

No I changed my mind..

Very scary.

Yes, it is scary.
(if it will happen, but I (want to) believe its not possible)


auto nested = table_to_delete->as<StorageMaterializedPostgreSQL>()->getNested();
if (!nested)
throw Exception(ErrorCodes::UNKNOWN_TABLE, "Inner table `{}` does not exist", table_name);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like logical error? Or it can be dropped manually?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW, is it ok to create new connection each time?

yes, fixed.

Or it can be dropped manually?

no.

{
auto current_context = Context::createCopy(getContext()->getGlobalContext());
current_context->makeQueryContext();
DatabaseAtomic::dropTable(current_context, table_name, true);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But why we drop it in detach method?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because postgres table can be removed from replication via detach only -- for consistency with attach (because we cannot add table to replication via create table because we do not know table structure).
So we call detach for materialized table and drop for inner table because this is the behaviour I need.


try
{
auto tables_to_replicate = settings->materialized_postgresql_tables_list.value;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand where the source of truth? Which tables we have in materialized_postgresql_tables_list which tables we have in materialized_tables?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand where the source of truth?

Sounds so philosophically...

p.s. discussed in dm that I need to add more comments.

/// Delete data and metadata stored inside the database, if exists.
virtual void drop(ContextPtr /*context*/) {}

virtual void applyNewSettings(const SettingsChanges &, ContextPtr)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

applySettingsChanges?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok.

Copy link
Copy Markdown
Member

@alesapin alesapin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

pr-feature Pull request with new product feature

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Support dynamically adding tables to MaterializedPostgreSQL

3 participants