Skip to content

Feature multiple disks#4918

Merged
alesapin merged 66 commits intoClickHouse:masterfrom
ObjatieGroba:feature_multiple_disks
Sep 13, 2019
Merged

Feature multiple disks#4918
alesapin merged 66 commits intoClickHouse:masterfrom
ObjatieGroba:feature_multiple_disks

Conversation

@ObjatieGroba
Copy link
Copy Markdown
Contributor

@ObjatieGroba ObjatieGroba commented Apr 5, 2019

I hereby agree to the terms of the CLA available at: https://yandex.ru/legal/cla/?lang=en

For changelog. Remove if this is non-significant change.

Category (leave one):

  • New Feature

Short description (up to few sentences):

Add multiple disks settings for MergeTree (#3605)

Detailed description (optional):

Add configuration setting contains disks choosing algorithm.
Volume is set of disks (round-robin)
Schema is sequence of Volumes which prioritized paths to store


String relative_part_path = String(to_detached ? "detached/" : "") + tmp_prefix + part_name;
String absolute_part_path = data.getFullPath() + relative_part_path + "/";
auto reservation = data.reserveSpaceForPart(0); ///@TODO_IGR ASK What size should be there?
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should StorageMergeTreeLogEntry contain DataPart size?


if (disk_name == default_disk_name) {
if (!path.empty()) {
///@TODO_IGR ASK Rename Default disk to smth? ClickHouse disk? DB disk?
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are "default" disk and "default" schema.
It is fine name for schema.
What abut disk?


void addEnclosedDirToPath(const String & dir)
{
path += dir + '/';
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Check for proper method in Poco.

String all_paths;
for (const String & path : paths)
all_paths += path + ';';
///@TODO IGR choose separator
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use Array(String) instead.

#include <Poco/File.h>

/// @TODO_IGR ASK Does such function already exists?
bool isAlphaNumeric(const std::string & s)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

{
has_default_disk = true;
if (!path.empty())
throw Exception("It is not possible to specify default disk path", ErrorCodes::UNKNOWN_ELEMENT_IN_CONFIG);
Copy link
Copy Markdown
Contributor

@filimonov filimonov May 9, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Better to make exception more helpful - something like:
"default" disk path should be provided in <path> not it <storage_configuration>

}
UInt64 sum_size = 0;
for (const auto & disk : disks)
sum_size += disk->getTotalSpace();
Copy link
Copy Markdown
Contributor

@filimonov filimonov May 9, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A bit controversial. Lets say you have 2 equal disks & set up ratio to 0.6. That part will never fit any of the disks. May be it should be calculated as related to maximum / minimum disk size from volume?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe median or average of disk sizes?

Copy link
Copy Markdown
Contributor

@filimonov filimonov May 13, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMHO it should not change significantly when new disks are added. I.e. if before adding new disk max calculated part size was 100Mb, i would expect that allowed partsize for that volume still remains unchanged even after adding new bigger disk. That sounds like min * ratio.

From the other hand if small disk will be added to volume of bigger disks it will create problems :)

Copy link
Copy Markdown
Contributor

@filimonov filimonov May 13, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For me that usecase (with ratio) is a bit non clear. May be we can leave it as is (with sum) but introduce some extra check which will guarantee that each disk in volume can store at least few parts of that size?

Copy link
Copy Markdown
Contributor Author

@ObjatieGroba ObjatieGroba May 13, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For me that usecase (with ratio) is a bit non clear. May be we can leave it as is (with sum) but introduce some extra check which will guarantee that each disk in volume can store at least few parts of that size?

I think we should use average. Ratio should be from 0 to inf. It would not create reservation on disk if part greater than disk can store (current available space).

Let's make warning for disks which TotalSpace - keep_free_space < max_data_part_size_bytes

ParserKeyword s_table("TABLE");
ParserKeyword s_move_part("MOVE PART");
ParserKeyword s_to("TO");
ParserToken s_comma(TokenType::Comma);
Copy link
Copy Markdown
Contributor

@filimonov filimonov Jun 10, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMHO introducing completely separate command TABLE xyz MOVE PART ... TO ... is bad idea.

It should be a part of ALTER TABLE, i.e. ALTER TABLE xyz MOVE PART ... TO.

Motivation:

  1. all sql statements should start from the verb. TABLE is not a verb. MOVE TABLE is not an option (we don't move any tables there), MOVE PARTITION OF TABLE zzzz - looks bad.
  2. clickhouse already have ALTERs which (a) don't alter anything (b) executed one-replica only (w/o disturbing others) - see fetch partition / freeze partition.
  3. other RDBMS also use ALTER statements for that. MS SQL: ALTER DATABASE xxxx MODIFY FILE , Oracle: ALTER DATABASE RENAME FILE etc.

@ObjatieGroba ObjatieGroba marked this pull request as ready for review June 19, 2019 17:56
@nvartolomei nvartolomei mentioned this pull request Jun 24, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants