Skip to content

Add Path chooser. Add DiskMonitort multiple disks support#1

Merged
ObjatieGroba merged 66 commits intomasterfrom
feature_multiple_disks
Apr 27, 2020
Merged

Add Path chooser. Add DiskMonitort multiple disks support#1
ObjatieGroba merged 66 commits intomasterfrom
feature_multiple_disks

Conversation

@ObjatieGroba
Copy link
Copy Markdown
Owner

I hereby agree to the terms of the CLA available at: https://yandex.ru/legal/cla/?lang=en

For changelog. Remove if this is non-significant change.

Category (leave one):

  • New Feature
  • Bug Fix
  • Improvement
  • Performance Improvement
  • Backward Incompatible Change
  • Build/Testing/Packaging Improvement
  • Other

Short description (up to few sentences):

...

Detailed description (optional):

...


String new_part_tmp_path = data.getFullPath() + TMP_PREFIX + future_part.name + "/";
size_t expected_size = future_part.expectedSize();
String part_path = data.getFullPathForPart(expected_size); ///@TODO_IGR ASK EXPECTED SIZE
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

need constant

Copy link
Copy Markdown

@alesapin alesapin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need tests and example of usage in documentation.



///@TODO_IGR ASK maybe pointer to Schema?
Schema Context::chooseSchema(const String & name) const
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not const reference?

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why choose? We don't choose anything here. Just get element by key.

std::map<String, Disk> disks;
};

class Schema
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Too common name. MergeTreeStorageSchema or DiskSchema?

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shema and Volume would be renamed.
Nobody knows fine names for this classes

class ActiveDataPartSet
{
public:
struct PartPathName {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PartPathWithName or something like this. Also not obvious why we need separate class and cannot get name with basename and path with dirname.

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Class ActiveDataPartSet? Or struct PartPathWithNames?


String relative_part_path = String(to_detached ? "detached/" : "") + tmp_prefix + part_name;
String absolute_part_path = data.getFullPath() + relative_part_path + "/";
auto reservation = data.reserveSpaceForPart(0); ///@TODO_IGR ASK What size should be there?
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we have to add new http handler (or params to existing handler) which will tell about part size.

}

private:
DisksSelector disks;
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why we need it?


checkDataPart(
storage.data.getFullPath() + part_name,
storage.data.getFullPaths()[0] + part_name, ///@TODO_IGR ASK what should we do there? Should we check all paths in checkDataPart?
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, we should check part part_path.

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How can we get part_path by part_name?
Or we shold chg queu by containig path in queue

{
current_parts.add(part->name);
virtual_parts.add(part->name);
current_parts.add("/", part->name);
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What purpose of this "/"? Seems like wrong usage.

{
MergeTreeMutationEntry entry(commands, full_path, data.insert_increment.get());
const auto full_paths = data.getFullPaths(); ///@TODO_IGR ASK What expected size of mutated part? what size should we reserve?
MergeTreeMutationEntry entry(commands, full_paths[0], data.insert_increment.get()); ///@TODO_IGR ASK PATH TO ENTRY
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As you decide

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about expected size for such entry?

throw Exception("ReplicatedMergeTree storages require data path", ErrorCodes::INCORRECT_FILE_NAME);

///@TODO_IGR ASK Set inside MergeTreeData?
data.schema.setDefaultPath(path_);
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, better in constructor.

res_columns[res_index++]->insert(tables_it->table()->getDataPath());
if (columns_mask[src_index++]) {
for (const String & path : tables_it->table()->getDataPaths() ) {
res_columns[res_index++]->insert(path); ///@TODO_IGR ASK Is it fine?
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it's full path, than it's ok.

ObjatieGroba pushed a commit that referenced this pull request Jun 9, 2019
ObjatieGroba pushed a commit that referenced this pull request Jun 9, 2019
SimpleAggregateFunction do not pass arena to the
add_function -> getAddressOfAddFunction(), hence next crash happens:
  (gdb) bt
  #0  DB::Arena::alloc (size=64, this=0x0) at ../dbms/src/Common/Arena.h:124
  #1  DB::SingleValueDataString::changeImpl (this=0x7f97424a27d8, value=..., arena=0x0) at ../dbms/src/AggregateFunctions/AggregateFunctionMinMaxAny.h:274
  #2  0x0000000005ea5319 in DB::AggregateFunctionNullUnary<true>::add (arena=<optimized out>, row_num=<optimized out>, columns=<optimized out>, place=<optimized out>, this=<optimized out>) at ../dbms/src/AggregateFunctions/AggregateFunctionNull.h:43
  #3  DB::IAggregateFunctionHelper<DB::AggregateFunctionNullUnary<true> >::addFree (that=<optimized out>, place=<optimized out>, columns=<optimized out>, row_num=<optimized out>, arena=<optimized out>) at ../dbms/src/AggregateFunctions/IAggregateFunction.h:131
  #4  0x000000000679772f in DB::AggregatingSortedBlockInputStream::addRow (this=this@entry=0x7f982de19c00, cursor=...) at ../dbms/src/Common/AlignedBuffer.h:31
  ClickHouse#5  0x0000000006797faa in DB::AggregatingSortedBlockInputStream::merge (this=this@entry=0x7f982de19c00, merged_columns=..., queue=...) at ../dbms/src/DataStreams/AggregatingSortedBlockInputStream.cpp:140
  ClickHouse#6  0x0000000006798979 in DB::AggregatingSortedBlockInputStream::readImpl (this=0x7f982de19c00) at ../dbms/src/DataStreams/AggregatingSortedBlockInputStream.cpp:78
  ClickHouse#7  0x000000000622db55 in DB::IBlockInputStream::read (this=0x7f982de19c00) at ../dbms/src/DataStreams/IBlockInputStream.cpp:56
  ClickHouse#8  0x0000000006613bee in DB::MergeTreeDataMergerMutator::mergePartsToTemporaryPart (this=this@entry=0x7f97ec65e1a0, future_part=..., merge_entry=..., time_of_merge=<optimized out>, disk_reservation=<optimized out>, deduplicate=<optimized out>) at /usr/include/c++/8/bits/shared_ptr_base.h:1018
  ClickHouse#9  0x000000000658f7a4 in DB::StorageReplicatedMergeTree::tryExecuteMerge (this=0x7f97ec65b810, entry=...) at /usr/include/c++/8/bits/unique_ptr.h:342
  ClickHouse#10 0x00000000065940ab in DB::StorageReplicatedMergeTree::executeLogEntry (this=0x7f97ec65b810, entry=...) at ../dbms/src/Storages/StorageReplicatedMergeTree.cpp:910
  <snip>

  (gdb) f 1
  (gdb) p MAX_SMALL_STRING_SIZE
  $1 = 48
  (gdb) p capacity
  $2 = 64
  (gdb) p value
  $3 = {data = 0x7f97242fcbd0 "HHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHH", size = 61}

v2: avoid leaking of allocated by Arena memory on the intermediate step

Fixes: 8f8d2c0 ("Merge pull request ClickHouse#4629 from bgranvea/simple_aggregate_function")
ObjatieGroba pushed a commit that referenced this pull request Jul 16, 2019
Enum data type as a synonim for Enum8 or Enum16
ObjatieGroba pushed a commit that referenced this pull request Jul 30, 2019
@ObjatieGroba ObjatieGroba merged commit 949890e into master Apr 27, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants