Skip to content

Conversation

@dongxiaoman
Copy link
Contributor

@dongxiaoman dongxiaoman commented May 16, 2022

Description

Following changes in #5734,
Allows table name with dots by a shared pinot property. This will allow table name format like namespace.table_name to avoid table_name conflict while providing some mitigation/middle ground before Pinot table goes true multi-tenant.

The use case this one PR enabled will be for some users sharing a same Pinot Cluster, with the below Example rules (not in Pinot, but only with the wrapper/UI or some toolset visible to user):

  1. The Admin enforces the rule that the users cannot use dots in their own table_name like test;
  2. The Admin also enforces the rule that the users will be given a unique "namespace" as prefix to table;
  3. The users can create a table named test and write query freely/naturally with table name like user_namespace.test
  4. Pinot can delegate the authorization of table access via its plugin properly in the query

Extended feature

Extract column name only after the last dot in cases like SELECT db.namespace.table.column_name from db.namespace.table. In the example the table name will be db.namespace.table

Tests Done

Many Unit tests

Release Notes

  • Added pinot.table.name.dots.enabled config to Controllers and Brokers to allow table name with dots.
  • Extract column name from queries like "SELECT db.namespace.table.column_name" properly

@apucher
Copy link
Contributor

apucher commented May 17, 2022

IMO this doesn't even need a feature flag. We're not breaking any behavior. Keeps things simple.

@Jackie-Jiang
Copy link
Contributor

@apucher Good point. Let me list down the things we are trying to handle with . and discuss a solution:

  • SELECT ... FROM <db>.<table>
  • SELECT <table>.<col> FROM ...
    In the future, we also want to support:
  • SELECT <col>.<nested_fields> FROM ...

If we allow . within the table name, we cannot tell if . is the delimiter or part of the table name.
Add @xiangfu0 to the discussion

@apucher
Copy link
Contributor

apucher commented May 17, 2022

it should be easy to parse the table name since it needs to be present in the FROM clause.

are nested field selects a thing yet?

@Jackie-Jiang
Copy link
Contributor

it should be easy to parse the table name since it needs to be present in the FROM clause.

I think the <db>.<table> convention is from the connector, and we don't have the database concept in Pinot. When getting it from the FROM clause, we want to match it with the tables available in pinot. If we support dot in the table name, it will make it slightly hard to parse the name. E.g. when getting a.b.c.d.e, we won't know which part is the table name.

are nested field selects a thing yet?

We don't support it yet, but it is in our roadmap, so we should take them into consideration

I think using dot in table name (or any field name) is in general bad practice because dot is already reserved for other purpose. @dongxiaoman I think we should just use _ instead of . for multi-tenant purpose. To avoid name conflict, we can reject namespace with underlying _ or perform a prefix check when adding a new namespace.

Some reference about not using .:

@apucher
Copy link
Contributor

apucher commented May 17, 2022

There's precedent for using dots as separators across databases/schemas in MySQL, MSSQL, and others. It's tricky in the sense that underscores are much more commonly used in name tables. Both, underscores and dots, could work. My hunch is that we'll ultimately have to support dots (to separate namespaces) as most cloud analytics systems already do:

@Jackie-Jiang
Copy link
Contributor

Agree on using dot to connect db/namespace and table name and column name (this is the standard SQL syntax).
The problem here is that we don't support db/namespace concept natively in Pinot right now, and we are trying to overload the table name to include the namespace as part of it. This can potentially become a blocker for us when adding support of namespace in the future.

@dongxiaoman
Copy link
Contributor Author

dongxiaoman commented May 18, 2022

@Jackie-Jiang Agree that using dot in table name is in general practice not so compatible. I already saw JDBC H2 error if I copy/paste the Full integration test because JDBC engine treat dots as separator by default. That is why I hide the support with a feature flag. cc @apucher . The H2 error can be bypassed by using backticks.

Meanwhile, as I mentioned, table name with dots can provide a transition/middle ground before we fully support full multi-tenant/DB concept.
By this PR, we can allow the user write something naturally:
SELECT col1 from mydb.test_table where xxx while the mydb.test_table is managed by Pinot admins, and the user is assigned mydb from the DataWarehouse infrastructure on top of Pinot.

If this is not breaking other features, providing extra flexibility (extra knob) will help support more use cases, it actually expands Pinot's scope

@Jackie-Jiang
Copy link
Contributor

Jackie-Jiang commented May 18, 2022

Sounds good. I think we can limit the flexibility of the table name to contain at most one dot, and assume the name is constructed from <db>.<table>. There are 2 places where we handle the dots within the table name:

  1. getActualTableName(): With the above limitation, no change is needed. The queried table name should have at most one dot
  2. getActualColumnName(): We can change the logic to do a prefix match for the rawTableName + '.' (use columnName.regionMatches() for better performance)

@dongxiaoman dongxiaoman force-pushed the xd-table-name-dots branch from f4f77f9 to c484a75 Compare May 19, 2022 18:20
String tableName = tableConfig.getTableName();
if (tableName.contains(".") || tableName.contains(" ")) {
throw new IllegalStateException("Table name: '" + tableName + "' containing '.' or space is not allowed");
if (tableName.contains(".") && !allowDots) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's check that the table name contains at most one dot, and put some comments explaining we allow table name with embedded database name

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added logic and also unit tests

*/
private String getActualTableName(String tableName) {
@VisibleForTesting
static String getActualTableName(String tableName, TableCache tableCache, BrokerRoutingManager routingManager,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We might not want to change this method. Even if we allow table name with database, we should probably still support the case of table name without database.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right now the code change has only one effect:
If we allow dots in table name, we will avoid striping the [database_name]. from the table name. This will be needed in our case, that we wish to treat [database_name].[table_name] as the whole actual table name.

This behavior is under the feature flag so will be backwards compatible.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What I meant is that the existing code should already be able to handle the [database_name].[table_name] as the table name (line 773, 801, 812 in the original code). We don't need to introduce the extra logic of reading properties from the config.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. I did not expect the behavior to be the same but you are right.
I am still keeping the unit tests since they will ensure our expected behavior is not broken

@codecov-commenter
Copy link

codecov-commenter commented May 19, 2022

Codecov Report

Merging #8713 (3d1ec09) into master (b6ed914) will decrease coverage by 5.31%.
The diff coverage is 61.02%.

@@             Coverage Diff              @@
##             master    #8713      +/-   ##
============================================
- Coverage     68.17%   62.86%   -5.32%     
+ Complexity     4619     4605      -14     
============================================
  Files          1735     1689      -46     
  Lines         91320    89304    -2016     
  Branches      13644    13415     -229     
============================================
- Hits          62258    56141    -6117     
- Misses        24714    29132    +4418     
+ Partials       4348     4031     -317     
Flag Coverage Δ
integration2 ?
unittests1 66.23% <46.66%> (+0.01%) ⬆️
unittests2 14.14% <55.88%> (+<0.01%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
...pache/pinot/common/config/provider/TableCache.java 0.00% <0.00%> (-71.38%) ⬇️
...va/org/apache/pinot/spi/utils/CommonConstants.java 22.22% <ø> (ø)
...ntroller/helix/core/PinotHelixResourceManager.java 64.08% <56.70%> (-2.77%) ⬇️
...roker/requesthandler/BaseBrokerRequestHandler.java 33.02% <84.21%> (-37.41%) ⬇️
...oller/api/resources/PinotTableRestletResource.java 46.82% <100.00%> (-5.53%) ⬇️
...ler/api/resources/TableConfigsRestletResource.java 75.58% <100.00%> (+0.14%) ⬆️
...he/pinot/segment/local/utils/TableConfigUtils.java 67.65% <100.00%> (+0.72%) ⬆️
...va/org/apache/pinot/core/routing/RoutingTable.java 0.00% <0.00%> (-100.00%) ⬇️
...va/org/apache/pinot/common/config/NettyConfig.java 0.00% <0.00%> (-100.00%) ⬇️
...a/org/apache/pinot/common/metrics/MinionMeter.java 0.00% <0.00%> (-100.00%) ⬇️
... and 358 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update b6ed914...3d1ec09. Read the comment docs.

@dongxiaoman dongxiaoman force-pushed the xd-table-name-dots branch from 6fb5f84 to 2b6898a Compare May 24, 2022 00:07
@dongxiaoman dongxiaoman force-pushed the xd-table-name-dots branch from 80edebc to 9e8d510 Compare May 24, 2022 18:12
@dongxiaoman dongxiaoman requested a review from Jackie-Jiang May 25, 2022 16:09
Copy link
Contributor

@Jackie-Jiang Jackie-Jiang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM other wise

@dongxiaoman dongxiaoman force-pushed the xd-table-name-dots branch 2 times, most recently from cbd751f to 94d6b8c Compare May 31, 2022 17:15
@dongxiaoman dongxiaoman force-pushed the xd-table-name-dots branch from 94d6b8c to 3d1ec09 Compare May 31, 2022 20:59
@dongxiaoman dongxiaoman requested a review from Jackie-Jiang May 31, 2022 23:22
@dongxiaoman
Copy link
Contributor Author

@Jackie-Jiang Sorry for the late response. Can you please have another quick look? All comments addressed

@Jackie-Jiang Jackie-Jiang added release-notes Referenced by PRs that need attention when compiling the next release notes feature labels Jun 1, 2022
@Jackie-Jiang Jackie-Jiang merged commit ae39243 into apache:master Jun 1, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

feature release-notes Referenced by PRs that need attention when compiling the next release notes

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants