
@zhangwenchao-123
Contributor

fix #ISSUE_Number


Change logs

Describe your change clearly, including what problem is being solved or what feature is being added.

If it breaks backward or forward compatibility, please clarify.

Why are the changes needed?

Describe why the changes are necessary.

Does this PR introduce any user-facing change?

If yes, please clarify the previous behavior and the change this PR proposes.

How was this patch tested?

Please detail how the changes were tested, including manual tests and any relevant unit or integration tests.

Contributor's Checklist

Here are some reminders and checklist items for before and when you submit your pull request; please check them:

  • Make sure your Pull Request has a clear title and commit message. You can take git-commit template as a reference.
  • Sign the Contributor License Agreement as prompted for your first-time contribution (one-time setup).
  • Learn the coding contribution guide, including our code conventions, workflow and more.
  • List your communication in the GitHub Issues or Discussions (if any or if needed).
  • Document changes.
  • Add tests for the change.
  • Pass make installcheck
  • Pass make -C src/test installcheck-cbdb-parallel
  • Feel free to request a review and approval from the cloudberrydb/dev team when your PR is ready🥳

@zhangwenchao-123 zhangwenchao-123 marked this pull request as draft March 4, 2024 08:43
@zhangwenchao-123 zhangwenchao-123 marked this pull request as ready for review March 4, 2024 09:06
@zhangwenchao-123 zhangwenchao-123 marked this pull request as draft March 5, 2024 03:19
@zhangwenchao-123 zhangwenchao-123 marked this pull request as ready for review March 19, 2024 07:53
@zhangwenchao-123 zhangwenchao-123 marked this pull request as draft March 25, 2024 06:54
@zhangwenchao-123 zhangwenchao-123 marked this pull request as ready for review March 26, 2024 06:14
@zhangwenchao-123 zhangwenchao-123 marked this pull request as draft April 1, 2024 02:24
@zhangwenchao-123 zhangwenchao-123 marked this pull request as ready for review April 2, 2024 01:52
@my-ship-it
Contributor

commit message?

@my-ship-it
Contributor

postgres=# create directory table tbl;
ERROR: Tablespace is disallowed to use NULL in create directory table.

@my-ship-it
Contributor

> postgres=# create directory table tbl;
> ERROR: Tablespace is disallowed to use NULL in create directory table.

Do we need to support the default tablespace here?

@my-ship-it
Contributor

postgres=# create directory table tbl tablespace pg_default;
ERROR: could not determine which collation to use for string comparison
HINT: Use the COLLATE clause to set the collation explicitly.

@my-ship-it
Contributor

postgres=# create directory table tbl tablespace pg_default;
ERROR: unable to create directory "pg_tblspc/1663/GPDB_1_302402231/13289/16384_dirtable" (seg2 127.0.1.1:7004 pid=630176)
postgres=# create tablespace my_tbspc location '/tmp/tblspc';
ERROR: directory "/tmp/tblspc/1/GPDB_1_302402231" already in use as a tablespace
postgres=# create tablespace my_tbspc location '/tmp/tblspc';
CREATE TABLESPACE
postgres=# create directory table tbl tablespace my_tbspc;
CREATE DIRECTORY_TABLE

Implement the directory table feature in this commit. A directory table is a
new relation type used to organize unstructured data files in a specified
tablespace. The data files are stored in the specified tablespace, while the
tuples recording the files' metadata (relative_path, md5, size, etc.) are
stored in a normal table.
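To make the split between file storage and metadata concrete, here is a minimal Python sketch, assuming a plain local directory stands in for the tablespace. It walks the directory and builds one metadata row per file with the fields named above (relative_path, md5, size). The `DirEntry` type and the `scan_directory` and `file_md5` functions are hypothetical illustrations, not part of Cloudberry; the real directory table keeps these rows in a database table, not in a Python list.

```python
from __future__ import annotations

import hashlib
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class DirEntry:
    """Hypothetical metadata row: one per stored file."""
    relative_path: str
    md5: str
    size: int


def file_md5(path: str, chunk_size: int = 1 << 16) -> str:
    """Hex MD5 of a file, read in chunks so large files are not loaded at once."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def scan_directory(root: str) -> list[DirEntry]:
    """Build metadata rows for every regular file under root.

    The file bytes stay on disk; only the metadata is returned,
    mirroring the data/metadata split of a directory table.
    """
    rows = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            rows.append(DirEntry(
                relative_path=os.path.relpath(full, root).replace(os.sep, "/"),
                md5=file_md5(full),
                size=os.path.getsize(full),
            ))
    return sorted(rows, key=lambda r: r.relative_path)
```

Keeping the content out of the metadata rows means a scan over the table stays cheap regardless of file size; only reading the content touches the stored bytes.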

We support both local and remote directory tables. A local directory table
uses a local tablespace, while a remote directory table uses a DFS tablespace,
which is implemented in our enterprise extension.

We support COPY BINARY ... FROM to upload files to a directory table, the
directory_table UDF to get file content, and the remove_file UDF to remove
files from a directory table. In addition, we implement a tool called cbload
to upload files to a directory table. To support DFS directory tables, we also
introduce some catalog tables, such as gp_storage_server and
gp_storage_user_mapping, which are shared across all databases.
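The upload/read/remove lifecycle can be modeled in a few lines. This is a toy Python sketch, not Cloudberry's implementation: the class `ToyDirectoryTable` and its methods `copy_binary_from`, `read_file` and `remove_file` are hypothetical stand-ins for COPY BINARY ... FROM, a per-file analogue of the directory_table UDF, and the remove_file UDF. It assumes a local directory plays the role of the tablespace and a dict plays the role of the metadata table, and it assumes that uploading to an existing path is an error; the real behavior may differ.

```python
from __future__ import annotations

import hashlib
import os


class ToyDirectoryTable:
    """Hypothetical model: file bytes under `root`, metadata rows in `rows`."""

    def __init__(self, root: str):
        self.root = os.path.normpath(root)
        self.rows: dict[str, dict] = {}  # relative_path -> metadata row
        os.makedirs(self.root, exist_ok=True)

    def _full_path(self, relative_path: str) -> str:
        full = os.path.normpath(os.path.join(self.root, relative_path))
        # Reject paths that would escape the storage root (e.g. "../x").
        if os.path.commonpath([full, self.root]) != self.root:
            raise ValueError(f"path escapes root: {relative_path}")
        return full

    def copy_binary_from(self, data: bytes, relative_path: str) -> None:
        """Like COPY BINARY ... FROM: store the bytes, then record metadata."""
        if relative_path in self.rows:
            raise FileExistsError(relative_path)
        full = self._full_path(relative_path)
        os.makedirs(os.path.dirname(full), exist_ok=True)
        with open(full, "wb") as f:
            f.write(data)
        self.rows[relative_path] = {
            "relative_path": relative_path,
            "md5": hashlib.md5(data).hexdigest(),
            "size": len(data),
        }

    def read_file(self, relative_path: str) -> bytes:
        """Per-file analogue of directory_table(): return recorded content."""
        if relative_path not in self.rows:
            raise FileNotFoundError(relative_path)
        with open(self._full_path(relative_path), "rb") as f:
            return f.read()

    def remove_file(self, relative_path: str) -> bool:
        """Like remove_file(): drop the metadata row and delete the file."""
        if self.rows.pop(relative_path, None) is None:
            return False
        os.remove(self._full_path(relative_path))
        return True
```

Unlike a real directory table, this toy is not transactional: if the process dies between writing the file and recording its row, an orphan file is left behind.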

Here are some usage examples:

-- Create a storage server named oss_server that points to an OSS endpoint:
CREATE STORAGE SERVER oss_server OPTIONS
(protocol 'qingstor', endpoint 'pek3b.qingstor.com', https 'true', virtual_host 'false');

-- Create a user mapping to access oss_server
CREATE STORAGE USER MAPPING FOR CURRENT_USER STORAGE SERVER oss_server OPTIONS
(accesskey 'KGCPPHVCHRDSYFEAWLLC', secretkey '0SJIWiIATh6jOlmAas23q6hOAGBI1BnsnvgJmTs');

-- Create a local tablespace
CREATE TABLESPACE dirtable_spc location '/data/dirtable_spc';

-- Create a local directory table
CREATE DIRECTORY TABLE dirtable TABLESPACE dirtable_spc;

-- Upload /data/file1.csv into the directory table as 'file1' with COPY BINARY FROM
COPY BINARY dirtable FROM '/data/file1.csv' 'file1';

-- Query the file metadata, then the file content
SELECT * FROM dirtable;
SELECT * FROM directory_table('dirtable');

-- Remove file from directory table
SELECT remove_file('dirtable', 'file1');

Co-authored-by: Mu Guoqing [email protected]
Reviewed-by: Yang Yu [email protected]
            Yang Jianghua [email protected]