
@YOMO-Lee
Contributor

Added file filtering instructions to the LocalFile connector documentation

YOMO-Lee and others added 15 commits October 22, 2024 17:28
Supplement and optimize the LocalFile connector's description of file filtering
[(#7887)](#7887)
1. When the ClickHouse connector runs with parallelism greater than 1, the task finishes extracting data but cannot stop normally
[(#7897)](#7897)

2. Added E2E test cases for this issue [(#7897)](#7897)

3. For local developers to see **Job Progress Information** promptly, the following configuration file must be modified; the configuration under `config` does not take effect
```
seatunnel-engine/seatunnel-engine-common/src/main/resources/seatunnel.yaml
```
Continue optimizing the file-filtering documentation and add some examples
[(#7887)](#7887)
Member

@Hisoka-X Hisoka-X left a comment


Thanks @YOMO-Lee ! I left some comments.


Filter pattern used to filter files.

The filter format is similar to Linux wildcard matching of file names.
Member


We shouldn't give users an ambiguous description. Please tell them directly that we use Java regular expressions.

Comment on lines 259 to 270
| Wildcard | Meaning | Example |
|--------------|---------|---------|
| * | Match 0 or more characters | `f*` &emsp; Any file starting with f<br/>`b*.txt` &emsp; Any file starting with b, followed by any characters, and ending with `.txt` |
| [] | Match a single character listed in the brackets | `[abc]*` &emsp; Any file starting with a, b, or c |
| ? | Match any single character | `f?.txt` &emsp; Any file starting with `f`, followed by one character, and ending with `.txt` |
| [!] | Match any single character not listed in the brackets | `[!abc]*` &emsp; Any file that does not start with a, b, or c |
| [a-z] | Match any single character from a to z | `[a-z]*` &emsp; Any file starting with a letter from a to z |
| {a,b,c}/a..z | Comma-separated: any one of the listed characters<br/>Two dots: any character in the continuous range | `{a,b,c}*` &emsp; Any file starting with a, b, or c<br/>`{a..z}*` &emsp; Any file starting with a character from a to z |
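Because the connector uses Java regular expressions (per the review comment above), each wildcard in this table has a regex counterpart. The following is a minimal sketch that assumes the pattern must match the whole file name. The class and method names are hypothetical, and the translations are hand-written for illustration rather than produced by SeaTunnel.

```java
import java.util.regex.Pattern;

public class GlobToRegexExamples {
    // Hand-written pairs: {wildcard from the table, Java regex with the same intent}.
    static final String[][] EQUIVALENTS = {
        {"f*", "f.*"},
        {"b*.txt", "b.*\\.txt"},
        {"[abc]*", "[abc].*"},
        {"f?.txt", "f.\\.txt"},     // "." = exactly one character, "\\." = a literal dot
        {"[!abc]*", "[^abc].*"},    // regex negates a character class with ^, not !
        {"[a-z]*", "[a-z].*"},
        {"{a,b,c}*", "(a|b|c).*"},  // brace alternation becomes a regex group
    };

    // True if the regex matches the entire file name (assumed whole-name semantics).
    static boolean matches(String regex, String fileName) {
        return Pattern.matches(regex, fileName);
    }

    public static void main(String[] args) {
        String[] names = {"fa.txt", "b123.txt", "cat.csv", "xyz.csv"};
        for (String[] pair : EQUIVALENTS) {
            StringBuilder hits = new StringBuilder();
            for (String name : names) {
                if (matches(pair[1], name)) {
                    hits.append(name).append(' ');
                }
            }
            System.out.printf("%-9s -> %-11s matches: %s%n", pair[0], pair[1], hits.toString().trim());
        }
    }
}
```

The main differences from shell globs are that `*` becomes `.*`, `?` becomes `.`, and a literal dot must be escaped as `\.`.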

Note that, unlike Linux wildcards, the dot before a file extension cannot be omitted.

For example, to match `abc20241022.csv`, the Linux wildcard `abc*` is sufficient, but here you must use `abc*.*`. Note the dot in the middle.
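The extra dot comes from regex semantics. Read as a Java regex, `abc*` means `ab` followed by zero or more `c`, so it cannot match the whole name `abc20241022.csv`. In `abc*.*`, the trailing `.*` matches any remaining characters. This sketch assumes the pattern is compiled as a Java regex and matched against the whole name. It contrasts the JDK glob matcher (`PathMatcher` with the `glob:` syntax) with `Pattern.matches`. The class name is hypothetical.

```java
import java.nio.file.FileSystems;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;
import java.util.regex.Pattern;

public class GlobVsRegex {
    // Interprets the pattern as a shell-style glob (java.nio "glob:" syntax).
    static boolean globMatches(String glob, String fileName) {
        PathMatcher matcher = FileSystems.getDefault().getPathMatcher("glob:" + glob);
        return matcher.matches(Paths.get(fileName));
    }

    // Interprets the pattern as a Java regex that must match the whole name.
    static boolean regexMatches(String regex, String fileName) {
        return Pattern.matches(regex, fileName);
    }

    public static void main(String[] args) {
        String name = "abc20241022.csv";
        for (String p : new String[] {"abc*", "abc*.*", "abc.*"}) {
            System.out.printf("%-7s glob=%-5b regex=%b%n", p, globMatches(p, name), regexMatches(p, name));
        }
        // As a regex, "abc*.*" also accepts names that do not start with "abc",
        // because "c*" may match zero characters.
        System.out.println(regexMatches("abc*.*", "abd.txt"));
    }
}
```

As the last line shows, `abc*.*` read as a regex also accepts `abd.txt`. The regex `abc.*` expresses "starts with abc" more precisely.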
Member


Comment on lines 274 to 287
```
report.txt
notes.txt
input.csv
abch20241022.csv
abcw20241022.csv
abcx20241022.csv
abcq20241022.csv
abcg20241022.csv
abcv20241022.csv
abcb20241022.csv
old_data.csv
logo.png
script.sh
helpers.sh
```
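The following is a minimal sketch of filtering the file names above with Java regexes. It assumes each name is tested with `Matcher.matches()` (a whole-name match). The class name and the chosen patterns are illustrative only.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class FilterFileNames {
    // The file names from the example listing, in the same order.
    static final List<String> FILES = Arrays.asList(
        "report.txt", "notes.txt", "input.csv",
        "abch20241022.csv", "abcw20241022.csv", "abcx20241022.csv",
        "abcq20241022.csv", "abcg20241022.csv", "abcv20241022.csv",
        "abcb20241022.csv", "old_data.csv", "logo.png",
        "script.sh", "helpers.sh");

    // Keeps only the names that the regex matches in full, preserving order.
    static List<String> filter(String regex) {
        Pattern p = Pattern.compile(regex);
        return FILES.stream().filter(f -> p.matcher(f).matches()).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(filter(".*\\.txt"));         // [report.txt, notes.txt]
        System.out.println(filter("abc.*"));            // the seven names starting with "abc"
        System.out.println(filter("abc[hg].*\\.csv"));  // [abch20241022.csv, abcg20241022.csv]
        System.out.println(filter(".*\\.sh"));          // [script.sh, helpers.sh]
    }
}
```

Inside brackets, a comma is a literal character, so "fourth character is h or g" is written `abc[hg]`, not `abc[h,g]`.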
Member


Please add some examples with file paths, not just file names.
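This sketch follows the request to match paths rather than bare names by applying regexes to full path strings. The directory layout is hypothetical. The sketch also assumes that `file_filter_pattern` is matched against the whole path, which should be checked against the connector's actual behavior.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class FilterFilePaths {
    // Hypothetical directory layout, for illustration only.
    static final List<String> PATHS = Arrays.asList(
        "/data/seatunnel/20241001/report.txt",
        "/data/seatunnel/20241007/abch202410.csv",
        "/data/seatunnel/20241002/abcg202410.csv",
        "/data/seatunnel/20241005/old_data.csv",
        "/data/seatunnel/20241012/logo.png");

    // Keeps only the paths that the regex matches in full.
    static List<String> filter(String regex) {
        Pattern p = Pattern.compile(regex);
        return PATHS.stream().filter(path -> p.matcher(path).matches()).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // .csv files directly inside any folder whose name starts with 202410
        System.out.println(filter("/data/seatunnel/202410\\d*/[^/]*\\.csv"));
        // Only files starting with "abc" under the 20241007 folder
        System.out.println(filter("/data/seatunnel/20241007/abc.*"));
    }
}
```

`[^/]*` keeps the file-name part of a match within one directory level, whereas `.*` could also span `/`.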

Optimize the description of regex

Filter pattern used to filter files.

The pattern follows standard regular expressions. For details, please refer to https://en.wikipedia.org/wiki/Regular_expression. learn it
Member


Suggested change
The pattern follows standard regular expressions. For details, please refer to https://en.wikipedia.org/wiki/Regular_expression. learn it
The pattern follows standard regular expressions. For details, please refer to https://en.wikipedia.org/wiki/Regular_expression.
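With "standard regular expressions", it matters in practice whether the pattern must match the whole name or only part of it. This sketch contrasts `Matcher.matches()` with `Matcher.find()`. This thread does not say which one the connector calls, so treat whole-name matching as an assumption to verify. The class and method names are hypothetical.

```java
import java.util.regex.Pattern;

public class WholeVsPartialMatch {
    // Succeeds only if the regex describes the entire input.
    static boolean wholeMatch(String regex, String input) {
        return Pattern.compile(regex).matcher(input).matches();
    }

    // Succeeds if the regex occurs anywhere in the input.
    static boolean partialMatch(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        String name = "report.txt";
        System.out.println(wholeMatch("\\.txt", name));    // false
        System.out.println(partialMatch("\\.txt", name));  // true
        System.out.println(wholeMatch(".*\\.txt", name));  // true
    }
}
```

Under whole-name matching, a pattern for "all .txt files" needs a leading `.*`.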



File Structure Example:
Member

@Hisoka-X Hisoka-X Oct 28, 2024


Suggested change
File Structure Example:
There are some examples.
File Structure Example:

Optimize document structure
@YOMO-Lee YOMO-Lee requested a review from Hisoka-X October 29, 2024 04:04
@YOMO-Lee
Contributor Author

@Hisoka-X Please review this

Please add a description to every connector that supports the `file_filter_pattern` parameter
Added `file_filter_pattern` descriptions to the following file connectors:
CosFile (en), OssFile (en), OssJindoFile (en), HdfsFile (en)
Added `file_filter_pattern` descriptions to the following file connectors:
FtpFile (en), SftpFile (en), S3File (en), HdfsFile (zh)
@Hisoka-X Hisoka-X changed the title [Fix] LocalFile doc optimize (#7887) [Improve][Doc] Add file_filter_pattern example to doc Oct 29, 2024
@YOMO-Lee
Contributor Author

@zhilinli123 please review

@hailin0 hailin0 merged commit a2590e8 into apache:dev Oct 29, 2024

4 participants