Skip to content

Build out rule and fix serialisation#5364

Merged
alanmcruickshank merged 27 commits intomainfrom
ac/fix_serialisation
Dec 5, 2023
Merged

Build out rule and fix serialisation#5364
alanmcruickshank merged 27 commits intomainfrom
ac/fix_serialisation

Conversation

@alanmcruickshank
Copy link
Copy Markdown
Member

@alanmcruickshank alanmcruickshank commented Oct 30, 2023

This resolves #5353 and resolves #4651. It also resolves #4238.

This PR significantly increases the amount of detail returned by the serialisation routines for violations and fixes. This means a lot more info is included in the non human output versions of sqlfluff lint (including both start and end points for errors, and the initial fixes proposed). As a result, tests cases which specify the details of a violation are now also required to include that detail - so most of the increased lines of code from this PR are from increased detail in test cases (which I think leads to better test coverage in essence).

Specfically it renames get_info_dict() to just to_dict() and adds a parallel to_dict() method on the LintFix class so that proposed fixes are shown in the output when linting issues are serialised. Warnings also come through now too - always with the check type notice, so that the --annotation-type cli option only applies to violations. Given that warnings come through at that level, I've changed the default level to warning.

Based on feedback from @golergka and @james-johnston-thumbtack, I've kept the format fairly flat rather than nesting starts and ends into some kind of combined structure.

@golergka
Copy link
Copy Markdown
Contributor

golergka commented Oct 30, 2023

Thank you so much for your work on this! I've only read the changes, haven't tried running this code yet.

One thing that seems a little bit confusing for me is a different formatting for positions. For lints, we get line_no and line_pos properties (which is great for my particular use case, because they translate nicely into Monaco's line and column numbers), but for fixes it's just source_position with a tuple of two ints.

From your code for delete edit type, it seems that this source position is supposed to be all the information about the edit — so those two numbers must encode both start and end of the segment, which means that the int in this tuple is just a Nth character in the whole input.

Would it be possible to get start_line_no, start_line_pos, end_line_no and end_line_pos for the fixes instead? IMO, it might be a better API design, because it would use the same semantics which is already used for lints themselves.

@james-johnston-thumbtack
Copy link
Copy Markdown
Contributor

I assume you're referring to the source_position in simple_test.py? And these are (byte? character?) offsets into the input string? In which case yeah it makes sense.

Here is where the plugin of @dalgarins comes up with a range of text for IntelliJ to highlight.... in this case it appears that IntelliJ needs a starting character position and a length: https://github.com/anboralabs/sqlfluff-intellij-community/blob/08354f69b81f3e5ce83e5cdf6245d10a47bc210b/src/main/kotlin/co/anbora/labs/sqlfluff/lint/Linter.kt#L111 --- the existing code is assuming length "0" (since it doesn't know it), and a starting position calculated by using the getLineStartOffset function (https://github.com/anboralabs/sqlfluff-intellij-community/blob/08354f69b81f3e5ce83e5cdf6245d10a47bc210b/src/main/kotlin/co/anbora/labs/sqlfluff/lint/Linter.kt#L105 ) since SQLFluff only provides the line number.

So it appears that maybe it's six one way and half a dozen the other in terms of whether it's most convenient to provide start/end line numbers & positions, as @golergka proposes, or overall offset values as your current commit does. Certainly, if we were writing a console app to write these offsets, it's easy to directly print the offsets as-is if the API is providing line numbers like @golergka says. At any rate, it appears that the IntelliJ SQLFluff plugin has a way to convert between the two as needed.

@dalgarins
Copy link
Copy Markdown

I agree with @golergka to show this values start_line_no, start_line_pos, end_line_no and end_line_pos, will be helpful for other tools not only my plugin.

@alanmcruickshank
Copy link
Copy Markdown
Member Author

I agree with @golergka to show this values start_line_no, start_line_pos, end_line_no and end_line_pos, will be helpful for other tools not only my plugin.

Perhaps a trivial question, but one I'd love to be fairly consistent on: Would you expect these to be four "flat" values, or would you expect perhaps a more "nested" structure? We could do something like:

Option 1: Flat

{
    "rule": "ABC123",
    "other": "stuff...",
    "start_line_no": 1,
    "start_line_pos": 4,
    "end_line_no": 3,
    "end_line_pos": 7
}

Option 2: Nested by start/end

{
    "rule": "ABC123",
    "other": "stuff...",
    "start": {
        "line_no": 1,
        "line_pos": 4,
        "file_pos": 14
    },
    "end": {
        "line_no": 3,
        "line_pos": 7,
        "file_pos": 23
    }
}

I'm erring toward the latter because we can include multiple options for how people want to locate things - but I realise it's both a more complicated structure, and also a bigger change from past behaviour. Doubly so in the fixes because it makes the structure much denser.

@dalgarins
Copy link
Copy Markdown

@alanmcruickshank with flat version is ok.

@golergka
Copy link
Copy Markdown
Contributor

golergka commented Nov 2, 2023

Second flat version just because it's how dicts for the errors already work

@coveralls
Copy link
Copy Markdown

coveralls commented Dec 1, 2023

Coverage Status

coverage: 100.0%. remained the same
when pulling 7e2d72b on ac/fix_serialisation
into 44143cf on main.

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Dec 1, 2023

Coverage Results ✅

Name    Stmts   Miss  Cover   Missing
-------------------------------------
TOTAL   16926      0   100%

222 files skipped due to complete coverage.

@alanmcruickshank alanmcruickshank marked this pull request as ready for review December 1, 2023 22:37
Copy link
Copy Markdown
Contributor

@greg-finley greg-finley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very nice!

@alanmcruickshank
Copy link
Copy Markdown
Member Author

@golergka @james-johnston-thumbtack @dalgarins do you have any additional feedback on this version before I get it merged? When it's in, I'll push out another pre-release for people to beta-test.

Copy link
Copy Markdown
Contributor

@golergka golergka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks so much for your work on this. Reading this code, I just got a couple of small comments, but most importantly, the API that you expose for fixes and end positions is great.

Comment thread src/sqlfluff/cli/commands.py Outdated
Comment thread src/sqlfluff/core/errors.py Outdated
@dalgarins
Copy link
Copy Markdown

Thank you, for your work, this PR will allow a better integration with other tools, I don't have any other comment.

@alanmcruickshank alanmcruickshank added this pull request to the merge queue Dec 5, 2023
Merged via the queue into main with commit 100c628 Dec 5, 2023
@alanmcruickshank alanmcruickshank deleted the ac/fix_serialisation branch December 5, 2023 09:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

6 participants