Skip to content

feat: Introduce a DialectProviderInterface matching the modern cucumber API#350

Merged
stof merged 6 commits intoBehat:masterfrom
stof:dialect_api
May 26, 2025
Merged

feat: Introduce a DialectProviderInterface matching the modern cucumber API#350
stof merged 6 commits intoBehat:masterfrom
stof:dialect_api

Conversation

@stof
Copy link
Copy Markdown
Member

@stof stof commented May 23, 2025

When configuring the Lexer with a dialect provider, invalid language tags in the parsed files will fail instead of silently uswing English keywords, matching the cucumber behavior. Refs #156
When configuring the Lexer with a KeywordsInterface, the existing silent usage of English will still be done (as the Keywords implementation does that internally).
Failing for invalid languages during parsing will be implemented in a follow-up PR based on the compatiblity mode.

An implementation is provided based on the cucumber gherkin-languages.json file, which is now shipped in the package
unmodified (but is considered an internal implementation detail of the new class as we don't expose the path).
No configurable implementation is provided in the core. The DialectProviderInterface is simple enough to implement it fully based on your needs (I expect that the need to read translations from a file having the same structure than the cucumber gherkin-languages.json is not the use case for custom dialect providers, and even in such case, it is dead simple to implement it).

Closes #203

This PR does not provide a replacement for the KeywordsDumper feature for now (which is more a dumper for an example feature). This will be handled in a separate PR.
This PR also does not mark the KeywordsInterface as deprecated yet, but this is expected to happen once the replacement for the KeywordsDumper is available.

@stof stof requested a review from acoulton May 23, 2025 11:51
@codecov
Copy link
Copy Markdown

codecov bot commented May 23, 2025

Codecov Report

Attention: Patch coverage is 97.58454% with 5 lines in your changes missing coverage. Please review.

Project coverage is 96.10%. Comparing base (fad95fa) to head (ba7f752).
Report is 10 commits behind head on master.

✅ All tests successful. No failed tests found.

Files with missing lines Patch % Lines
src/Dialect/GherkinDialect.php 94.73% 2 Missing ⚠️
src/Keywords/DialectKeywords.php 94.59% 2 Missing ⚠️
src/Lexer.php 98.70% 1 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##             master     #350      +/-   ##
============================================
- Coverage     96.82%   96.10%   -0.72%     
- Complexity      581      631      +50     
============================================
  Files            37       42       +5     
  Lines          1730     1875     +145     
============================================
+ Hits           1675     1802     +127     
- Misses           55       73      +18     
Flag Coverage Δ
php8.1 96.10% <97.58%> (-0.72%) ⬇️
php8.1--with=symfony/yaml:^5.4 96.10% <97.58%> (-0.72%) ⬇️
php8.1--with=symfony/yaml:^6.4 96.10% <97.58%> (-0.72%) ⬇️
php8.2 96.10% <97.58%> (-0.72%) ⬇️
php8.3 96.10% <97.58%> (-0.72%) ⬇️
php8.4 96.10% <97.58%> (-0.72%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

- run: composer update
if: steps.cucumber.outputs.cucumber_updated == 'yes'

- run: cp vendor/cucumber/gherkin-monorepo/gherkin-languages.json resources/
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I intentionally kept this copying separate from the bin/update_i18n script to allow us to drop the bin/update_i18n script when we drop the i18n.php file (when removing the Keywords API in the next major)

@stof
Copy link
Copy Markdown
Member Author

stof commented May 23, 2025

Due to the Lexer still being extensible, I kept 2 unused protected methods (marking them deprecated) in case a child class uses them. As the Lexer never uses them anymore, I've excluded them from code coverage.

@stof stof force-pushed the dialect_api branch 2 times, most recently from 2d98f28 to 265b9e8 Compare May 23, 2025 19:44
@stof
Copy link
Copy Markdown
Member Author

stof commented May 23, 2025

the code coverage reduction is because ArrayKeywords::getStepKeywords becomes uncovered as it was only ever called by the Lexer (which now relies on the Dialect API instead).

Copy link
Copy Markdown
Member Author

@stof stof left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ParserTest and ParserExceptionsTest are still using the Keywords API as they were using locally-defined translations, not the i18n.php containing the official gherkin translations. As we are in the process of deleting those tests to replace them with compatibility tests contributed to cucumber, I decided it was not worth validating that the translations were still in sync.


$this->allowFeature = false;
$this->allowLanguageTag = false;
$this->allowMultilineArguments = false;
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unlike the old scanInputForKeywords which was containing a bunch of conditions on the type to decide which allow flag should be toggles, I moved this to each of the specific methods. I personally find this clearer.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Much clearer, thanks

$trimmedLine = $this->getTrimmedLine();
$matchedKeyword = null;

foreach ($this->currentDialect->getStepKeywords() as $keyword) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a potential performance hit here, compared to the previous regex approach? There are quite a lot of step keyword variants to check for every line of every feature.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is exactly the way cucumber/gherkin implements it. But I haven't done a benchmark yet.

@acoulton
Copy link
Copy Markdown
Contributor

@stof this is broadly looking great so far, I ran out of time to get to the end of it just now but have left some initial small comments / thoughts.

@acoulton
Copy link
Copy Markdown
Contributor

Looked over the rest of this now @stof, this is looking really good thanks.

try {
$parsed = $parser->parse($source, __DIR__ . DIRECTORY_SEPARATOR . $language . '_' . ($num + 1) . '.feature');
} catch (\Throwable $e) {
throw new \RuntimeException($e->getMessage() . ":\n" . $source, 0, $e);

This comment was marked as resolved.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PHPUnit truncates the test parameters, which makes it useless to read the source (which is far longer than the truncation limit).
Anyway, that's what we already do in our KeywordsTestCase (from which I copied a big part of the code).

@stof stof force-pushed the dialect_api branch 3 times, most recently from 557d447 to 5f61d8d Compare May 26, 2025 11:08
@stof stof requested a review from acoulton May 26, 2025 11:31
@stof stof changed the title Introduce a DialectProviderInterface matching the modern cucumber API feat: Introduce a DialectProviderInterface matching the modern cucumber API May 26, 2025
Copy link
Copy Markdown
Contributor

@acoulton acoulton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @stof, this is looking great. My only outstanding question is re a legacy class calling the now-deprecated getKeywords('Step') - I am possibly misunderstanding, but I think the regex that method returns has changed in a way that would change behaviour of the extension class (which will not necessarily be calling ltrim on the part after the matched keyword)?

@stof
Copy link
Copy Markdown
Member Author

stof commented May 26, 2025

Good catch indeed. I'll fix that.

@stof stof requested a review from acoulton May 26, 2025 15:24
stof added 6 commits May 26, 2025 17:27
ArrayKeywords makes its language default to `en` and assumes that there
is data available for it without checking. When this is not the case,
calling its methods before setting another language will trigger a PHP
warning and return `null` (in methods where the phpdoc says it must
return a string).
The ArrayKeywords test is currently passing only because the current way
the parser works avoids this broken state for the particular case being
used in the test, but this is not guaranteed to stay true.
An implementation is provided based on the cucumber
gherkin-languages.json file, which is now shipped in the package
unmodified (but is considered an internal implementation detail of the
new class as we don't expose the path).
This test is similar to the CachedArrayKeywords test but relies on the
new Dialect API and covers translations defined in the
gherkin-languages.json file.
Code coverage does not count code executed in data providers as being
covered.
Copy link
Copy Markdown
Contributor

@acoulton acoulton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great, thanks @stof

@stof stof merged commit be8ac49 into Behat:master May 26, 2025
10 of 11 checks passed
@stof stof deleted the dialect_api branch May 26, 2025 15:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

The Keywords API is based on cucumber gherkin 2

3 participants