Bug in attribute tokenization when content contains PHP end token or attribute closer on new line #3294

@jrfnl

Describe the bug
Tokenization of attributes is not always consistent across PHP versions.

While the given code sample may be an edge case, it is valid PHP 8.0 code and is, in fact, one of the examples given in one of the related RFCs.
(last example in this section: https://wiki.php.net/rfc/shorter_attribute_syntax_change#discussion_of_forwards_compatibility_procons )

Code sample

<?php
#[DeprecationReason('reason: <https://some-website/reason?>')]
function main() {}

Tokenizes on PHP 8.0 with PHPCS 3.6.0 as:

Ptr | Ln | Col  | Cond | ( #) | Token Type                 | [len]: Content
-------------------------------------------------------------------------
  0 | L1 | C  1 | CC 0 | ( 0) | T_OPEN_TAG                 | [5]: <?php

  1 | L2 | C  1 | CC 0 | ( 0) | T_ATTRIBUTE                | [2]: #[
  2 | L2 | C  3 | CC 0 | ( 0) | T_STRING                   | [17]: DeprecationReason
  3 | L2 | C 20 | CC 0 | ( 0) | T_OPEN_PARENTHESIS         | [1]: (
  4 | L2 | C 21 | CC 0 | ( 1) | T_CONSTANT_ENCAPSED_STRING | [40]: 'reason: <https://some-website/reason?>'
  5 | L2 | C 61 | CC 0 | ( 0) | T_CLOSE_PARENTHESIS        | [1]: )
  6 | L2 | C 62 | CC 0 | ( 0) | T_ATTRIBUTE_END            | [1]: ]
  7 | L2 | C 63 | CC 0 | ( 0) | T_WHITESPACE               | [0]:

  8 | L3 | C  1 | CC 0 | ( 0) | T_FUNCTION                 | [8]: function
  9 | L3 | C  9 | CC 0 | ( 0) | T_WHITESPACE               | [1]:
 10 | L3 | C 10 | CC 0 | ( 0) | T_STRING                   | [4]: main
 11 | L3 | C 14 | CC 0 | ( 0) | T_OPEN_PARENTHESIS         | [1]: (
 12 | L3 | C 15 | CC 0 | ( 0) | T_CLOSE_PARENTHESIS        | [1]: )
 13 | L3 | C 16 | CC 0 | ( 0) | T_WHITESPACE               | [1]:
 14 | L3 | C 17 | CC 0 | ( 0) | T_OPEN_CURLY_BRACKET       | [1]: {
 15 | L3 | C 18 | CC 0 | ( 0) | T_CLOSE_CURLY_BRACKET      | [1]: }
 16 | L3 | C 19 | CC 0 | ( 0) | T_WHITESPACE               | [0]:

... while on PHP <8.0, like PHP 7.4 or PHP 5.6, with PHPCS 3.6.0, it tokenizes as:

Ptr | Ln | Col  | Cond | ( #) | Token Type                 | [len]: Content
-------------------------------------------------------------------------
  0 | L1 | C  1 | CC 0 | ( 0) | T_OPEN_TAG                 | [5]: <?php

  1 | L2 | C  1 | CC 0 | ( 0) | T_ATTRIBUTE                | [2]: #[
  2 | L2 | C  3 | CC 0 | ( 0) | T_CLOSE_TAG                | [2]: ?>
  3 | L2 | C  5 | CC 0 | ( 0) | T_INLINE_HTML              | [3]: ')]

  4 | L3 | C  1 | CC 0 | ( 0) | T_INLINE_HTML              | [18]: function main() {}
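
For comparison, it may help to look at what PHP's native tokenizer returns for the sample before PHPCS does any retokenization. The sketch below is not PHPCS code: `dumpNativeTokens()` is a hypothetical helper that only wraps `token_get_all()` and `token_name()`. The likely cause, which is an assumption on my part and not confirmed in this report, is that PHP < 8.0 sees `#[` as the start of a `#` comment, and a single-line comment ends at the first `?>` it encounters.

```php
<?php
// Sketch only: dumps the tokens from PHP's native tokenizer, not from PHPCS.
// Written to run on PHP 5.6 as well as 7.x and 8.x (no type declarations).

/**
 * Hypothetical helper: returns a list of array(token name, content) pairs.
 */
function dumpNativeTokens($code)
{
    $result = array();
    foreach (token_get_all($code) as $token) {
        if (is_array($token)) {
            // Array tokens are array(id, content, line).
            $result[] = array(token_name($token[0]), $token[1]);
        } else {
            // Single-character tokens are returned as plain strings.
            $result[] = array('(char)', $token);
        }
    }
    return $result;
}

// The code sample from this report, kept in a nowdoc so "?>" is plain text here.
$code = <<<'CODE'
<?php
#[DeprecationReason('reason: <https://some-website/reason?>')]
function main() {}
CODE;

foreach (dumpNativeTokens($code) as $pair) {
    list($name, $content) = $pair;
    // json_encode() makes newlines and other whitespace visible.
    printf("%-28s | %s\n", $name, json_encode($content));
}
```

Running the same script under PHP 7.4 and under PHP 8.0 and diffing the output should show where the native token streams start to diverge.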

Expected behavior
Consistent tokenization of code across PHP versions.

Versions (please complete the following information):

  • OS: Windows 10
  • PHP: < 8.0 (tested on PHP 7.4.16)
  • PHPCS: 3.6.0 / master
  • Standard: N/A

Additional context
Related to PR #3203 /cc @alekitto (sorry, I didn't get round to doing edge case testing before)
