Skip to content

Add html_attr_relaxed escaping strategy#4743

Merged
fabpot merged 1 commit intotwigphp:3.xfrom
mpdude:escape-attr-relaxed
Feb 8, 2026
Merged

Add html_attr_relaxed escaping strategy#4743
fabpot merged 1 commit intotwigphp:3.xfrom
mpdude:escape-attr-relaxed

Conversation

@mpdude
Copy link
Copy Markdown
Contributor

@mpdude mpdude commented Jan 20, 2026

This adds html_attr_relaxed, a relaxed variant of the html_attr escaping strategy. The difference is that html_attr_relaxed does not escape the :, @, [ and ] characters. These are used by some front-end frameworks in attribute names to wire special handling/value binding. See https://v2.vuejs.org/v2/guide/syntax.html#v-bind-Shorthand for an example.

The HTML 5 spec does not exclude all those characters from attribute names (html.spec.whatwg.org/multipage/syntax.html#attributes-2).

However, at least XML processors will treat the colon as the XML namespace separator.

HTML 5 allows XML only on SVG and MathML elements, and only for pre-defined namespace-prefixes (developer.mozilla.org/en-US/docs/Web/API/Attr/localName#:~:text=That means that the local,different from the qualified name). For other something: prefixes, these will simply be passed on as part of the local attribute name.

According to engine.sygnal.com/research/html5-attribute-names, all current browser implementations handle at least the colon fine, and the aforementioned Vue.js documentation suggests that this is also the case for @.

Note also that Symfony UX only conditionally escapes attribute names, and it has : and @ in its safe list:
https://github.com/symfony/ux/blob/c9a3e66b8ac53e870097e8a828913e57204398e7/src/TwigComponent/src/ComponentAttributes.php#L82

Closes #3614.

@mpdude mpdude force-pushed the escape-attr-relaxed branch from 97bbe49 to e641a34 Compare January 20, 2026 10:30
@mpdude
Copy link
Copy Markdown
Contributor Author

mpdude commented Jan 30, 2026

@fabpot WDYT, would you endorse such an additional strategy?

@mpdude mpdude requested a review from stof February 1, 2026 15:02
if (\in_array($literal, $immune)) {
$this->assertEquals($literal, (new EscaperRuntime())->escape($literal, 'html_attr_relaxed'));
} else {
$this->assertNotEquals(
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be on one line.


public function testHtmlAttributeRelaxedEscapingConvertsSpecialChars()
{
foreach ($this->htmlAttrSpecialChars as $key => $value) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't we test :, @, [, or ] as well here (they are not part of $this->htmlAttrSpecialChars)?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Short: I don't think we need to.

I don't fully understand why the tests are written and organized the way they are. It seems that has been inherited from old Zend Framework Escaper tests.

testHtmlAttributeRelaxedEscapingConvertsSpecialChars and testHtmlAttributeEscapingConvertsSpecialChars use $htmlAttrSpecialChars to test for a few samples that

  • alnums are not escaped
  • a few "immune" chars (common to both html_attr and html_attr_relaxed) are not escaped
  • two examples beyond ASCII 0xFF are escaped
  • other examples like <>&" are being escaped.

In addition to that, we have testHtmlAttributeEscapingEscapesOwaspRecommendedRanges and testHtmlAttributeRelaxedEscapingEscapesOwaspRecommendedRanges. Those tests will fully iterate the ASCII 0x01 to 0xFF range and test every single character from it. The test expects escaping to happen for all cases except the 0-9, a-z, A-Z ranges and explicitly given whitelists of characters.

So, I think we're safe as-is.

@mpdude mpdude force-pushed the escape-attr-relaxed branch from 675cd42 to d06e902 Compare February 8, 2026 15:49
@mpdude mpdude requested a review from fabpot February 8, 2026 16:11
@mpdude
Copy link
Copy Markdown
Contributor Author

mpdude commented Feb 8, 2026

Feedback addressed. Thank you!

When this gets merged, I'd like to use it as the escaping strategy for attribute names in #3930.

@fabpot fabpot force-pushed the escape-attr-relaxed branch from d06e902 to 04aa3df Compare February 8, 2026 17:59
@fabpot
Copy link
Copy Markdown
Contributor

fabpot commented Feb 8, 2026

Thank you @mpdude.

@fabpot fabpot merged commit afe8e19 into twigphp:3.x Feb 8, 2026
8 checks passed
fabpot added a commit that referenced this pull request Mar 17, 2026
…ributes easier (mpdude, polarbirke)

This PR was squashed before being merged into the 3.x branch.

Discussion
----------

Add an `html_attr` function to make outputting HTML attributes easier

**Updated:** This description has been updated to reflect changes from the discussion up to #3930 (comment).

This PR suggests adding an `html_attr` function and two filters `html_attr_merge` and `html_attr_type`. Together, they are intended to make it easier to collect HTML attributes in arrays, in order to pass them in Twig to included templates or macros, and to ultimately print such attribute sets as.

`html_attr_merge` can be used to either merge such arrays over default values, or to override (say, inside a macro) particular values in a given attribute array. As described in #3907, it favors overwriting simple (scalar) attribute values and appending to multi-valued attributes over all the other operations one could conceive (like, for example, replacing a list of two CSS `class` names with two other ones). This is a design decision to keep the API simple and optimized for the primary use case that I see. But, since we're mostly dealing with arrays after all, users are free to do in parallel any other kind of array wrangling they see fit.

So, this PR is _not_ trying to design a full-fledged, object-oriented API with all the necessary methods to add, replace, remove attributes; to add, change or toggle elements in "list" style attributes like `class`; to provide extension points for custom (arbitrary) attributes or to provide a fluent API to do all that from within PHP code. See symfony/ux#3269 for a Symfony UX component RFC that does that.

A little bit of special case handling is present for `aria-*`, `data-*` and inline CSS `style` attributes. But apart from that, there is no special knowledge about the attributes defined in HTML, ARIA or other standards, nor about the structure and sementic of attribute values. The approach works in a generic way, so it should be possible to use it for many custom attributes as well.

In order to support "list" style attributes like `class`, `aria-labelledby` or `srcset` that may come in different flavors, users have to be disciplined and consistently use iterables (arrays) to represent such attribute values, possibly assisted by the `html_attr_type` filter (see below).

When printing attributes, names and values are escaped. For names, the `html_attr_relaxed` strategy (#4743) is used.

#### Motivation and practical examples

I have seen repeating patterns when dealing with HTML attributes in Twig templates and macros. Typical examples can be found in Symfony's form theme, where an `attr` variable is present in various blocks.

https://github.com/symfony/symfony/blob/4a5d8cf03e1e31d1a7591921c6fa1fe7ec1c2015/src/Symfony/Bridge/Twig/Resources/views/Form/form_div_layout.html.twig#L453-L458

```twig
{%- block widget_attributes -%}
    id="{{ id }}" name="{{ full_name }}"
    {%- if disabled %} disabled="disabled"{% endif -%}
    {%- if required %} required="required"{% endif -%}
    {{ block('attributes') }}
{%- endblock widget_attributes -%}
```

Could be along the lines of:

`{{ html_attr(attr, { id, name: full_name, disabled: disabled ? true : false, required : required ? true : false }) }}`.

If `disabled` and `required` were guaranteed to be booleans (I haven't checked), even better:

`{{ html_attr(attr, { id, name: full_name, disabled, required }) }}`

https://github.com/symfony/symfony/blob/4a5d8cf03e1e31d1a7591921c6fa1fe7ec1c2015/src/Symfony/Bridge/Twig/Resources/views/Form/form_div_layout.html.twig#L347-L360

```twig
    {%- set attr = {} -%}
    {%- set aria_describedby = [] -%}
    {%- if help -%}
        {%- set aria_describedby = aria_describedby|merge([id ~ '_help']) -%}
    {%- endif -%}
    {%- if errors|length > 0 -%}
        {%- set aria_describedby = aria_describedby|merge(errors|map((_, index) => id ~ '_error' ~ (index + 1))) -%}
    {%- endif -%}
    {%- if aria_describedby|length > 0 -%}
        {%- set attr = attr|merge({'aria-describedby': aria_describedby|join(' ')}) -%}
    {%- endif -%}
    {%- if errors|length > 0 -%}
        {%- set attr = attr|merge({'aria-invalid': 'true'}) -%}
    {%- endif -%}
```

Could be:

```twig
  {%- set attr = {}|html_attr_merge(
    help ? { 'aria-describedby': [id ~ '_help'] },
    errors|length ? { 'aria-invalid': true, 'aria-describedby': errors|map((_, index) => id ~ '_error' ~ (index + 1)) }
  ) -%}
```

https://github.com/symfony/symfony/blob/4a5d8cf03e1e31d1a7591921c6fa1fe7ec1c2015/src/Symfony/Bridge/Twig/Resources/views/Form/form_div_layout.html.twig#L470-L481

```twig
{% block attributes -%}
    {%- for attrname, attrvalue in attr -%}
        {{- ' ' -}}
        {%- if attrname in ['placeholder', 'title'] -%}
            {{- attrname }}="{{ translation_domain is same as(false) or attrvalue is null ? attrvalue : attrvalue|trans(attr_translation_parameters, translation_domain) }}"
        {%- elseif attrvalue is same as(true) -%}
            {{- attrname }}="{{ attrname }}"
        {%- elseif attrvalue is not same as(false) -%}
            {{- attrname }}="{{ attrvalue }}"
        {%- endif -%}
    {%- endfor -%}
{%- endblock attributes -%}
```

This should basically be the same as `{{ html_attr(attr) }}`, ignoring edge cases for `null` values. Handling of the `translation_domain` might require a preceding `html_attr_merge` step to replace values with translations.

Finally,

```twig
    {% set id = 'id value' %}
    {% set href = 'href value' %}
    {% set disabled = true %}

    <div {{ html_attr(
        { id, href },
        disabled ? { 'aria-disabled': 'true' },
        not disabled ? { 'aria-enabled' : true },
        { class: ['zero', 'first'] },
        { class: ['second'] },
        true ? { class: 'third' },
        { style: { color: 'red' } },
        { style: { 'background-color': 'green' } },
        { style: { color: 'blue' } },
        { 'data-test': 'some value' },
        { 'data-test': 'other value', 'data-bar': 'baz' }},
        { 'dangerous=yes foo' : 'xss' },
        { style: ['text-decoration: underline'] },
    ) }}></div>
```

will generate HTML markup:

```
<div id="id value" href="href value" aria-disabled="true" class="zero first second third" style="color: red; background-color: green; color: blue; text-decoration: underline;" data-test="other value" data-bar="baz" dangerous&#x3D;yes&#x20;foo="xss"></div>
```

#### Details on `html_attr_merge`

This filter merges an `attr` style array with one or several other arrays given as arguments. All of those arrays should reasonably be mappings, i. e. use keys that denote attribute names, and not sequences or lists with numeric keys.

Empty arrays, empty strings or false values in the argument list will be ignored, which can be used to conditionally include values in the merge list like so:

```twig
{% set attr = attr|html_attr_merge(
  condition ? { attrname: "attrvalue", other: "value" }
) %}
```

The merging of attribute values is similar to PHP's `array_merge` function. Latter (right) values generally override former (left) values, as follows:

When two values are to be merged and both are either scalars or objects, the latter (right) value overrides the previous (left) one.

When both values are iterables, `array_merge`/spread operator behavior is used: Numeric indices will be appended, whereas non-numeric ones will be replaced. This can be used to override designated elements in sets like CSS classes:

```twig
{% set attr = { class: ['foo', 'bar'] }|html_attr_merge(
  { class: { importance: 'normal' } },
  critical ? { class: { importance: 'high' } }
) %}
```

To provide more flexibility with regards to different merging strategies, the `MergeInterface` is provided as a flex point for advanced use cases of power users.

* When an attribute value that represents a "right hand side" value has to be merged and implements `MergeInterface`, its `mergeInto()` method will be passed the previous (left hand side) value. That method will return the merge result.

* Otherwise, when the "left hand side" value implements `MergeInterface`, `appendFrom()` will be passed the new (right hand side) value. Again, that method returns the merge result.

Other combinations of values are rejected, an exception is thrown. This is a design decision to clearly and early notify users of combinations that might have unclear or unpredictable results, like merging a string like `'foo'` with an array like `['bar', 'baz']` for a `class` attribute – should this override, since one value is a scalar, or append, since the other one is an array?

#### Details on `html_attr`

The `html_attr()` function prints attribute-arrays as HTML. It will perform appropriate escaping of attribute names and values.

One or several attribute arrays can be passed to `html_attr`, and `html_attr_merge` will be used first to merge those.

In general, scalar attribute values (including the empty string `''`) will be printed as-is. For booleans and `null` values, the following extra rules apply:

* For `aria-*`, boolean `true` and `false` will be coalesced to `"true"` and `"false"`, respectively.
* For `data-*`, boolean `true` will be coalesced to `"true"`, and non-scalar values will be JSON-encoded
* Otherwise, `false` and `null` attribute values will always omit printing of the attribute.
* `true` values will print the attribute as `attributeName=""`. This is equivalent to printing `<... attributeName>`, but is X(HT)ML compliant. The user-agent will fill in the attribute's _empty default value_.

These rules derive from the comparison provided at symfony/ux#3269 (comment) that shows how React and Vue as front-end frameworks behave in the same situation.

For values that implement `AttributeValueInterface`, its `getValue(): ?string` method will be called first. The attribute will be omitted for a `null` return value, otherwise printed with the returned string.

This interface could be used to provide (outside the scope of this PR or even outside Twig itself) classes that could e. g. help building more complex attribute values, like for image `srcset`. But the primary reason for adding it was to be able to deal with attributes values that are lists or hashes and need to be printed in different ways:

* attributes like `class` or `aria-labelledby` use [space-separated tokens](https://html.spec.whatwg.org/multipage/common-microsyntaxes.html#space-separated-tokens) as their value
* `srcset` or `sizes` for `<img>` use [comma-separated tokens](https://html.spec.whatwg.org/multipage/common-microsyntaxes.html#comma-separated-tokens)
* other standards/extensions may have even other concepts

`html_attr` will generally print array values as a space-separated list of values.

A special case is the attribute name `style` if its value is an array. It will be converted to inline CSS. Values with numeric keys will be printed followed by a `;`. Non-numeric keys will print a pattern of `key: value;`.

#### Details on `html_attr_type`

The `html_attr_type` filter can be used to convert an array passed into it into implementations of the mentioned interfaces in a few predefined ways. It takes a single argument indicating the type to use – similar to the `escape` filter in Twig that knows about `html`, `js`, `css` and a few more.

* `sst` means `space separated tokens`
* `cst` means `comma separated tokens`
* `style` means "inline CSS", for completeness

So, the following will construct an `attr` array for an `img` tag, where `sizes` needs to be printed separated by commas.

```twig
{% set attr = {
  srcset: ['small.jpg 480w']|html_attr_type('cst'),
  alt: 'A cute kitten'
} %}
{# amend the srcset #}
{% set attr = attr|html_attr_merge({ srcset: ['medium.jpg 800w', 'large.jpg 1200w'] }) %}
<img {{ html_attr(attr) />
```

This works because the `SeparatedTokenList` that is used for `sst` and `cst` implements merge behavior where a given value can be extended by merging arrays.

#### Design considerations

Exposing behavior for different attribute types through these interfaces may not be the 100% perfect, nice, automagic solution. _But_ it has the big advantage that we are not committing ourselves to a particular list of attributes for which standards-specific knowledge would have to be put in the code. I would consider that a maintenance nightmare, since it would require us to decide which attribute/special case to support and which not. Every single change in that list would be BC-breaking.

The two interfaces put control over that in the hands of power users or extension authors. Arbitrary ways could be conceived to create instances of these interfaces.

The `html_attr_type` filter provides built-in access to the two types I see relevant in the HTML 5 standard, the space and comma separated token lists.

The built-in default conversion of arrays to space-separated token lists should further reduce visibility of that problem for average template authors, who can hopefully ignore the problem most of the time.

Closes #3907, which outlined the initial idea.

#### TODO:

- [x] Get initial feedback
- [x] Add tests
- [x] Concept for attributes that employ [comma-separated tokens](https://html.spec.whatwg.org/multipage/common-microsyntaxes.html#comma-separated-tokens)
- [x] Add documentation
- [x] Add docblocks and type hints
- [x] Make a decision: [Escape attribute names or not?](#3930 (comment))

Co-authored-by: `@polarbirke`

Commits
-------

42c12fa Add an `html_attr` function to make outputting HTML attributes easier
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Development

Successfully merging this pull request may close these issues.

3 participants