Add html_attr_relaxed escaping strategy#4743
Conversation
97bbe49 to
e641a34
Compare
|
@fabpot WDYT, would you endorse such an additional strategy? |
tests/Runtime/EscaperRuntimeTest.php
Outdated
| if (\in_array($literal, $immune)) { | ||
| $this->assertEquals($literal, (new EscaperRuntime())->escape($literal, 'html_attr_relaxed')); | ||
| } else { | ||
| $this->assertNotEquals( |
|
|
||
| public function testHtmlAttributeRelaxedEscapingConvertsSpecialChars() | ||
| { | ||
| foreach ($this->htmlAttrSpecialChars as $key => $value) { |
There was a problem hiding this comment.
Shouldn't we test :, @, [, or ] as well here (they are not part of $this->htmlAttrSpecialChars)?
There was a problem hiding this comment.
Short: I don't think we need to.
I don't fully understand why the tests are written and organized the way they are. It seems that has been inherited from old Zend Framework Escaper tests.
testHtmlAttributeRelaxedEscapingConvertsSpecialChars and testHtmlAttributeEscapingConvertsSpecialChars use $htmlAttrSpecialChars to test for a few samples that
- alnums are not escaped
- a few "immune" chars (common to both
html_attrandhtml_attr_relaxed) are not escaped - two examples beyond ASCII 0xFF are escaped
- other examples like
<>&"are being escaped.
In addition to that, we have testHtmlAttributeEscapingEscapesOwaspRecommendedRanges and testHtmlAttributeRelaxedEscapingEscapesOwaspRecommendedRanges. Those tests will fully iterate the ASCII 0x01 to 0xFF range and test every single character from it. The test expects escaping to happen for all cases except the 0-9, a-z, A-Z ranges and explicitly given whitelists of characters.
So, I think we're safe as-is.
675cd42 to
d06e902
Compare
|
Feedback addressed. Thank you! When this gets merged, I'd like to use it as the escaping strategy for attribute names in #3930. |
d06e902 to
04aa3df
Compare
|
Thank you @mpdude. |
…ributes easier (mpdude, polarbirke) This PR was squashed before being merged into the 3.x branch. Discussion ---------- Add an `html_attr` function to make outputting HTML attributes easier **Updated:** This description has been updated to reflect changes from the discussion up to #3930 (comment). This PR suggests adding an `html_attr` function and two filters `html_attr_merge` and `html_attr_type`. Together, they are intended to make it easier to collect HTML attributes in arrays, in order to pass them in Twig to included templates or macros, and to ultimately print such attribute sets as. `html_attr_merge` can be used to either merge such arrays over default values, or to override (say, inside a macro) particular values in a given attribute array. As described in #3907, it favors overwriting simple (scalar) attribute values and appending to multi-valued attributes over all the other operations one could conceive (like, for example, replacing a list of two CSS `class` names with two other ones). This is a design decision to keep the API simple and optimized for the primary use case that I see. But, since we're mostly dealing with arrays after all, users are free to do in parallel any other kind of array wrangling they see fit. So, this PR is _not_ trying to design a full-fledged, object-oriented API with all the necessary methods to add, replace, remove attributes; to add, change or toggle elements in "list" style attributes like `class`; to provide extension points for custom (arbitrary) attributes or to provide a fluent API to do all that from within PHP code. See symfony/ux#3269 for a Symfony UX component RFC that does that. A little bit of special case handling is present for `aria-*`, `data-*` and inline CSS `style` attributes. But apart from that, there is no special knowledge about the attributes defined in HTML, ARIA or other standards, nor about the structure and sementic of attribute values. The approach works in a generic way, so it should be possible to use it for many custom attributes as well. In order to support "list" style attributes like `class`, `aria-labelledby` or `srcset` that may come in different flavors, users have to be disciplined and consistently use iterables (arrays) to represent such attribute values, possibly assisted by the `html_attr_type` filter (see below). When printing attributes, names and values are escaped. For names, the `html_attr_relaxed` strategy (#4743) is used. #### Motivation and practical examples I have seen repeating patterns when dealing with HTML attributes in Twig templates and macros. Typical examples can be found in Symfony's form theme, where an `attr` variable is present in various blocks. https://github.com/symfony/symfony/blob/4a5d8cf03e1e31d1a7591921c6fa1fe7ec1c2015/src/Symfony/Bridge/Twig/Resources/views/Form/form_div_layout.html.twig#L453-L458 ```twig {%- block widget_attributes -%} id="{{ id }}" name="{{ full_name }}" {%- if disabled %} disabled="disabled"{% endif -%} {%- if required %} required="required"{% endif -%} {{ block('attributes') }} {%- endblock widget_attributes -%} ``` Could be along the lines of: `{{ html_attr(attr, { id, name: full_name, disabled: disabled ? true : false, required : required ? true : false }) }}`. If `disabled` and `required` were guaranteed to be booleans (I haven't checked), even better: `{{ html_attr(attr, { id, name: full_name, disabled, required }) }}` https://github.com/symfony/symfony/blob/4a5d8cf03e1e31d1a7591921c6fa1fe7ec1c2015/src/Symfony/Bridge/Twig/Resources/views/Form/form_div_layout.html.twig#L347-L360 ```twig {%- set attr = {} -%} {%- set aria_describedby = [] -%} {%- if help -%} {%- set aria_describedby = aria_describedby|merge([id ~ '_help']) -%} {%- endif -%} {%- if errors|length > 0 -%} {%- set aria_describedby = aria_describedby|merge(errors|map((_, index) => id ~ '_error' ~ (index + 1))) -%} {%- endif -%} {%- if aria_describedby|length > 0 -%} {%- set attr = attr|merge({'aria-describedby': aria_describedby|join(' ')}) -%} {%- endif -%} {%- if errors|length > 0 -%} {%- set attr = attr|merge({'aria-invalid': 'true'}) -%} {%- endif -%} ``` Could be: ```twig {%- set attr = {}|html_attr_merge( help ? { 'aria-describedby': [id ~ '_help'] }, errors|length ? { 'aria-invalid': true, 'aria-describedby': errors|map((_, index) => id ~ '_error' ~ (index + 1)) } ) -%} ``` https://github.com/symfony/symfony/blob/4a5d8cf03e1e31d1a7591921c6fa1fe7ec1c2015/src/Symfony/Bridge/Twig/Resources/views/Form/form_div_layout.html.twig#L470-L481 ```twig {% block attributes -%} {%- for attrname, attrvalue in attr -%} {{- ' ' -}} {%- if attrname in ['placeholder', 'title'] -%} {{- attrname }}="{{ translation_domain is same as(false) or attrvalue is null ? attrvalue : attrvalue|trans(attr_translation_parameters, translation_domain) }}" {%- elseif attrvalue is same as(true) -%} {{- attrname }}="{{ attrname }}" {%- elseif attrvalue is not same as(false) -%} {{- attrname }}="{{ attrvalue }}" {%- endif -%} {%- endfor -%} {%- endblock attributes -%} ``` This should basically be the same as `{{ html_attr(attr) }}`, ignoring edge cases for `null` values. Handling of the `translation_domain` might require a preceding `html_attr_merge` step to replace values with translations. Finally, ```twig {% set id = 'id value' %} {% set href = 'href value' %} {% set disabled = true %} <div {{ html_attr( { id, href }, disabled ? { 'aria-disabled': 'true' }, not disabled ? { 'aria-enabled' : true }, { class: ['zero', 'first'] }, { class: ['second'] }, true ? { class: 'third' }, { style: { color: 'red' } }, { style: { 'background-color': 'green' } }, { style: { color: 'blue' } }, { 'data-test': 'some value' }, { 'data-test': 'other value', 'data-bar': 'baz' }}, { 'dangerous=yes foo' : 'xss' }, { style: ['text-decoration: underline'] }, ) }}></div> ``` will generate HTML markup: ``` <div id="id value" href="href value" aria-disabled="true" class="zero first second third" style="color: red; background-color: green; color: blue; text-decoration: underline;" data-test="other value" data-bar="baz" dangerous=yes foo="xss"></div> ``` #### Details on `html_attr_merge` This filter merges an `attr` style array with one or several other arrays given as arguments. All of those arrays should reasonably be mappings, i. e. use keys that denote attribute names, and not sequences or lists with numeric keys. Empty arrays, empty strings or false values in the argument list will be ignored, which can be used to conditionally include values in the merge list like so: ```twig {% set attr = attr|html_attr_merge( condition ? { attrname: "attrvalue", other: "value" } ) %} ``` The merging of attribute values is similar to PHP's `array_merge` function. Latter (right) values generally override former (left) values, as follows: When two values are to be merged and both are either scalars or objects, the latter (right) value overrides the previous (left) one. When both values are iterables, `array_merge`/spread operator behavior is used: Numeric indices will be appended, whereas non-numeric ones will be replaced. This can be used to override designated elements in sets like CSS classes: ```twig {% set attr = { class: ['foo', 'bar'] }|html_attr_merge( { class: { importance: 'normal' } }, critical ? { class: { importance: 'high' } } ) %} ``` To provide more flexibility with regards to different merging strategies, the `MergeInterface` is provided as a flex point for advanced use cases of power users. * When an attribute value that represents a "right hand side" value has to be merged and implements `MergeInterface`, its `mergeInto()` method will be passed the previous (left hand side) value. That method will return the merge result. * Otherwise, when the "left hand side" value implements `MergeInterface`, `appendFrom()` will be passed the new (right hand side) value. Again, that method returns the merge result. Other combinations of values are rejected, an exception is thrown. This is a design decision to clearly and early notify users of combinations that might have unclear or unpredictable results, like merging a string like `'foo'` with an array like `['bar', 'baz']` for a `class` attribute – should this override, since one value is a scalar, or append, since the other one is an array? #### Details on `html_attr` The `html_attr()` function prints attribute-arrays as HTML. It will perform appropriate escaping of attribute names and values. One or several attribute arrays can be passed to `html_attr`, and `html_attr_merge` will be used first to merge those. In general, scalar attribute values (including the empty string `''`) will be printed as-is. For booleans and `null` values, the following extra rules apply: * For `aria-*`, boolean `true` and `false` will be coalesced to `"true"` and `"false"`, respectively. * For `data-*`, boolean `true` will be coalesced to `"true"`, and non-scalar values will be JSON-encoded * Otherwise, `false` and `null` attribute values will always omit printing of the attribute. * `true` values will print the attribute as `attributeName=""`. This is equivalent to printing `<... attributeName>`, but is X(HT)ML compliant. The user-agent will fill in the attribute's _empty default value_. These rules derive from the comparison provided at symfony/ux#3269 (comment) that shows how React and Vue as front-end frameworks behave in the same situation. For values that implement `AttributeValueInterface`, its `getValue(): ?string` method will be called first. The attribute will be omitted for a `null` return value, otherwise printed with the returned string. This interface could be used to provide (outside the scope of this PR or even outside Twig itself) classes that could e. g. help building more complex attribute values, like for image `srcset`. But the primary reason for adding it was to be able to deal with attributes values that are lists or hashes and need to be printed in different ways: * attributes like `class` or `aria-labelledby` use [space-separated tokens](https://html.spec.whatwg.org/multipage/common-microsyntaxes.html#space-separated-tokens) as their value * `srcset` or `sizes` for `<img>` use [comma-separated tokens](https://html.spec.whatwg.org/multipage/common-microsyntaxes.html#comma-separated-tokens) * other standards/extensions may have even other concepts `html_attr` will generally print array values as a space-separated list of values. A special case is the attribute name `style` if its value is an array. It will be converted to inline CSS. Values with numeric keys will be printed followed by a `;`. Non-numeric keys will print a pattern of `key: value;`. #### Details on `html_attr_type` The `html_attr_type` filter can be used to convert an array passed into it into implementations of the mentioned interfaces in a few predefined ways. It takes a single argument indicating the type to use – similar to the `escape` filter in Twig that knows about `html`, `js`, `css` and a few more. * `sst` means `space separated tokens` * `cst` means `comma separated tokens` * `style` means "inline CSS", for completeness So, the following will construct an `attr` array for an `img` tag, where `sizes` needs to be printed separated by commas. ```twig {% set attr = { srcset: ['small.jpg 480w']|html_attr_type('cst'), alt: 'A cute kitten' } %} {# amend the srcset #} {% set attr = attr|html_attr_merge({ srcset: ['medium.jpg 800w', 'large.jpg 1200w'] }) %} <img {{ html_attr(attr) /> ``` This works because the `SeparatedTokenList` that is used for `sst` and `cst` implements merge behavior where a given value can be extended by merging arrays. #### Design considerations Exposing behavior for different attribute types through these interfaces may not be the 100% perfect, nice, automagic solution. _But_ it has the big advantage that we are not committing ourselves to a particular list of attributes for which standards-specific knowledge would have to be put in the code. I would consider that a maintenance nightmare, since it would require us to decide which attribute/special case to support and which not. Every single change in that list would be BC-breaking. The two interfaces put control over that in the hands of power users or extension authors. Arbitrary ways could be conceived to create instances of these interfaces. The `html_attr_type` filter provides built-in access to the two types I see relevant in the HTML 5 standard, the space and comma separated token lists. The built-in default conversion of arrays to space-separated token lists should further reduce visibility of that problem for average template authors, who can hopefully ignore the problem most of the time. Closes #3907, which outlined the initial idea. #### TODO: - [x] Get initial feedback - [x] Add tests - [x] Concept for attributes that employ [comma-separated tokens](https://html.spec.whatwg.org/multipage/common-microsyntaxes.html#comma-separated-tokens) - [x] Add documentation - [x] Add docblocks and type hints - [x] Make a decision: [Escape attribute names or not?](#3930 (comment)) Co-authored-by: `@polarbirke` Commits ------- 42c12fa Add an `html_attr` function to make outputting HTML attributes easier
This adds
html_attr_relaxed, a relaxed variant of thehtml_attrescaping strategy. The difference is thathtml_attr_relaxeddoes not escape the:,@,[and]characters. These are used by some front-end frameworks in attribute names to wire special handling/value binding. See https://v2.vuejs.org/v2/guide/syntax.html#v-bind-Shorthand for an example.The HTML 5 spec does not exclude all those characters from attribute names (html.spec.whatwg.org/multipage/syntax.html#attributes-2).
However, at least XML processors will treat the colon as the XML namespace separator.
HTML 5 allows XML only on SVG and MathML elements, and only for pre-defined namespace-prefixes (developer.mozilla.org/en-US/docs/Web/API/Attr/localName#:~:text=That means that the local,different from the qualified name). For other something: prefixes, these will simply be passed on as part of the local attribute name.
According to engine.sygnal.com/research/html5-attribute-names, all current browser implementations handle at least the colon fine, and the aforementioned Vue.js documentation suggests that this is also the case for @.
Note also that Symfony UX only conditionally escapes attribute names, and it has
:and@in its safe list:https://github.com/symfony/ux/blob/c9a3e66b8ac53e870097e8a828913e57204398e7/src/TwigComponent/src/ComponentAttributes.php#L82
Closes #3614.