Skip to content

Allow non-printable characters inside quoted string literals.#219

Merged
puzrin merged 1 commit intomasterfrom
unknown repository
Nov 21, 2015
Merged

Allow non-printable characters inside quoted string literals.#219
puzrin merged 1 commit intomasterfrom
unknown repository

Conversation

@ghost
Copy link
Copy Markdown

@ghost ghost commented Nov 21, 2015

Fixed issue #192

@puzrin
Copy link
Copy Markdown
Member

puzrin commented Nov 21, 2015

NB. Speed decrease is ~10% (25% on edge cases). Looks acceptable.

@dervus
Copy link
Copy Markdown
Collaborator

dervus commented Nov 21, 2015

Looks good, but

  1. Why "JSON-style" input checking is disabled for single quoted scalars? I thought that according to the spec all quoted scalars should follow this exception.
  2. Performance probably could be improved by rewriting this regexp to something similar to JSON char check in the same function:
    if (checkJson) {
    for (_position = 0, _length = _result.length;
    _position < _length;
    _position += 1) {
    _character = _result.charCodeAt(_position);
    if (!(0x09 === _character ||
    0x20 <= _character && _character <= 0x10FFFF)) {
    throwError(state, 'expected valid JSON character');
    }
    }
    }

    It was written this way exactly becasue of performance. Manual codepoint-by-codepoint iteration is faster on small inputs than regexp test.

@ghost
Copy link
Copy Markdown
Author

ghost commented Nov 21, 2015

Disabling of the JSON-style checking for single quoted scalars is a consequence of the following two statements from YAML 1.2 specification:
a) 5.7. Note that escape sequences are only interpreted in double-quoted scalars.
b) 5.1. To ensure readability, non-printable characters should be escaped on output, even inside such scalars.
If there is an non-printable character inside a single-quoted scalar, then after parsing and exporting it should be replaced by escape sequence because of b). But it is impossible because of a). So the only way to resolve this problem is to assume that non-printable characters are allowed only inside double-quoted scalars, but not single-quoted ones.

@dervus
Copy link
Copy Markdown
Collaborator

dervus commented Nov 21, 2015

You're mixing two independent areas. 5.1 is dumper's concern. Dumper takes JS object and then decides by itself which scalar style to use. 5.7 is out of place at all. It's about \ escape sequences, no?

I'm just looking for usage of nb-json rule, and see that it's base for both nb-double-char and nb-single-char. See grammer rules in "7.3.1. Double-Quoted Style" and "7.3.2. Single-Quoted Style".

@ghost
Copy link
Copy Markdown
Author

ghost commented Nov 21, 2015

OK, I've missed the section 7.3.2, so I'll revert the check for single-quoted scalars back.

@dervus
Copy link
Copy Markdown
Collaborator

dervus commented Nov 21, 2015

@puzrin, if you think we don't need to try improve the performance, then It's good for merge now.

puzrin pushed a commit that referenced this pull request Nov 21, 2015
Allow non-printable characters inside quoted string literals.
@puzrin puzrin merged commit ea9e65a into nodeca:master Nov 21, 2015
@puzrin
Copy link
Copy Markdown
Member

puzrin commented Nov 21, 2015

I doubt that regexp unroll for this case will be faster - too many conditions. Let's leave it for someone else, who has time for experiments.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants