Skip to content

Fix block scalar leading/trailing newlines#278

Merged
puzrin merged 3 commits intonodeca:masterfrom
aepsilon:fix-block-newlines
Apr 15, 2016
Merged

Fix block scalar leading/trailing newlines#278
puzrin merged 3 commits intonodeca:masterfrom
aepsilon:fix-block-newlines

Conversation

@aepsilon
Copy link
Copy Markdown
Contributor

@aepsilon aepsilon commented Apr 5, 2016

Fixes #253: loader should not add an extra newline at the beginning of block scalars that have an explicit indentation indicator.
Cause: "Detected indent" was assumed logically equivalent to "did read non-empty line".
This no longer holds with an indentation indicator: indent is detected before any content is read.

Fixes #276: loader should not remove a trailing newline for block scalars with "+" chomping.
Cause: "+" chomping didn't account for the last content newline before the trailing empty lines.


I'm not 100% familiar with the loader so there may be some missed edge cases.
So far it's passed every test I've tried.
@dervus and @puzrin, I hope this helps. Let me know if you find something.

@aepsilon
Copy link
Copy Markdown
Contributor Author

aepsilon commented Apr 6, 2016

Added a hotfix for double-quoted.
I was too focused on the other styles to notice, sorry. There's now a test case to cover it.

@dervus
Copy link
Copy Markdown
Collaborator

dervus commented Apr 12, 2016

Sorry for so delayed reply!

Could you make didReadContent a Boolean variable? You use it for +1 operation anyway, and using integers in Boolean expressions is against our coding style.

Also, this fixes only first part of #253, am I right?

@puzrin
Copy link
Copy Markdown
Member

puzrin commented Apr 12, 2016

if didReadContent become boolean, equation will be common.repeat('\n', didReadContent ? emptyLines + 1 : emptyLines)

@aepsilon
Copy link
Copy Markdown
Contributor Author

I prefer boolean too for style. Originally I wrote it like you said.
But it turns out auto-converting boolean to number is slow in V8, so I changed it for performance.
The added benchmarks bear this out.

There's another neat solution though: represent '\n' directly rather than converting it from 1 or true each time. The new commit uses this method to avoid almost all conversions and branching. It has the best performance and, I believe, good readability. What do you think?

Benchmarks

1/0 as boolean and number (old commit)
Sample: scalar_block_folded.yaml (2785 characters)
 > current x 39,255 ops/sec ±1.54% (76 runs sampled)
Sample: scalar_block_literal.yaml (2785 characters)
 > current x 37,874 ops/sec ±0.78% (86 runs sampled)

true/false as boolean and number
Sample: scalar_block_folded.yaml (2785 characters)
 > current x 38,813 ops/sec ±1.23% (79 runs sampled)
Sample: scalar_block_literal.yaml (2785 characters)
 > current x 33,871 ops/sec ±1.00% (85 runs sampled)

ternary (didReadContent ? emptyLines + 1 : emptyLines))
Sample: scalar_block_folded.yaml (2785 characters)
 > current x 39,068 ops/sec ±0.93% (78 runs sampled)
Sample: scalar_block_literal.yaml (2785 characters)
 > current x 35,470 ops/sec ±0.89% (87 runs sampled)

'\n'/'' (new commit)
Sample: scalar_block_folded.yaml (2785 characters)
 > current x 38,504 ops/sec ±0.99% (75 runs sampled)
Sample: scalar_block_literal.yaml (2785 characters)
 > current x 38,513 ops/sec ±0.61% (86 runs sampled)

For part 2 of #253, it appears the spec has an error.

' \t' is a more-indented line according to the grammar, since \t is s-white.

[177] s-nb-spaced-text(n) ::= s-indent(n) s-white nb-char*

(Rule [175] for folded lines doesn't match)
Therefore join with a newline.

LibYAML 0.1.5 agrees.
It throws an error on the example, but if you specify indent with>1, it also produces '\t\ndetected\n'.

Comment thread lib/js-yaml/loader.js Outdated
// Just one line break - perceive as the same line.
} else if (emptyLines === 0) {
if (didReadContent) { // i.e. only if we have already read some scalar content.
if (contentNewline) { // i.e. only if we have already read some scalar content.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if (contentNewline.length) ?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or !== ''. I'd expect all of them to have about the same performance, since the type of contentNewline is fully determined as non-null string at compile time.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean, we avoid indirect type casts and string operations when possible.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure

@aepsilon aepsilon force-pushed the fix-block-newlines branch from a7fce25 to 0d7af25 Compare April 14, 2016 14:55
@dervus
Copy link
Copy Markdown
Collaborator

dervus commented Apr 14, 2016

It doesn't seem to really affect the benchmarks in my opinion. The difference seem to be in the margin of error. I'd propose to stick to Boolean solution @puzrin proposed.

@puzrin
Copy link
Copy Markdown
Member

puzrin commented Apr 15, 2016

@dervus any more notes? Ok to squash and merge?

@dervus
Copy link
Copy Markdown
Collaborator

dervus commented Apr 15, 2016

Yes, looks good to me. Thanks, @aepsilon.

@puzrin
Copy link
Copy Markdown
Member

puzrin commented Apr 15, 2016

@aepsilon could you squash 2 last commits into the first one?

Fixes nodeca#253: loader should not add an extra newline at the beginning
of block scalars that have an explicit indentation indicator.

Cause: "Detected indent" was assumed logically equivalent to
"did read non-empty line". This no longer holds with an
indentation indicator: indent is detected before any content is read.

Fixes nodeca#276: loader should not remove a trailing newline
for block scalars with "+" chomping.

Cause: "+" chomping didn't account for the last content newline
before the trailing empty lines.
Taken from scalar_double_quoted_simple_multiline.yaml
@aepsilon aepsilon force-pushed the fix-block-newlines branch from 8807e73 to 10d82a2 Compare April 15, 2016 14:04
@aepsilon
Copy link
Copy Markdown
Contributor Author

Done. 👍 Thanks for checking it over.

@aepsilon aepsilon changed the title Fix block style extra/missing newlines Fix block scalar leading/trailing newlines Apr 15, 2016
@puzrin puzrin merged commit fa11316 into nodeca:master Apr 15, 2016
@puzrin
Copy link
Copy Markdown
Member

puzrin commented Apr 15, 2016

Seems ok to release.

3.6.0 or 4.0.0? Can it break something?

@dervus
Copy link
Copy Markdown
Collaborator

dervus commented Apr 15, 2016

Well, it's not an API change, though it changes dumper output. However I saw some NodeJS folks talking about changes of dump output. It was somewhere at PRs/issues external references.

I'd bet for minor version change.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants