Skip to content

Adds literal, folded, and single-quote scalar dump#181

Merged
dervus merged 1 commit intonodeca:masterfrom
isaacs:scalardump
Apr 22, 2015
Merged

Adds literal, folded, and single-quote scalar dump#181
dervus merged 1 commit intonodeca:masterfrom
isaacs:scalardump

Conversation

@isaacs
Copy link
Copy Markdown
Contributor

@isaacs isaacs commented Apr 20, 2015

Expands the ways that string values can be dumped to yaml.

Use simple scalar when possible, as before.

If simple is not possible, but single-quoted is allowed (ie, no
unprintables or escaped characters), then use that.

If there are linebreaks, and no lines have trailing or leading spaces,
then use folded or literal. Prefer literal if all lines are below a
reasonable limit.

If neither literal nor folded are possible, then fall back to
double-quoted.

Fixes #66

@isaacs
Copy link
Copy Markdown
Contributor Author

isaacs commented Apr 20, 2015

review, please: @nodeca

The StringBuilder class is probably unnecessary, since it turned out that it was simpler to just use booleans for the other dump types. But I left it in since it's a bit nicer way to organize that logic. I'm happy to refactor it out if you feel it's not worth it.

@puzrin
Copy link
Copy Markdown
Member

puzrin commented Apr 20, 2015

How about speed change of new dumper? I don't ask about every case, only about situation in general.

/cc @dervus please take a look

@isaacs
Copy link
Copy Markdown
Contributor Author

isaacs commented Apr 20, 2015

@puzrin Do you have a benchmark script that it could be tested against?

I try to avoid making optimizations without having some way of measuring specific impact. It's your project, so I'm happy to polish it as much as you'd like. But most of the "slow path" things will only start to be slow once you have a very long multi-line string, which we already have buffered in memory, and the use of slice and string appending takes advantage of the cons string capabilities in V8. Comparing characters vs charcodes is not really a significant difference; it's only 0.39% faster in my micro-benchmarks, on an operation that can be performed at about 10,000 ops/ms on my machine. A slightly greater gain could be done by simply moving the character code vars inline in the functions that use them, rather than defining them at the module-level. (Also, those should be consts, which will get us some more speed improvements, at the cost of losing older runtimes. Maybe a TODO for later?)

Much greater gains could be made by tracking down deopts and inline opportunities.

@isaacs
Copy link
Copy Markdown
Contributor Author

isaacs commented Apr 20, 2015

Sorry, I was measuring wrong. Not 10,000 ops/ms. 200,000 ops/ms.

Character and integer comparisons are crazy fast in v8!

@puzrin
Copy link
Copy Markdown
Member

puzrin commented Apr 20, 2015

@isaacs there are no separate dumper benchmark (nobody cared much). We have only https://github.com/nodeca/js-yaml/tree/master/benchmark and extend it when needed.

@puzrin
Copy link
Copy Markdown
Member

puzrin commented Apr 20, 2015

@isaacs i've finished my review pass (generic design/logic). LGTM. It worth to wait @dervus, because he is more experienced in yaml specs. He told me he will look carefully tommorow.

Please, squash into single commit.

@dervus
Copy link
Copy Markdown
Collaborator

dervus commented Apr 20, 2015

The first thing I noticed is the test. Most likely you want to create individual test for that, since test/samples-common do not check formatting; just values of load and dump-load. Will take a look at the rest tomorrow. Thanks for such massive PR anyway!

@dervus
Copy link
Copy Markdown
Collaborator

dervus commented Apr 20, 2015

IMO it should be test/units/dump-scalar-styles.js

@isaacs
Copy link
Copy Markdown
Contributor Author

isaacs commented Apr 21, 2015

@dervus @puzrin Thank you both for the review. I'm looking forward to any new stuff you find :)

I've updated a few more things, and added a unit test.

Comment thread lib/js-yaml/dumper.js Outdated
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please use single var statement for the whole block of assignments. See any other place in our code for a reference.

@isaacs
Copy link
Copy Markdown
Contributor Author

isaacs commented Apr 21, 2015

@dervus Thanks, I'll update those style points soon.

Aside on that, would it be worthwhile to use a linter that catches these kinds of things as well? I think jshint and eslint are both pretty popular, and have more options. Jslint is a bit behind the times feature-wise, I think.

@puzrin
Copy link
Copy Markdown
Member

puzrin commented Apr 21, 2015

@isaacs i've already migrated most of projects to eslint. This happens when repo activity rises. Will update js-yaml too after your PR megre.

Comment thread lib/js-yaml/dumper.js Outdated
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is called "plain" in YAML, but if you think "simple" is better, I do not insist.

@dervus
Copy link
Copy Markdown
Collaborator

dervus commented Apr 21, 2015

Other than style issues it seems okay for me.

@isaacs isaacs force-pushed the scalardump branch 2 times, most recently from 448c5bd to 2e95e1f Compare April 21, 2015 22:27
@isaacs
Copy link
Copy Markdown
Contributor Author

isaacs commented Apr 21, 2015

@dervus That CHAR_SPACE || CHAR_QUESTION thing was added in b946fcd by @puzrin to close #104. I'm not sure what cases it was intended to retain quoting for, but I didn't want to change behavior for that situation.

Here's some edge cases that might be relevant:

> require('./').safeLoad('---asdf')
'asdf'
> require('./').safeLoad('--asdf')
'--asdf'
> require('./').safeLoad('--- asdf')
'asdf'
> require('./').safeLoad('??? asdf')
'??? asdf'
> require('./').safeLoad('??? --- asdf')
'??? --- asdf'
> require('./').safeLoad('? --- asdf')
{ '--- asdf': null }

It seems like quoting things that start with ? or - is prudent, since it might be some magic yaml stuff. Perhaps I was misunderstanding the intent, though, and the name "maybeNum" is actually the only problem. I'll rename it to maybeYamlMagic or something :)

Expands the ways that string values can be dumped to yaml.

Use simple scalar when possible, as before.

If simple is not possible, but single-quoted is allowed (ie, no
unprintables or escaped characters), then use that.

If there are linebreaks, and no lines have trailing or leading spaces,
then use folded or literal.  Prefer literal if all lines are below a
reasonable limit.

If neither literal nor folded are possible, then fall back to
double-quoted.

Fixes nodeca#66
@isaacs
Copy link
Copy Markdown
Contributor Author

isaacs commented Apr 21, 2015

@dervus Simplified the confusing var name, and fixed the style issues you pointed out. Should be good to go.

@dervus
Copy link
Copy Markdown
Collaborator

dervus commented Apr 21, 2015

The only magic here is the fact that we just used the simplest and dumbiest solution for quickly solve the problem and go on. :D Here I'll duplicate the link there you can see how it should be handled: http://www.yaml.org/spec/1.2/spec.html#ns-plain-first%28c%29

@isaacs
Copy link
Copy Markdown
Contributor Author

isaacs commented Apr 21, 2015

@dervus Ah, ok, that makes a lot of sense.

In that case, yes, there is still an opportunity to do more careful string examination here, if there is a goal to avoid quoting in more cases that could potentially be dumped as plain scalars. For example, "a!b" should be a valid scalar, but as of this pull req will be quoted.

I think we can leave that for the next adventure. The behavior here is not incorrect, and an overall improvement from what was there before.

@puzrin
Copy link
Copy Markdown
Member

puzrin commented Apr 21, 2015

@dervus , merge?

dervus added a commit that referenced this pull request Apr 22, 2015
Adds literal, folded, and single-quote scalar dump
@dervus dervus merged commit e956f91 into nodeca:master Apr 22, 2015
@dervus
Copy link
Copy Markdown
Collaborator

dervus commented Apr 22, 2015

Yeah, there is no much need for making strings plain by any cost. This PR is about block scalars anyway. So thanks for contribution!

@dervus
Copy link
Copy Markdown
Collaborator

dervus commented Apr 22, 2015

@isaacs Sorry, I have mislead you about two lines between top-level file statements. Seems like we dropped this style, and I forgot about it. I've corrected that.

@puzrin
Copy link
Copy Markdown
Member

puzrin commented Apr 22, 2015

I've finished other changesets, but suggest to wait 1-2 days before release, if not urgent. May be someone remember more breaking/pending changes.

@puzrin
Copy link
Copy Markdown
Member

puzrin commented Apr 26, 2015

Released 3.3.0. Thanks again for your work!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[writer] Implement full-weight scalar style resolver.

3 participants