Skip to content

Conversation

@jumson
Copy link

@jumson jumson commented Jan 6, 2018

I added partial functionality of the Unix 'wc' command.
from the man page:

wc - Print newline, word, and byte counts for each FILE, 
and a total line if more than one FILE is specified.  A word 
is a non-zero-length sequence of characters delimited by 
white space.

and I implemented these options:

        -m, print the character counts
        -l, print the newline counts
        -w, print the word counts

I made a test file based on tail.js test, but I'm not sure how to use it.

I tested it alongside wc in bash and it gave similar results for single and multiple files.

@jumson
Copy link
Author

jumson commented Jan 7, 2018

This just occurred to me...why create any new command implementations in javascript when the shelljs/exec command exists?
regarding this pull request, does not:

shell.wc('file.txt')

and

shell.exec('wc file.txt')

accomplish the same thing?

@jumson
Copy link
Author

jumson commented Jan 7, 2018

In response to myself -- the time to execute these commands on windows nodejs are below. There is a clear lag when using shelljs\exec function, compared to both the shelljs\wc function I just made, and simply using nodejs' child process, as well as the regule wc executable provided in indows and ubuntu (not WSL)


PS C:\Users\xx> Measure-Command {wc .\notes-uname.txt}
TotalSeconds      : 0.0386133
TotalMilliseconds : 38.6133

PS C:\Users\xx> Measure-Command {node .\child.js}
TotalSeconds      : 1.1918551
TotalMilliseconds : 1191.8551

PS C:\Users\xx> Measure-Command {node .\e.js}
TotalSeconds      : 1.2602646
TotalMilliseconds : 1260.2646

PS C:\Users\xx> Measure-Command {node .\wc.js}
TotalSeconds      : 0.2618337
TotalMilliseconds : 261.8337

and ubuntu:

root@xx# time wc notes-uname.txt
  39  131 1244 notes-uname.txt
real    0m0.020s
user    0m0.000s
sys     0m0.004s

root@xx# time node child.js
real    0m0.266s
user    0m0.116s
sys     0m0.020s

root@xx# time node e.js
real    0m1.216s
user    0m0.284s
sys     0m0.096s

root@xx# time node wc.js
real    0m1.020s
user    0m0.216s
sys     0m0.032s

@nfischer
Copy link
Member

There is a clear lag when using shelljs\exec function

Among other reasons, shell.exec() needs to execute a system call (something like execvp), and that has associated overhead. JavaScript implementations don't need that particular syscall, which gives a slight advantage.

why create any new command implementations in javascript

Another reason is to make cross-platform implementations 😄

.gitignore Outdated
# OS X
.DS_Store

file1.txt
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't change gitignore

commands.js Outdated
'uniq',
'wc',
'which',
];
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Files should end in a newline

test/wc.js Outdated
.reverse()
.concat(bottomOfFile1.slice(0, 4).reverse())
.join('\n') + '\n');
}); No newline at end of file
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Files should end in a newline


return wc.join('\n');
}
module.exports = _wc; No newline at end of file
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Files should end in a newline

src/wc.js Outdated

var idx = 1;

/* - from tail.js, which I'm using as a template. Not sure I need this section
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can remove this section

src/wc.js Outdated
}

function getChars(data) {
return data.length.toString();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is actually wrong (I'm looking at MDN):

uses a single 16-bit code unit to represent the most common characters, but needs to use two code units for less commonly-used characters, so it's possible for the value returned by length to not match the actual number of characters in the string.

Is there a way to guarantee that two-codepoint-long characters only count as a single character each?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I went back and forth on this a lot. It's been a while (in grad school time) so my mind has been elsewhere.
There is a complication in determining the actual encoding and how that is counted. I also wanted to implement the "bytes" option -- anyways, I'll relook this as I was assuming the .length property was sufficient.

I think i'll read the file into a buffer and work on the buffer (instead of reading it in as UTF-8, as it currently stand), possibly first determining it's encoding, then moving on from there. That might require another library, we shall see.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

some more initial thoughts -- since i'm already reading this stream in as a utf-8, it should be in 'chunks' that made a 16-bit code unit into a character that .length property will count as one. Here is a demonstration, using the REPL:

> const buf5 = Buffer.from('tést');
undefined
> buf5
<Buffer 74 c3 a9 73 74>
> buf5.toString()
'tést'
> buf5.toString('utf-8')
'tést'
> buf5.toString('hex')
'74c3a97374'
> buf5.length
5
> buf5.toString().length
4

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also, based on this conversation, I think i'll implement the '--byte' option, which may be useful

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you need to test with a different character. Try '🎉' instead. 'é' can be encoded in a single utf-16 code unit (2 bytes).

I think i'll implement the '--byte' option, which may be useful

Yeah, that's fine, as long as we get the correct bytes of the file, not the bytes of the JS string.

Also, I'm not sure what the correct behavior is when the file is utf-16 encoded. Unix wc doesn't seem to give correct answers.

src/wc.js Outdated
if (options.charCount || options.lineCount || options.wordCount) {
//modify wc accordingly

if (options.lineCount) temp += `${thisLines} `;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use regular strings (single quotes)

src/wc.js Outdated
'w': 'wordCount',
*/

var temp = '';
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A better variable name might be output or formattedInfo


if (moreThan1) wc.push(`${totalLines} ${totalWords} ${totalChars} total`);

return wc.join('\n');
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We probably want to end this with a trailing newline. I think most other commands follow this pattern.

src/wc.js Outdated
files.unshift('-');
}

var moreThan1 = false;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can remove this variable, just inline files.length > 1 down below.

@jumson
Copy link
Author

jumson commented Jan 11, 2018

@nfischer thanks for the feedback, i'll probably make adjustments this weekend. (also, this is my first pull request so...noob errors will abound I'm sure)

@nfischer
Copy link
Member

@jumson no worries, and thanks for the pull request! It looks pretty good overall.

Also, I think npm run lint will flag some style issues.

deleted the README, then:
npm run gendocs
initial man wc data deleted
duplication / inefficient way of doing things
formattedOutput
Copy link
Author

@jumson jumson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nfischer, I think I've accounted for all changes now, except, regarding this comment:

Put a space after the comment character....

when I did that,

// @ docmentation stuff

npm run gendocs failed to work (returned no errors though).
When I put it back, it worked:

//@ documentation stuff

src/wc.js Outdated
var regexSplit = /\s/;
var wordsInitial = data.split(regexSplit); //array

while (wordsInitial.indexOf('') !== -1) {
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agreed - certainly more elegant. I've not used the filer() before.

Examples:

```javascript
var counts = wc('file.txt'); //returns lineCount wordCount byteCount
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jumson What I mean is to change "//returns" into "// returns"

@nfischer
Copy link
Member

@jumson it looks like you have test failures, can you check those?

Copy link
Member

@nfischer nfischer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI if you open src/wc.js in vim and save the file, it will automatically make sure it ends in a newline.

.concat(bottomOfFile1.slice(0, 4).reverse())
.join('\n') + '\n');
});

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove extra newline (just delete line 146)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I opened #816 to catch this automatically. Once that PR lands, you can rebase off master and fix this with npm run lint -- --fix.

return wc.join('\n');
}

module.exports = _wc; No newline at end of file
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still needs a newline at the end of line 169.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jumson actually, just run npm run lint to catch this stuff. npm run lint -- --fix will fix it.

nfischer added a commit that referenced this pull request Jan 18, 2018
This enforces that files should not end with blank lines (i.e. more than
one trailing newline). The `eol-last` rule (inherited from airbnb's
config) already enforces that files end in at least one trailing
newline, so adding this rule enforces that it ends in exactly 1.

See https://eslint.org/docs/rules/no-multiple-empty-lines and
https://eslint.org/docs/rules/eol-last for more information.

Inspired by #809.
nfischer added a commit that referenced this pull request Jan 25, 2018
* chore(lint): don't allow excess trailing newlines

This enforces that files should not end with blank lines (i.e. more than
one trailing newline). The `eol-last` rule (inherited from airbnb's
config) already enforces that files end in at least one trailing
newline, so adding this rule enforces that it ends in exactly 1.

See https://eslint.org/docs/rules/no-multiple-empty-lines and
https://eslint.org/docs/rules/eol-last for more information.

Inspired by #809.

* add maxBOF
@nfischer
Copy link
Member

nfischer commented Jul 4, 2018

@jumson are you still working on this?

I think this is a better candidate for a shelljs plugin than a core method. Would you be interested in rewriting this in a separate repo as a plugin?

@nfischer
Copy link
Member

Closing due to lack of response (also, I think this is better implemented as a 3rd party plugin).

@nfischer nfischer closed this Jul 10, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants