-
Notifications
You must be signed in to change notification settings - Fork 743
adding wc unix command #809
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
bytes is awhole other animal..will need another module and perhaps will do it in a later version
copied tail for now
documentation updated too
|
This just occurred to me...why create any new command implementations in javascript when the shelljs/exec command exists? and accomplish the same thing? |
|
In response to myself -- the time to execute these commands on windows nodejs are below. There is a clear lag when using shelljs\exec function, compared to both the shelljs\wc function I just made, and simply using nodejs' child process, as well as the regule wc executable provided in indows and ubuntu (not WSL) and ubuntu: |
Among other reasons,
Another reason is to make cross-platform implementations 😄 |
.gitignore
Outdated
| # OS X | ||
| .DS_Store | ||
|
|
||
| file1.txt |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't change gitignore
commands.js
Outdated
| 'uniq', | ||
| 'wc', | ||
| 'which', | ||
| ]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Files should end in a newline
test/wc.js
Outdated
| .reverse() | ||
| .concat(bottomOfFile1.slice(0, 4).reverse()) | ||
| .join('\n') + '\n'); | ||
| }); No newline at end of file |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Files should end in a newline
|
|
||
| return wc.join('\n'); | ||
| } | ||
| module.exports = _wc; No newline at end of file |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Files should end in a newline
src/wc.js
Outdated
|
|
||
| var idx = 1; | ||
|
|
||
| /* - from tail.js, which I'm using as a template. Not sure I need this section |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can remove this section
src/wc.js
Outdated
| } | ||
|
|
||
| function getChars(data) { | ||
| return data.length.toString(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is actually wrong (I'm looking at MDN):
uses a single 16-bit code unit to represent the most common characters, but needs to use two code units for less commonly-used characters, so it's possible for the value returned by length to not match the actual number of characters in the string.
Is there a way to guarantee that two-codepoint-long characters only count as a single character each?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I went back and forth on this a lot. It's been a while (in grad school time) so my mind has been elsewhere.
There is a complication in determining the actual encoding and how that is counted. I also wanted to implement the "bytes" option -- anyways, I'll relook this as I was assuming the .length property was sufficient.
I think i'll read the file into a buffer and work on the buffer (instead of reading it in as UTF-8, as it currently stand), possibly first determining it's encoding, then moving on from there. That might require another library, we shall see.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
some more initial thoughts -- since i'm already reading this stream in as a utf-8, it should be in 'chunks' that made a 16-bit code unit into a character that .length property will count as one. Here is a demonstration, using the REPL:
> const buf5 = Buffer.from('tést');
undefined
> buf5
<Buffer 74 c3 a9 73 74>
> buf5.toString()
'tést'
> buf5.toString('utf-8')
'tést'
> buf5.toString('hex')
'74c3a97374'
> buf5.length
5
> buf5.toString().length
4
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also, based on this conversation, I think i'll implement the '--byte' option, which may be useful
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you need to test with a different character. Try '🎉' instead. 'é' can be encoded in a single utf-16 code unit (2 bytes).
I think i'll implement the '--byte' option, which may be useful
Yeah, that's fine, as long as we get the correct bytes of the file, not the bytes of the JS string.
Also, I'm not sure what the correct behavior is when the file is utf-16 encoded. Unix wc doesn't seem to give correct answers.
src/wc.js
Outdated
| if (options.charCount || options.lineCount || options.wordCount) { | ||
| //modify wc accordingly | ||
|
|
||
| if (options.lineCount) temp += `${thisLines} `; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use regular strings (single quotes)
src/wc.js
Outdated
| 'w': 'wordCount', | ||
| */ | ||
|
|
||
| var temp = ''; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A better variable name might be output or formattedInfo
|
|
||
| if (moreThan1) wc.push(`${totalLines} ${totalWords} ${totalChars} total`); | ||
|
|
||
| return wc.join('\n'); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We probably want to end this with a trailing newline. I think most other commands follow this pattern.
src/wc.js
Outdated
| files.unshift('-'); | ||
| } | ||
|
|
||
| var moreThan1 = false; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can remove this variable, just inline files.length > 1 down below.
|
@nfischer thanks for the feedback, i'll probably make adjustments this weekend. (also, this is my first pull request so...noob errors will abound I'm sure) |
|
@jumson no worries, and thanks for the pull request! It looks pretty good overall. Also, I think |
deleted the README, then: npm run gendocs
initial man wc data deleted
duplication / inefficient way of doing things
formattedOutput
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@nfischer, I think I've accounted for all changes now, except, regarding this comment:
Put a space after the comment character....
when I did that,
// @ docmentation stuff
npm run gendocs failed to work (returned no errors though).
When I put it back, it worked:
//@ documentation stuff
src/wc.js
Outdated
| var regexSplit = /\s/; | ||
| var wordsInitial = data.split(regexSplit); //array | ||
|
|
||
| while (wordsInitial.indexOf('') !== -1) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
agreed - certainly more elegant. I've not used the filer() before.
| Examples: | ||
|
|
||
| ```javascript | ||
| var counts = wc('file.txt'); //returns lineCount wordCount byteCount |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jumson What I mean is to change "//returns" into "// returns"
|
@jumson it looks like you have test failures, can you check those? |
nfischer
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FYI if you open src/wc.js in vim and save the file, it will automatically make sure it ends in a newline.
| .concat(bottomOfFile1.slice(0, 4).reverse()) | ||
| .join('\n') + '\n'); | ||
| }); | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove extra newline (just delete line 146)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I opened #816 to catch this automatically. Once that PR lands, you can rebase off master and fix this with npm run lint -- --fix.
| return wc.join('\n'); | ||
| } | ||
|
|
||
| module.exports = _wc; No newline at end of file |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Still needs a newline at the end of line 169.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jumson actually, just run npm run lint to catch this stuff. npm run lint -- --fix will fix it.
This enforces that files should not end with blank lines (i.e. more than one trailing newline). The `eol-last` rule (inherited from airbnb's config) already enforces that files end in at least one trailing newline, so adding this rule enforces that it ends in exactly 1. See https://eslint.org/docs/rules/no-multiple-empty-lines and https://eslint.org/docs/rules/eol-last for more information. Inspired by #809.
* chore(lint): don't allow excess trailing newlines This enforces that files should not end with blank lines (i.e. more than one trailing newline). The `eol-last` rule (inherited from airbnb's config) already enforces that files end in at least one trailing newline, so adding this rule enforces that it ends in exactly 1. See https://eslint.org/docs/rules/no-multiple-empty-lines and https://eslint.org/docs/rules/eol-last for more information. Inspired by #809. * add maxBOF
|
@jumson are you still working on this? I think this is a better candidate for a shelljs plugin than a core method. Would you be interested in rewriting this in a separate repo as a plugin? |
|
Closing due to lack of response (also, I think this is better implemented as a 3rd party plugin). |
I added partial functionality of the Unix 'wc' command.
from the man page:
and I implemented these options:
I made a test file based on tail.js test, but I'm not sure how to use it.
I tested it alongside wc in bash and it gave similar results for single and multiple files.