A file open with auto-detected encoding. #10013

Closed
tomoki1207 wants to merge 2 commits into microsoft:master from tomoki1207:auto-detect-encoding

tomoki1207 (Contributor) commented Aug 1, 2016

This is related to #5388.

A text file will be opened with the encoding detected by jschardet.

msftclas commented Aug 1, 2016

Hi @tomoki1207, I'm your friendly neighborhood Microsoft Pull Request Bot (You can call me MSBOT). Thanks for your contribution!

In order for us to evaluate and accept your PR, we ask that you sign a contribution license agreement. It's all electronic and will take just minutes. I promise there's no faxing. https://cla.microsoft.com.

TTYL, MSBOT;

msftclas commented Aug 1, 2016

@tomoki1207, Thanks for signing the contribution license agreement so quickly! Actual humans will now validate the agreement and then evaluate the PR.

Thanks, MSBOT;

bpasero (Member) commented Aug 6, 2016

@tomoki1207 I am not sure this works the way you coded it because the encoding is a user setting and you always try to detect the encoding now. How can you still respect the user preference if the encoding is not clear?

My argument is that really the only way of detecting an encoding is by looking at the BOM (Byte Order Mark) for UTF (and we do this already). Any other file encoding can only be guessed.
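For illustration, the BOM check mentioned above can be sketched as follows. This is a minimal sketch and not the editor's actual implementation; the function name `detectEncodingByBOM` and the returned labels are assumptions made for this example, and UTF-32 marks are deliberately ignored.

```typescript
// Hypothetical helper: returns an encoding label when the bytes start with a
// known UTF-8 or UTF-16 byte order mark, or null when no BOM is present.
// UTF-32 BOMs are not handled in this sketch.
function detectEncodingByBOM(bytes: Uint8Array): string | null {
  // UTF-8 BOM: EF BB BF
  if (bytes.length >= 3 && bytes[0] === 0xef && bytes[1] === 0xbb && bytes[2] === 0xbf) {
    return "utf8";
  }
  // UTF-16 big endian BOM: FE FF
  if (bytes.length >= 2 && bytes[0] === 0xfe && bytes[1] === 0xff) {
    return "utf16be";
  }
  // UTF-16 little endian BOM: FF FE
  if (bytes.length >= 2 && bytes[0] === 0xff && bytes[1] === 0xfe) {
    return "utf16le";
  }
  // No BOM: the encoding cannot be read from a marker, only guessed.
  return null;
}
```

A file without any of these leading bytes yields `null`, which is the case where only a guess is possible.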

I think one thing we could add is an action in the encoding picker to "Auto Detect" the encoding via this code and then set the encoding for the file. But always detecting the encoding for each file being opened is not right imho.

bpasero added this to the Backlog milestone Aug 6, 2016

tomoki1207 (Contributor, Author) commented

@bpasero I understand your opinion.
However, I think many people find it very inconvenient to open small files that are not encoded in UTF.
So I hope the encoding can be detected automatically in some way.

Would you prefer the following approach? It is similar to Atom's auto-detect package.

  1. Provide a SetEncoding API for extensions
  2. Call it from an auto-detect extension
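The two steps above could look roughly like the sketch below. Everything in it is hypothetical: `EncodingHost`, `setEncoding`, `Detector`, and `autoDetectAndApply` are names invented for this illustration and do not refer to any existing editor or extension API. The detector is passed in as a plain function, so a library such as jschardet could be wrapped behind it.

```typescript
// Step 1 (hypothetical): an API the editor would expose to extensions.
interface EncodingHost {
  setEncoding(resource: string, encoding: string): void;
}

// Hypothetical detector signature: returns an encoding name, or null when
// the detector has no answer for the given bytes.
type Detector = (bytes: Uint8Array) => string | null;

// Step 2 (hypothetical): the extension runs detection and calls the host API.
// Returns true when an encoding was applied, false when nothing was changed.
function autoDetectAndApply(
  host: EncodingHost,
  resource: string,
  bytes: Uint8Array,
  detect: Detector
): boolean {
  const encoding = detect(bytes);
  if (encoding === null) {
    // No result: leave whatever encoding the editor already chose.
    return false;
  }
  host.setEncoding(resource, encoding);
  return true;
}
```

Keeping detection in the extension means the editor itself only needs the setter, and users who do not install the extension keep the current behavior.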

bpasero (Member) commented Aug 9, 2016

@tomoki1207 the approach works if every encoding can be detected with 100% certainty but I doubt that is possible for any file that does not include a BOM. What does jschardet do if the encoding is ambiguous?

Nevertheless we do have a global and workspace setting for the encoding that we cannot just drop, so I see little chance of changing this to always auto detect the encoding. The only possible thing I see is to offer an action to "Guess Encoding" from the encoding picker that executes the jschardet. I believe Atom does the same.
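One way a "Guess Encoding" action could treat an ambiguous result while still respecting the configured encoding is sketched below. This is only an illustration under stated assumptions: `EncodingGuess`, `resolveEncoding`, and the `minConfidence` threshold of 0.9 are invented for this example, and the guess is assumed to carry a confidence score between 0 and 1.

```typescript
// Assumed shape of a detector result: an encoding name plus a confidence
// score between 0 and 1. Both field names are assumptions for this sketch.
interface EncodingGuess {
  encoding: string | null;
  confidence: number;
}

// Hypothetical policy: a BOM wins, a confident guess comes next, and the
// user's configured encoding is the fallback for anything ambiguous.
function resolveEncoding(
  bomEncoding: string | null,
  guess: EncodingGuess | null,
  configuredEncoding: string,
  minConfidence: number = 0.9
): string {
  if (bomEncoding !== null) {
    return bomEncoding; // the only case that is detected rather than guessed
  }
  if (guess !== null && guess.encoding !== null && guess.confidence >= minConfidence) {
    return guess.encoding;
  }
  return configuredEncoding; // ambiguous or missing guess: keep the setting
}
```

With this policy a low-confidence guess never overrides the global or workspace setting, so the user preference still decides whenever the detector is unsure.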

bpasero (Member) commented Aug 23, 2016

Closing for inactivity.

bpasero closed this Aug 23, 2016
bpasero removed their assignment Aug 23, 2016
buzzzzer mentioned this pull request Apr 11, 2017 (3 tasks)
github-actions bot locked and limited conversation to collaborators Mar 27, 2020