PR6695: Do not treat paths as encoded in ISO-8859-1#124
PR6695: Do not treat paths as encoded in ISO-8859-1#124whitequark wants to merge 6 commits intoocaml:trunkfrom whitequark:utf8-paths
Conversation
|
I think the merge of "Make sure the compiler only uses ASCII" would be a no-brainer if it didn't imply changing the stdlib, which is always more controversial (as it should be). I'd merge a commit with the ascii_foo functions in utils/misc (and let people discuss a second commit that moves them to the stdlib directly). Unfortunately, that would require duplicating them in ocamlbuild/my_std as well, as we don't want ocamlbuild to start depending on internal compiler libraries. |
|
I forgot to say: I do support adding the ascii_* to the stdlib (I would personally have just changed the semantics of the existing String....) and have reviewed the patch, I think it is fine. (We may want to expose |
|
@gasche Yes, that was the reason I did not put them in Misc. Should I move/duplicate them or wait for more input? |
|
It might be useful to check that no OPAM package uses compilation units whose name starts with a non-ASCII letter. I assume and hope the answer is no. |
Well it's as you prefer. If you value the "at least the compiler is fixed in trunk now" aspect enough to put micro-work in splitting the commits, you want to do that. Otherwise, just wait -- but be patient, these things can stay silent for months. Both are fine options. |
|
@alainfrisch The answer is actually no. ocamldep does not print dependencies whose names start with non- Additionally, since Ubuntu, Github et al use UTF-8 and the compiler uses Latin-1, even if it did not, that would still fail the compilation. |
|
@gasche I will put them in Misc. This will make it easier to decide on how/if change the semantics of String. |
|
I'm in favor of adding them to stdlib, but we need to discuss the interface:
|
|
I think adding optional arguments is a bad way to evolve APIs. In this particular case, the default value would be the bad value for this parameter. |
|
@gasche If that's the bad value for the optional parameter it means that you ready to break compatibility ? If it's ok to break backward compatibility then why not move the old functions with the same name to a deprecated Another alternative (which I'm not very fond of) would be to have a full copy of the old |
|
Oh btw. adding optional parameters to an API is an interface breaking change anyways. |
|
I think the required change is by far not invasive enough to warrant a full copy. In any case it is never worse than adding Personally I prefer changing the behavior of existing functions, given that it seems that breaking backwards compatibility is inevitable. |
|
If we're going to break backwards compatibility, why not break it in the same way it was broken for We could also add a The benefit of this approach is that we use types to properly separate latin1 encoded strings from ascii encoded strings -- which should hopefully make bugs around incorrect use of the encodings less likely. |
|
@lpw25 I think this is the best solution. We also know migration pitfalls reasonably well. |
|
@whitequark this as in this PR or this as in @lpw25 ? I personally think that @lpw25's solution is overkill for four functions acting on a legacy charset. If backward compatibility should not be broken on that I prefer deprecation of the current functions and the |
|
Thirded. I think the current patch is optimal (I would maybe add |
This updates Char, String, Bytes in the stdlib. For now, they are hidden from documentation and are only for internal compiler use.
|
If you also marked the legacy functions as deprecated this patch would solve Mantis PR#6695 as well. |
|
@gasche Did you mean PR6694? |
|
Yes indeed. |
This should cover all places involving filenames in the compiler. There are a few more paths still using Latin-1 in other ways, e.g. in ocamldoc.
The only place that includes changes is the code for checking the suffix. It is highly unlikely that the change has any impact at all.
Also, add documentation for the US-ASCII variants.
|
Updated. The deprecation warnings actually uncovered a bug (a stray lowercase in Filename), and added some fallout, mainly in Str. |
|
Merged in trunk. Thanks! |
|
@dbuenzli Already done in trunk. |
See http://caml.inria.fr/mantis/view.php?id=6695.