-
-
Notifications
You must be signed in to change notification settings - Fork 2.2k
Rewrite it in Rust #9512
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Rewrite it in Rust #9512
Conversation
|
So, a few quick comments:
Cygwin is mostly a major PITA because of its wchar_t type. If we can get away from using libc that could make some things easier. |
Rust uses libc under the hood. |
Of course technically that's true, but that's not what I'm talking about. What I mean is that we wouldn't use e.g. libc's "iswdigit" and friends, like we're forced to do now because C++ never got around to replacing them with ones that work with its strings. And because we use those functions, we're beholden to the libc wchar_t type where we use them, which is everywhere, which means the difference between cygwin (2 byte wchar_t) and as best as I can tell any other system (4 byte wchar_t) can appear everywhere. And we're beholden to platform differences like "does this provide a uselocale" or "does this have wcscasecmp or std::wcscasecmp", because those directly appear in libc. In effect what I'm saying is that my hope is rust would not rely on libc for everything and abstract some of the other libc gunk away instead of throwing it in our face like C++ does. And so this historical wchar_t difference could just go away. |
|
The progress to date is impressive, and the motivation to allow concurrent mode is compelling. I agree that packaging is important but I'd be happy to tackle it. RHEL 7 is only supported for another 18 months, and I have managed to backport GCC to RHEL in the past so it might be possible to do the same. There's a couple of ways to generate Debian packages, which are worth exploring. I don't know any rust, but I didn't know any C++ before I started work on fish, so I'd be keen to learn. |
|
(Maybe a way forward to #972 etc?) |
| @@ -0,0 +1,48 @@ | |||
| include(FetchContent) | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is going to make builds using OBS difficult; is it possible to include Corrosion in the tree and use it as a subdirectory instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, distro packagers are not going to like this, but it's fine if you try find_package first:
find_package(Corrosion)
if (NOT Corrosion_FOUND)
include(FetchContent)
...
endif()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, distro packagers are not going to like this, but it's fine if you try
find_packagefirst:find_package(Corrosion) if (NOT Corrosion_FOUND) include(FetchContent) ... endif()
Distro packager for Arch Linux here, this would make our lives so much easier with the ability to use system libraries.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm happy to add that, but my hope is that no distro packages fish while it uses Corrosion. This use is meant to be temporary during a single development cycle; by the next release I hope to have no CMake at all, and therefore no Corrosion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's an ambitious goal. Good luck.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is going to make builds using OBS difficult; is it possible to include Corrosion in the tree and use it as a subdirectory instead?
Corrosion can be added as a subdirectory/ submodule.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Considering the use of a fork of Corrosion, and that it is intended as a temporary measure that won't be released, I think using Corrosion as a subdirectory would make sense.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hello, throwing myself into this conversation with some additional info that distro maintainers might find useful. FetchContent has always supported placing the source directory for any dependency in any location and then setting -DFETCHCONTENT_SOURCE_DIR_<uppercase-name>=<path>.
In this case, this would allow package maintainers to do -DFETCHCONTENT_SOURCE_DIR_CORROSION=<path/to/corrosion/sources> and not interrupt the workflow. No conditional find_package call is required.
Additionally, starting in CMake 3.24, it's possible to tell FetchContent_MakeAvailable to call find_package first (and vice versa where find_package calls FetchContent_MakeAvailable if no file was found).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you, this is helpful. For Corrosion, my preferences are (in order): use FetchContent as in the PR (perhaps using bruxisma's tips), or add Corrosion as a fish-shell submodule, or directly check in Corrosion. Not sure what's best for OBS.
We shouldn't use heroics to keep fish working on older platforms mid-port. That may mean turning off builders for certain platforms, and then turning them back on later when we're no longer obligated to support Corrosion/autocxx/etc.
| crate-type=["staticlib"] | ||
|
|
||
| [patch.crates-io] | ||
| cxx = { git = "https://github.com/ridiculousfish/cxx", branch = "fish" } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yikes, all this is going to require internet access from build hosts, I think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah we need to tell cargo to pre-fetch all dependencies, like https://kressle.in/articles/2021/packaging-rust-apps-in-obs.php
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OBS can potentially do a lot of the heavy lifting - https://en.opensuse.org/openSUSE:Packaging_Rust_Software
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cargo can download all dependencies' source code prior to building. See cargo fetch and cargo vendor. The [patch] table in Cargo.toml is an easy way to tell Cargo to use a repository you've forked, which is very convenient before your changes are merged upstream.
|
|
||
| - Rust (version 1.67 or later) | ||
| - a C++11 compiler (g++ 4.8 or later, or clang 3.3 or later) | ||
| - CMake (version 3.5 or later) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the use of FindContent bumps CMake to 3.11 (unless corrosion gets included in the tree as discussed elsewhere)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Current versions of corrosion require at least CMake 3.15. Some features are only available with CMake 3.19 and newer.
|
|
||
| ## Risks | ||
|
|
||
| - Large amount of work with possible introduction of new bugs. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is amazing, thanks for spearheading this work.
The plan sounds all-around sensible to me.
The port sounds like a great investment while being easier to pull off than a rewrite.
There are many fresh shells these days but I don't think they offer a comparable interactive experience (most of them focus on inventing their language).
Still, the port is a lot of work with late payoff, so it might be hard to find persistent contributors.
Somehow Rust projects seem to be doing fine in this regard.
Granted most of them are written from scratch.
One incremental port that died a slow death is https://github.com/remacs/remacs - but I don't think it's comparable to fish, it's around 6x larger, and the authors didn't have ownership of the code. IIRC they used a simple bindgen build; our toolchain seems much more powerful.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"might be hard to find persistent contributions"
Just wanted to chime in and say that I love the fish shell and have been using it every day for about 6 or 7 years now. If this goes through, I will absolutely be contributing to the ongoing port and future features. I'm a full time Rust engineer and my main project at work includes quite a lot of C FFI, so I'm no stranger to that. I know I'm just one person, but figured I'd show support
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That remacs link is sobering, shanks for sharing that. I think the biggest difference is that emacs maintainers don't appear to have been involved, and the fork was scared of touching parts of the codebase.
fish gets a lot of mileage out of being written in C++ here: the types map better, the FFI tools are more powerful, and fish already uses techniques like RAII, scoped locks, shared_ptr etc which map cleanly to Rust. Still, food for thought.
| FetchContent_Declare( | ||
| Corrosion | ||
| GIT_REPOSITORY https://github.com/ridiculousfish/corrosion | ||
| GIT_TAG fish |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
probably we want to have a SHA here, for reproducibility
| - Other languages considered: | ||
| - Java, Python and the scripting family are ruled out for startup latency and memory usage reasons. | ||
| - Go would be an awkward fit. fork is [quite the problem](https://stackoverflow.com/questions/28370646/how-do-i-fork-a-go-process/28371586#28371586) in Go. | ||
| - Other system languages (D, Nim, Zig...) are too niche: fewer contributors, higher risk of the language becoming irrelevant. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(FWIW Java startup time can be fixed by compiling a native image)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(edit for posterity: This was in response to a comment asking us to reconsider D, which has since been removed by the author)
Let me put a swift lid on that (and any other discussions of other programming languages, including swift).
Rust has the momentum, some of us already know it and others are interested in it.
This exists, it is reasonably far along, it works. The decision has essentially been made.
This comment was marked as off-topic.
This comment was marked as off-topic.
Sorry, something went wrong.
This comment was marked as off-topic.
This comment was marked as off-topic.
Sorry, something went wrong.
|
|
||
| ## Timeline | ||
|
|
||
| Handwaving, 6 months? Frankly unknown - there's 102 remaining .cpp files of various lengths. It'll go faster as we get better at it. Peter (ridiculous_fish) is motivated to work on this, other current contributors have some Rust as well, and we may also get new contributors from the Rust community. Part of the point is to make contribution easier. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm eager to contribute as well, probably at modest capacity.
Seems like you have already overcome the biggest initial hurdles.
I'm always motivated to help people get up to speed with fish and Rust.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It took me 3 hours to port util.cpp, which has 200 lines.
With my new knowledge I could probably do it in 1 hour.
Extrapolating the 1-hour estimate to the remaining 60k lines gives us some 300 hours of work.
I'm sure that other modules are harder and might take longer but this number leaves me optimistic.
| The [autocxx guidance](https://google.github.io/autocxx/workflow.html#how-can-i-see-what-bindings-autocxx-has-generated) is helpful: | ||
|
|
||
| 1. Install cargo expand (`cargo install cargo-expand`). Then you can use `cargo expand` to see the generated Rust bindings for C++. In particular this is useful for seeing failed expansions for C++ types that autocxx cannot handle. | ||
| 2. In rust-analyzer, enable Proc Macro and Proc Macro Attributes. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nowadays rust-analyzer enables proc macros by default (but LSP clients may override the default)
| let fish_src_dir = format!("{}/{}", rust_dir, "../src/"); | ||
|
|
||
| // Where cxx emits its header. | ||
| let cxx_include_dir = format!("{}/{}", target_dir, "cxxbridge/rust/"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While this may work, I warn you that this relies on unstable implementation details of Cargo and the cxx maintainer has adamantly refused to accept any more reasonable solution for integrating cxx with any C++ build system besides Bazel or Buck: dtolnay/cxx#462
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the heads-up. I doubt fish will track upstream autocxx: its priorities are somewhat different (focused on long-term safe interop, instead of an aid to porting) so we'll continue to use our fork.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Considering that you are using a fork of cxx, you may consider adding something like dtolnay/cxx#1120 to your fork. Though, if this existing code works, and it's intended to be temporary before the next release, this issue is kinda moot for fish's purposes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
relies on unstable implementation details of Cargo
To clarify, Cargo only provides build.rs scripts with the OUT_DIR environment variable as a path to write files, which is a subdirectory of target with an unreliable path that includes a hash. To create target/cxxbridge, cxx-build uses a hacky method of walking up from OUT_DIR to find the root of the target directory using some files Cargo creates for unrelated reasons, so that could break in some future version of cargo. A few weeks ago, I tried to start a discussion upstream about a better way to handle this, but it didn't go anywhere.
|
Just throwing an idea out there: Given that this is a big underlying technical change and needs some larger changes in packaging, would it make sense to:
This is, of course, a change in policy and more work, but I believe it may be warranted. These releases could be source-only (just tag a 3.6.3 and send out a mail about it), for those who are unable to install rust, especially distro packagers who find themselves looking at an incompatible rustc version. It's of course possible I'm overestimating the difficulty here, maybe it's not needed because nobody has a problem switching to 4.0 immediately. |
|
Building on that old NetBSD VM I have lying around currently fails with Removing that static_assert (which is fine) makes it go on until it eventually ends in (and a bunch more nix::sys::timer and nix::sys::aio functions) It also absolutely requires libclang, and will fail rather late in the build process. |
If you want to use the distro version of Rust then you'll need to use a much older rust than latest stable. You can look it up for a particular distro, but Debian's rustc (for example) is often 6 months or more behind the latest stable. If you want to ship your program in a distro you usually need to build the program using that distro's version of the compiler, so I suspect you'd need to push your "minimum supported rust verison" (MSRV) farther into the past if your project is supposed to be part of distros. If you don't need to be shipped in a distro, it's as easy as |
AFAIK Debian updates both Third-party repos, however, will need either their own newer |
|
Just taking a quick look at the versions in https://packages.debian.org/sid/rustc:
I'm not sure what you mean by "for the duration of the development cycle", but even sid falls behind "latest stable" on a regular basis until someone goes and updates it. |
You do not. See: BurntSushi/ripgrep#1019 Essentially, distros like Debian will just ship an older version of your program, just like they do for almost everything else. |
|
I've been alternating between obsessed with and simply always entertaining in the back of my mind the idea of a rust-powered fish for many years now (and have discussed that with other team members), so I'm approaching this with an open mind, but also with concerns. I must say that I knew this was coming for many months now from following @ridiculousfish on GitHub (❤️), and have been wondering how it would the topic would be broached. I passionately believe that rust is the way forward for writing any new code, especially in a cross-platform context. Quite apart from all the memory and concurrency safety it brings to the table, the general approach of the language and the community to always put correctness first and foremost has built a strong culture of correct, well-designed, and fairly maintainable software engineering. The package management is great, the team is fantastic, and support in the broader software world has been extremely positive and welcoming. To go further, I have written my own fish-compatible rewrite (not port) of fish's tokenizer and was making progress on the AST and parser when I last had my "burning desire" to see something like fish but in rust. I was maintaining it on my private git server and not GitHub, but I've pushed it to GH in case anyone cares to take a look: https://github.com/mqudsi/velvet Having said all that, I worry that rushing any part of this could be a death knell for the project. A quality rewrite (or port) takes time, on the magnitude of years rather than days. One good example to take inspiration from is tectonic, a xelatex port that started off with machine-rewritten code and then manually made progress on actual, idiomatic rust conversions... but even that was a greenfield project and not a first-party effort and so not saddled with some of the concerns we would be saddled by. The recommended module-by-module approach is definitely a great suggestion for a starting place, and I think it has great potential. Off the top of my head, some concerns or points:
|
This comment was marked as off-topic.
This comment was marked as off-topic.
This comment was marked as off-topic.
This comment was marked as off-topic.
This comment was marked as off-topic.
This comment was marked as off-topic.
This comment was marked as off-topic.
This comment was marked as off-topic.
This comment was marked as off-topic.
This comment was marked as off-topic.
Note that in the fish-shell/doc_internal/fish-riir-plan.md Line 40 in 88ebaeb
As a general comment to anyone reading this, it's worth reading the entire fish-riir-plan.md document before commenting. |
This comment was marked as off-topic.
This comment was marked as off-topic.
|
Just leaving a comment here, based on your planning doc:
This crate might be what you want to use instead. |
This comment was marked as off-topic.
This comment was marked as off-topic.
This comment was marked as off-topic.
This comment was marked as off-topic.
But don't do it for enum variants just yet.
This will make the Rust port's diff smaller.
In hindsight, I should probably have split this into three different commits.
This is duplicated for now, since a `&mut [&wstr]` can't be passed over FFI.
|
Any update on when/how |
|
I'll tag the "Last C++11" and merge in maybe 12-16 hours, assuming there are no objections from fish shell committers. (It will be a real merge, not a rebase). |
|
Merged as 27f5490 Here we go! 🤞 |
|
I've deleted the riir branch; please redirect any outstanding PRs to master |
Doesn't look like that's possible once they've been force-closed, so they'll have to be recreated. |
|
If it's not too much trouble, some people over at r/rust are curious how the rewrite is going? ❤️ Is it at a point where people unfamiliar with the codebase might be able to help with drive-by contributions? |
|
@PoignardAzur I've written something up about the state of the port in #10123. I would ask you to leave any replies there, this PR is quite big and replying here might notify a lot of people. In short: We're "mostly" done and I don't believe the rest is well-suited for drive-by contributions. |
(Editor's note - please read #9512 (comment) and #9512 (comment) before commenting if you are new to fish or not familiar with the context - @zanchey)
(Progress report November 2023)
(Sorry for the meme; also this is obligatory.)
I think we should transition to Rust and aim to have it done by the next major release:
This should be thought of as a "port" instead of a "rewrite" because we would not start from scratch; instead we would translate C++ to Rust, incrementally, module by module, in the span of one release. We'll use an FFI so the Rust and C++ bits can talk to each other until C++ is gone, and tests and CI keep passing at every commit.
To prove it can work, in this PR I've ported FLOG, topic monitor, wgetopt, builtin_wait, and some others to Rust. The Rust bits live in a crate that lives inside the C++ and links with it. You can just build it the usual way:
It works in GH Actions CI too.
The Plan has a high level description, and the Development Guide has more technical details on how to proceed. Please consider both to be part of this PR.