I recently received a DM from a kernel developer working on upstreaming an ADC driver. They hit a classic and frustrating wall: the driver worked perfectly on v6.1, but after porting it to the latest mainline (v6.18-rc5), the boot log stopped dead at "Starting Kernel...". No panic message, no earlycon output, just silence.
This is a scenario many of us have faced when we first started kernel development. Here is the advice I shared, which might be helpful for anyone dealing with major kernel upgrades.
1. Don't try to jump 17 floors at once (Incremental Updates)
Moving directly from v6.1 to v6.18 is like trying to jump from the 1st floor to the 18th without an elevator. It’s a recipe for broken legs (and broken builds).
The Analogy: Treat kernel upgrades like crossing a river with stepping stones.
The Fix: Instead of jumping straight to the latest RC, port to the nearer Long Term Support (LTS) versions first (e.g., v6.1 -> v6.6 -> v6.12).
The Value: Making your driver work on these intermediate stable kernels is not just "busy work". It is a valuable contribution and the best way to understand what changed and when.
2. "git bisect" is your best friend (but narrow the search first)
When a regression happens, "git bisect" is the standard tool to find the culprit commit. However, the gap between v6.1 and v6.18 spans well over a hundred thousand commits, and finding a needle in that haystack is painful. By following step 1 (Incremental Updates), you can narrow the range (e.g., "It works on v6.10 but breaks on v6.11") before running bisect. This makes the process much faster and more manageable.
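Under the hood, bisecting is a binary search over history. The sketch below is a toy model in Python, not how git itself is implemented: it assumes a linear list of commits, a hypothetical boots() predicate standing in for "build, flash, and watch the serial console", and a regression that stays broken once it lands.

```python
from typing import Callable, Sequence, Tuple


def first_bad_commit(
    commits: Sequence[str], boots: Callable[[str], bool]
) -> Tuple[str, int]:
    """Return (first bad commit, number of boot tests performed).

    Assumes commits[0] is known good, commits[-1] is known bad, and every
    commit after the regression is also bad.
    """
    good, bad = 0, len(commits) - 1  # indices of the known-good / known-bad ends
    tests = 0
    while bad - good > 1:
        mid = (good + bad) // 2  # test the commit halfway between the two ends
        tests += 1
        if boots(commits[mid]):
            good = mid  # regression landed after mid
        else:
            bad = mid  # regression landed at or before mid
    return commits[bad], tests


if __name__ == "__main__":
    # Hypothetical history of 1000 commits where commit 700 broke the boot.
    history = [f"c{i}" for i in range(1000)]
    culprit, tests = first_bad_commit(history, lambda c: int(c[1:]) < 700)
    print(culprit, tests)
```

Every test here stands for a full build-and-boot cycle on real hardware, so each one you can avoid by starting from a tighter good/bad pair is time saved.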
git bisect documentation: https://git-scm.com/docs/git-bisect
"git bisect" results in the kernel mailing lists: https://lore.kernel.org/all/?q=q%3A"git+bisect"
3. Leave "Breadcrumbs" in the dark (Manual Tracing)
If the system hangs even before the console initializes, standard debugging tools often won't help. The kernel is crashing before it has a voice.
The Analogy: Like Hansel and Gretel, you need to leave breadcrumbs to see how far you got.
The Fix: Go "old school." Manually insert print statements (pr_info() or printk()) directly into the early initialization code, such as start_kernel() in init/main.c: “I am here 1”, “I am here 2”...
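It looks primitive, but seeing where the printing stops tells you which two breadcrumbs the failure sits between, and moving the breadcrumbs closer together narrows it down to the function that hangs or crashes.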
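Once the breadcrumbs are numbered, reading them off a captured serial log can be scripted. The helper below is a small sketch in Python that assumes the "I am here N" message format used above and that the console output has been saved as text; the function name is made up for illustration.

```python
import re
from typing import Optional

# Matches the numbered breadcrumb messages, e.g. "I am here 2".
BREADCRUMB = re.compile(r"I am here (\d+)")


def last_breadcrumb(log: str) -> Optional[int]:
    """Return the number of the last breadcrumb seen in the log, or None."""
    hits = BREADCRUMB.findall(log)
    return int(hits[-1]) if hits else None


if __name__ == "__main__":
    # Hypothetical capture: the board went silent after the second breadcrumb.
    captured = "Starting kernel ...\nI am here 1\nI am here 2\n"
    print(last_breadcrumb(captured))
```

If the helper returns 2, the failure lies somewhere between breadcrumb 2 and breadcrumb 3, which is where the next round of print statements should go.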
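Messages printed this early only reach the serial port if an early console is active; otherwise they sit in the log buffer until the regular console registers, which never happens if the hang comes first. By configuring the earlycon parameter, the kernel can output messages during the initial boot phase, before standard consoles are initialized. This also lets us capture any call trace the kernel prints as it goes down. On device-tree platforms, a bare "earlycon" with no options uses the console named by stdout-path in the /chosen node.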
See "Detailed Explanation of setup_earlycon" by David Zhu
https://www.linkedin.com/pulse/detailed-explanation-setupearlycon-david-zhu-lai0c/
The Linux kernel is massive, but if you break the problem down into smaller, manageable steps, no bug is unfixable. Happy hacking!