Demystifying Linux Kernel Initialization

If you ever wondered what happens between the time you power up your laptop and when you get back with that steaming cup of your favorite beverage? This article will take the lid off and demystify the Linux kernel boot process.
Let’s first talk about how much visibility a user has into what’s happening on their system and the state of the kernel itself. The Linux kernel is constantly printing diagnostic messages to the kernel ring buffer. As the name indicates, this ring buffer is a circular buffer in which older messages get overwritten by newer ones. It prints important messages about the initialization process during initialization and boot up. Several priority levels, such as emergency, critical, error, warning and informational are supported and logged to the ring buffer.
If you are wondering how to view these messages, a command line tool dmesg exists just for that reason. Users can run the dmesg
command line tool to examine or control the kernel ring buffer. dmesg
control options allow users to clear, read first, then clear, enable and disable printing of messages to the console. We are going to focus on the examine options primarily.
The dmesg
command is a versatile tool with several options to zero in on specific messages for supported log facility and priority levels.
Supported log facilities:
- kern – kernel messages
- user – random user-level messages
- mail – mail system
- daemon – system daemons
- auth – security/authorization messages
- syslog – messages generated internally by syslogd
- lpr – line printer subsystem
- news – network news subsystem
Supported log levels (priorities):
- emerg — system is unusable
- alert — action must be taken immediately
- crit — critical conditions
- err — error conditions
- warn — warning conditions
- notice — normal but significant condition
- info — informational
- debug — debug-level messages
As an example, the following commands can be used to view specific priority messages:
dmesg -l emerg#
shows emergency messagesdmesg -l alert#
shows alertsdmesg -l crit#
shows critical messagesdmesg -l err#
shows error messages below emergency and critical levelsdmesg -l err+#
shows all error levels emergency, critical and error
As an example, the following commands can be used to view log facility messages:
dmesg -f kern
dmesg -f daemon
These messages can be hard to parse and understand unless you look at them day in and day out. Let’s look at a couple of useful options that improve readability.
- -t (–notime) option shows messages without timestamps;
- -x (–decode) option decodes the log facility and priority numbers into user-friendly human-readable information.
Combining these two options yields easily readable and understandable messages. In addition, this format without timestamps lends itself well to comparing and analyzing messages from different kernel boots for regressions in errors.
dmesg -t -x
Here is an excerpt from
dmesg
that includes info, debug, warn-level messages:
1 2 3 4 5 6 7 |
kern :warn : [drm] Fence fallback timer expired on ring gfx kern :info : fbcon: amdgpudrmfb (fb0) is primary device kern :info : [drm] DSC precompute is not needed. kern :debug : calling nf_defrag_init+0x0/0xff0 [nf_defrag_ipv4] @ 1037 kern :debug : initcall nf_defrag_init+0x0/0xff0 [nf_defrag_ipv4] returned 0 aft er 1 usecs kern :debug : calling nf_defrag_init+0x0/0xff0 [nf_defrag_ipv6] @ 1037 |
With this overview of the kernel ring buffer and how to view kernel messages, we can dive into the kernel initialization sequence.
The kernel initialization sequence is complex and carried out in several stages using initcalls. The initcalls contribute a significant amount of time to the Linux boot process. initcalls are grouped by functionality, and early initcalls [early_initcall(fn)] run before initializing SMP (symmetric multiprocessing) and are for built-in (statically linked) code, not for dynamically loadable modules. As of this writing, there are more than 110 early_initcall()
defined in the kernel. A pure initcall is much more restricted, has no dependencies on anything else and purely initializes variables that couldn’t be statically initialized. Like early_initcall()
, pure_initcall()
also exists for built-in code, not for modules. There are just about 10 pure_initcall()
definitions in the kernel as one would expect based on the restrictive uses of this type of initcall. One such use of a pure initcall can be found in the pci subsystem to initialize parameters when early parameters are parsed in pci_realloc_setup_params()
.
There are eight initcall call levels as defined in include/linux/init.h:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 |
#define pure_initcall(fn) __define_initcall(fn, 0) #define core_initcall(fn) __define_initcall(fn, 1) #define core_initcall_sync(fn) __define_initcall(fn, 1s) #define postcore_initcall(fn) __define_initcall(fn, 2) #define postcore_initcall_sync(fn) __define_initcall(fn, 2s) #define arch_initcall(fn) __define_initcall(fn, 3) #define arch_initcall_sync(fn) __define_initcall(fn, 3s) #define subsys_initcall(fn) __define_initcall(fn, 4) #define subsys_initcall_sync(fn) __define_initcall(fn, 4s) #define fs_initcall(fn) __define_initcall(fn, 5) #define fs_initcall_sync(fn) __define_initcall(fn, 5s) #define rootfs_initcall(fn) __define_initcall(fn, rootfs) #define device_initcall(fn) __define_initcall(fn, 6) #define device_initcall_sync(fn) __define_initcall(fn, 6s) #define late_initcall(fn) __define_initcall(fn, 7) #define late_initcall_sync(fn) __define_initcall(fn, 7s) |
And kept in sync with initcall_level_names[] in init/main.c:
1 2 3 4 5 6 7 8 9 10 |
static const char *initcall_level_names[] __initdata = { "pure", "core", "postcore", "arch", "subsys", "fs", "device", "late", }; |
Enabling KNL (kernel start-up parameter) initcall_debug allows us to trace initcalls as they are executed. This is useful for understanding the kernel boot sequence, debugging early kernel panics and measuring the amount of time each initcall spends initializing. This is a good way to track regressions in kernel boot times from release to release. If one or more initcalls start taking longer to run, it could be an indication that something might be wrong.
Another useful KNL initcall_blacklist parameter can be used to specify a list of initcall functions to skip. This is useful for debugging built-in modules and initcalls. Please refer to The kernel’s command-line parameters.
Let’s now look at how to enable initcall_debug
and parse dmesg
for initcalls.
The first step is to make sure CONFIG_PRINTK_TIME
and CONFIG_KALLSYMS
kernel configuration options are enabled in your kernel. If not, you will have to build, install and boot a new kernel with these options enabled. Please refer to “A Beginner’s Guide to Linux Kernel Development” (LFD103) or other resources on how to build and boot a custom kernel. Once you have the right options enables, you can edit /etc/default/grub
to append printk.time=1 initcall_debug
to the GRUB_CMDLINE_LINUX
variable, as shown below:
GRUB_CMDLINE_LINUX="earlyprintk=vga printk.time=1 initcall_debug"
Now let’s reboot and run dmesg -t -x
command once the system is ready. I usually save the dmesg
to a file to parse and analyze.
1 2 3 4 5 6 7 8 |
kern :debug : initcall con_init+0x0/0x378 returned 0 after 0 usecs kern :debug : initcall hvc_console_init+0x0/0x1f returned 0 after 0 usecs kern :debug : initcall xen_cons_init+0x0/0x80 returned 0 after 0 usecs kern :debug : initcall univ8250_console_init+0x0/0x39 returned 0 after 0 usecs kern :debug : initcall kgdboc_earlycon_late_init+0x0/0x31 returned 0 after 0 usecs kern :debug : initcall init_hw_perf_events+0x0/0x6d8 returned 0 after 0 usecs kern :debug : initcall do_init_real_mode+0x0/0x1a returned 0 after 0 usecs kern :debug : initcall trace_init_perf_perm_irq_work_exit+0x0/0x1c returned 0 after 0 usecs |
The following (original from https://elinux.org/Initcall_Debug) short script shows you the initcalls that are taking longer to complete:
cat dmesg.out | grep "initcall" | sed "s/\(.*\)after\(.*\)/\2 \1/g" | sort -n
Example from my system:
1 2 3 4 5 6 7 8 |
64564 usecs kern :debug : initcall serial8250_init+0x0/0x245 returned 0 87029 usecs kern :debug : initcall btusb_driver_init+0x0/0xff0 [btusb] returned 0 110960 usecs kern :debug : initcall ahci_pci_driver_init+0x0/0xff0 [ahci] returned 0 120384 usecs kern :debug : initcall init_kprobe_trace+0x0/0x21c returned 0 227677 usecs kern :debug : initcall br_init+0x0/0x12a [bridge] returned 0 281999 usecs kern :debug : initcall init_module+0x0/0xff0 [raid6_pq] returned 0 1104000 usecs kern :debug : initcall acpi_init+0x0/0x647 returned 0 2218805 usecs kern :debug : initcall inet6_init+0x0/0x413 returned 0 |
Enabling initcall_debug
increases the number of messages. Please ensure the CONFIG_LOG_BUF_SHIFT
kernel config option is 18 to set the log buffer size to 256K. The kernel boot process can be sped up by enabling driver_async_probe KNL to run the driver probe in asynchronous mode. Using this option, a list of driver names can be specified to be probed asynchronously. Please note that not all drivers support asynchronous probing. If you would like to learn more about initcalls, please refer to “An introduction to Linux kernel initcalls.”
bootgraph.pl
script turns a dmesg
output into a SVG graphic that shows which functions take how much time. You can view SVG graphics with various programs, including Inkscape, The Gimp and Firefox. I will leave this as a resource for you to explore as you learn more about initcalls.
Here is a shell script I use to keep track of initcall times and errors.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 |
==================================================================== #!/bin/bash # # SPDX-License-Identifier: GPL-2.0 # # Copyright(c) Shuah Khan <skhan@linuxfoundation.org> # License: GPLv2 # # Generates wiki format table of initcall times list from # dmesg | grep initcall output # Boot kernel with printk.time=1 initcall_debug command line options # Usage: boot_initcall_tbl.sh <dmesg.out> # See async_run_entry_fn() and Documentation/core-api/printk-formats.rst # Print Table header echo "Init Call times" uname -a cat /proc/cmdline grep "Command line" $1 | awk -F ']' '{print $2}' echo "====================================================================" echo "Init calls that took longer to complete > 0 usecs" echo "====================================================================" echo "| **Init Call** | **Module** | **Retun Value** | **Time (uses)** |" echo "====================================================================" grep initcall $1 | awk -F ' ' '{ if (NF == 9 && $8 > 0) print "| " $4 " | --- | " $6 " | " $8 " | "}' grep initcall $1 | awk -F ' ' '{ if (NF == 10 && $9 > 0) print "| " $4 " | " $5 " | " $7 " | " $9 " | "}' echo "====================================================================" echo "Failed Init calls (return value < 0)" echo "====================================================================" echo "| **Init Call** | **Module** | **Retun Value** | **Time (uses)** |" echo "====================================================================" grep initcall $1 | awk -F ' ' '{ if (NF == 9 && $6 < 0) print "| " $4 " | --- | " $6 " | " $8 " | "}' grep initcall $1 | awk -F ' ' '{ if (NF == 10 && $7 < 0) print "| " $4 " | " $5 " | " $7 " | " $9 " | "}' echo "====================================================================" ==================================================================== |
As you can see, once you have the initcall information, the data can be used to understand the boot sequence, debug problems and keep track of kernel boot times from one kernel release to another.