Modal Title
Linux

Demystifying Linux Kernel Initialization

The kernel initialization sequence is complex. Let’s examine the boot process.
Feb 14th, 2023 9:16am by
Featued image for: Demystifying Linux Kernel Initialization

If you ever wondered what happens between the time you power up your laptop and when you get back with that steaming cup of your favorite beverage? This article will take the lid off and demystify the Linux kernel boot process.

Let’s first talk about how much visibility a user has into what’s happening on their system and the state of the kernel itself. The Linux kernel is constantly printing diagnostic messages to the kernel ring buffer. As the name indicates, this ring buffer is a circular buffer in which older messages get overwritten by newer ones. It prints important messages about the initialization process during initialization and boot up. Several priority levels, such as emergency, critical, error, warning and informational are supported and logged to the ring buffer.

If you are wondering how to view these messages, a command line tool dmesg exists just for that reason. Users can run the dmesg command line tool to examine or control the kernel ring buffer. dmesg control options allow users to clear, read first, then clear, enable and disable printing of messages to the console. We are going to focus on the examine options primarily.

The dmesg command is a versatile tool with several options to zero in on specific messages for supported log facility and priority levels.

Supported log facilities:

  • kern – kernel messages
  • user – random user-level messages
  • mail – mail system
  • daemon – system daemons
  • auth – security/authorization messages
  • syslog – messages generated internally by syslogd
  • lpr – line printer subsystem
  • news – network news subsystem

Supported log levels (priorities):

  • emerg — system is unusable
  • alert — action must be taken immediately
  • crit — critical conditions
  • err — error conditions
  • warn — warning conditions
  • notice — normal but significant condition
  • info — informational
  • debug — debug-level messages

As an example, the following commands can be used to view specific priority messages:

  • dmesg -l emerg# shows emergency messages
  • dmesg -l alert# shows alerts
  • dmesg -l crit# shows critical messages
  • dmesg -l err# shows error messages below emergency and critical levels
  • dmesg -l err+# shows all error levels emergency, critical and error

As an example, the following commands can be used to view log facility messages:

  • dmesg -f kern
  • dmesg -f daemon

These messages can be hard to parse and understand unless you look at them day in and day out. Let’s look at a couple of useful options that improve readability.

  •  -t (–notime) option shows messages without timestamps;
  • -x (–decode) option decodes the log facility and priority numbers into user-friendly human-readable information.

Combining these two options yields easily readable and understandable messages. In addition, this format without timestamps lends itself well to comparing and analyzing messages from different kernel boots for regressions in errors.

dmesg -t -x

Here is an excerpt from dmesg that includes info, debug, warn-level messages:


With this overview of the kernel ring buffer and how to view kernel messages, we can dive into the kernel initialization sequence.

The kernel initialization sequence is complex and carried out in several stages using initcalls. The initcalls contribute a significant amount of time to the Linux boot process. initcalls are grouped by functionality, and early initcalls [early_initcall(fn)] run before initializing SMP (symmetric multiprocessing) and are for built-in (statically linked) code, not for dynamically loadable modules. As of this writing, there are more than 110 early_initcall() defined in the kernel. A pure initcall is much more restricted, has no dependencies on anything else and purely initializes variables that couldn’t be statically initialized. Like early_initcall(), pure_initcall() also exists for built-in code, not for modules. There are just about 10 pure_initcall() definitions in the kernel as one would expect based on the restrictive uses of this type of initcall. One such use of a pure initcall can be found in the pci subsystem to initialize parameters when early parameters are parsed in pci_realloc_setup_params().

There are eight initcall call levels as defined in include/linux/init.h:


And kept in sync with initcall_level_names[] in init/main.c:


Enabling KNL (kernel start-up parameter) initcall_debug allows us to trace initcalls as they are executed. This is useful for understanding the kernel boot sequence, debugging early kernel panics and measuring the amount of time each initcall spends initializing. This is a good way to track regressions in kernel boot times from release to release. If one or more initcalls start taking longer to run, it could be an indication that something might be wrong.

Another useful KNL initcall_blacklist parameter can be used to specify a list of initcall functions to skip. This is useful for debugging built-in modules and initcalls. Please refer to The kernel’s command-line parameters.

Let’s now look at how to enable initcall_debug and parse dmesg for initcalls.

The first step is to make sure CONFIG_PRINTK_TIME and CONFIG_KALLSYMS kernel configuration options are enabled in your kernel. If not, you will have to build, install and boot a new kernel with these options enabled. Please refer to “A Beginner’s Guide to Linux Kernel Development” (LFD103) or other resources on how to build and boot a custom kernel. Once you have the right options enables, you can edit /etc/default/grub to append printk.time=1 initcall_debug to the GRUB_CMDLINE_LINUX variable, as shown below:

GRUB_CMDLINE_LINUX="earlyprintk=vga printk.time=1 initcall_debug"

Now let’s reboot and run dmesg -t -x command once the system is ready. I usually save the dmesg to a file to parse and analyze.


The following (original from https://elinux.org/Initcall_Debug) short script shows you the initcalls that are taking longer to complete:

cat dmesg.out | grep "initcall" | sed "s/\(.*\)after\(.*\)/\2 \1/g" | sort -n

Example from my system:


Enabling initcall_debug increases the number of messages. Please ensure the CONFIG_LOG_BUF_SHIFT kernel config option is 18 to set the log buffer size to 256K. The kernel boot process can be sped up by enabling driver_async_probe KNL to run the driver probe in asynchronous mode. Using this option, a list of driver names can be specified to be probed asynchronously. Please note that not all drivers support asynchronous probing. If you would like to learn more about initcalls, please refer to “An introduction to Linux kernel initcalls.”

bootgraph.pl script turns a dmesg output into a SVG graphic that shows which functions take how much time. You can view SVG graphics with various programs, including Inkscape, The Gimp and Firefox. I will leave this as a resource for you to explore as you learn more about initcalls.

Here is a shell script I use to keep track of initcall times and errors.


As you can see, once you have the initcall information, the data can be used to understand the boot sequence, debug problems and keep track of kernel boot times from one kernel release to another.

Group Created with Sketch.
TNS owner Insight Partners is an investor in: Pragma.
THE NEW STACK UPDATE A newsletter digest of the week’s most important stories & analyses.