Common Problems - Linux kernel doesn't boot

From RidgeRun Developer Connection

What can you do if the Linux kernel doesn't boot to the login prompt? This page will give you some ideas.

Starting kernel ...

Several different problems can cause Starting kernel ... to be the last output you see.

Example output:

## Booting kernel from Legacy Image at 82000000 ...
   Image Name:   "RR Linux Kernel"
   Created:      2011-05-26  13:20:46 UTC
   Image Type:   ARM Linux Kernel Image (uncompressed)
   Data Size:    4401664 Bytes = 4.2 MiB
   Load Address: 80008000
   Entry Point:  80008000
   Verifying Checksum ... OK
   Loading Kernel Image ... OK
OK

Starting kernel ...

and then no more console output.

The string Starting kernel ... is the last data u-boot sends to the console; any output after it is generated by Linux. If you see nothing after Starting kernel ..., then one of three things has gone wrong: Linux is sending its output to the wrong UART, the u-boot and kernel images are mismatched, or the kernel image is corrupt.

Wrong kernel console configuration

It is possible that Linux is booting just fine, but sending the boot output to a different UART. Reboot the hardware and interrupt auto-boot so that u-boot is active. Run the following u-boot command.

echo ${bootargs}

If you are using a DM365 or DM368, you should have

console=ttyS0,115200n8

in the bootargs string.
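
To check a long bootargs string without reading it by eye, you can paste the output of echo ${bootargs} into a small shell helper on your host PC. This is only a sketch: the function name console_args and the sample mem= and root= values are made up for illustration and are not part of u-boot or the SDK.

```shell
# Hypothetical host-side helper: list every console= entry in a bootargs string.
console_args() {
    # $1 is intentionally unquoted so the shell splits it on whitespace.
    for arg in $1; do
        case "$arg" in
            console=*) printf '%s\n' "${arg#console=}" ;;
        esac
    done
}

# Example bootargs (the mem= and root= values are illustrative only).
console_args 'mem=76M console=ttyS0,115200n8 root=/dev/mtdblock3'
```

If the helper prints nothing, or prints a device other than ttyS0 on a DM365 or DM368, the kernel output is probably going somewhere you are not watching.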

Mismatch between u-boot machine ID and Linux machine ID

When u-boot passes control to the kernel, it hands over a machine ID value (in register r1 on ARM, as the r1 references in the code and error message below show). When Linux starts, it checks that value, and if the machine ID doesn't match one built into the kernel, Linux spins in a loop and, by default, generates no output. The check exists to make sure the wrong kernel image doesn't run and damage the hardware or the file system contents.

u-boot gives you a way of setting the machine ID it passes to the kernel (the value is given in hexadecimal):

setenv machid d79 # 3449
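
Machine IDs are often listed in decimal (for example in the kernel's arch/arm/tools/mach-types file), so it can help to convert on the host before typing the setenv command. A minimal sketch, where machid_hex is a made-up helper name:

```shell
# Hypothetical helper: convert a decimal machine ID to the hex form
# used with u-boot's machid variable.
machid_hex() {
    printf '%x\n' "$1"
}

machid_hex 3449   # prints d79, matching the setenv example above
```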

For ARM processors, the very first Linux kernel code to run is found in $DEVDIR/kernel/linux-*/arch/arm/kernel/head.S:

ENTRY(stext)
        setmode PSR_F_BIT | PSR_I_BIT | SVC_MODE, r9 @ ensure svc mode
                                                @ and irqs disabled
        mrc     p15, 0, r9, c0, c0              @ get processor id
        bl      __lookup_processor_type         @ r5=procinfo r9=cpuid
        movs    r10, r5                         @ invalid processor (r5=0)?
        beq     __error_p                       @ yes, error 'p'
        bl      __lookup_machine_type           @ r5=machinfo
        movs    r8, r5                          @ invalid machine (r5=0)?
        beq     __error_a                       @ yes, error 'a'

You can see that if the machine ID is invalid, the code jumps to __error_a, which is defined in $DEVDIR/kernel/linux-*/arch/arm/kernel/head-common.S and, in simplified form, looks like:

__error_a:
#ifdef CONFIG_DEBUG_LL
        mov     r4, r1                          @ preserve machine ID
        adr     r0, str_a1
        bl      printascii

...
        b       __error

str_a1: .asciz  "\nError: unrecognized/unsupported machine ID (r1 = 0x"
#endif

__error:
1:      mov     r0, r0
        b       1b

As the #ifdef shows, the error message is printed only if you build the kernel with CONFIG_DEBUG_LL defined (enable it using make config, under Kernel configuration ---> Kernel hacking ---> Kernel low-level debugging functions).

Using the factory NAND contents, booting a Linux kernel with LeopardBoard 365 support produces the following (with CONFIG_DEBUG_LL defined):

## Booting kernel from Legacy Image at 82000000 ...
   Image Name:   "RR Linux Kernel"
   Image Type:   ARM Linux Kernel Image (uncompressed)
   Data Size:    4410144 Bytes =  4.2 MB
   Load Address: 80008000
   Entry Point:  80008000
   Verifying Checksum ... OK
   Loading Kernel Image ... OK
OK

Starting kernel ...


Error: unrecognized/unsupported machine ID (r1 = 0x00000793).

Available machine support:

ID (hex)        NAME
00000a59        DM365 Leopard

Please check your kernel config and/or bootloader.

If there is a machine mismatch, the processor will loop forever at __error. If CONFIG_DEBUG_LL is not defined, then there is no output after Starting kernel ....
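
In the output above, u-boot passed 0x793 while the kernel only lists 0xa59, so one candidate fix is to set machid to a59 in u-boot, assuming the kernel really is the right one for your board. The sketch below pulls the supported IDs out of a captured console log; suggest_machid is a made-up helper name, and it assumes the table rows have the same layout as the log shown here.

```shell
# Hypothetical helper: read a captured CONFIG_DEBUG_LL error log on stdin
# and print a candidate "setenv machid" command for each supported ID.
suggest_machid() {
    # Table rows look like: "00000a59        DM365 Leopard"
    awk 'NF >= 2 && length($1) == 8 && $1 ~ /^[0-9a-fA-F]+$/ {
        id = $1
        sub(/^0+/, "", id)          # strip the leading zeros
        print "setenv machid " id
    }'
}

# Example using two lines from the log above.
printf '%s\n' 'ID (hex)        NAME' '00000a59        DM365 Leopard' | suggest_machid
```

The printed command is meant to be typed at the u-boot prompt, not run on the host.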

If you want to override the machine check so that you can boot a kernel whose machine ID doesn't match the one set by u-boot, add the #if 0 / #endif lines to $DEVDIR/kernel/linux-*/arch/arm/kernel/head.S as shown below:

ENTRY(stext)
        setmode PSR_F_BIT | PSR_I_BIT | SVC_MODE, r9 @ ensure svc mode
                                                @ and irqs disabled
        mrc     p15, 0, r9, c0, c0              @ get processor id
        bl      __lookup_processor_type         @ r5=procinfo r9=cpuid
        movs    r10, r5                         @ invalid processor (r5=0)?
        beq     __error_p                       @ yes, error 'p'
        bl      __lookup_machine_type           @ r5=machinfo
        movs    r8, r5                          @ invalid machine (r5=0)?
#if 0
        beq     __error_a                       @ yes, error 'a'
#endif

I have found this approach to work sometimes, but not all the time. If there is too much difference in how u-boot configures the processor and the other hardware and what is expected by Linux, you still will not be able to boot.

Corrupt kernel image

If both the console setting and the machine ID are configured properly, then the most likely remaining cause is a corrupted kernel image.

First use a known good set of images to verify there is no hardware problem. Compare the u-boot bootargs used in the known good images to the bootargs you are attempting to use. An invalid mem= setting, for example, could cause Linux to not boot.
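
Comparing two bootargs strings by eye is error-prone, so here is a minimal sketch of a host-side comparison. The function name diff_bootargs and the sample mem= and root= values are invented for illustration; the helper assumes the arguments are separated by single spaces and contain no shell wildcard characters.

```shell
# Hypothetical helper: print bootargs entries found in only one of two strings.
# "<" marks entries only in the first string, ">" entries only in the second.
diff_bootargs() {
    for arg in $1; do
        case " $2 " in
            *" $arg "*) ;;                      # present in both, skip
            *) printf '< %s\n' "$arg" ;;
        esac
    done
    for arg in $2; do
        case " $1 " in
            *" $arg "*) ;;
            *) printf '> %s\n' "$arg" ;;
        esac
    done
}

# Known good bootargs first, the bootargs under test second (values illustrative).
diff_bootargs 'console=ttyS0,115200n8 mem=76M root=/dev/nfs' \
              'console=ttyS0,115200n8 mem=760M root=/dev/nfs'
```

Any line the helper prints is a setting worth double-checking before you suspect the kernel image itself.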

After you have verified your hardware is working correctly, you can rebuild the kernel following these steps:

cd $DEVDIR/kernel
make clean
cd $DEVDIR
make
make install

Hopefully rebuilding the kernel will resolve the problem.
