
Hoary 2.6.10 oom-killer

by Herbert Straub last modified 2008-03-23 11:30

Long server uptime and the Hoary oom-killer bug.

Current situation

See Ubuntu bug #13144.

Description

The oom-killer terminates processes even though plenty of free swap space is available. I see this error in two situations: a) on a machine with 4 CPUs, 4 GB RAM, and 8 GB swap, with 1 TB of NFS-mounted space used for iozone and bonnie++ tests, and b) on a system with 64 MB RAM after one week of uptime (not much, but this case is referenced in the RedHat Bugzilla, see below). The error in situation a) is reproducible every time: it occurs whenever I start an I/O-intensive application (such as iozone) on the 1 TB NFS-mounted filesystem (the log messages are below). The machine in situation b) is used for internet access; over time, important processes went missing, and I found the oom-killer messages in the system log files.

Details for situation a:

    $ free
                 total       used       free     shared    buffers     cached
    Mem:       3637100     109448    3527652          0      47588      13952
    -/+ buffers/cache:      47908    3589192
    Swap:      7815612       2744    7812868

    $ df
    ... 
    server:/volume
                          1.1T  6.2G  1.1T   1% /mnt

    Software: pure Hoary with kernel: 2.6.10-5-686-smp
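
To put a number on "plenty of free swap space", the Swap: row of the free output above can be parsed as follows. This is only a small sketch: the function name swap_usage is made up for illustration, and the input is the output quoted in this report, not a live system.

```python
# Output of `free` copied from this report (all values in kB).
FREE_OUTPUT = """\
             total       used       free     shared    buffers     cached
Mem:       3637100     109448    3527652          0      47588      13952
-/+ buffers/cache:      47908    3589192
Swap:      7815612       2744    7812868
"""


def swap_usage(free_text):
    """Return (total_kb, used_kb, free_kb) from the Swap: row of `free` output."""
    for line in free_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "Swap:":
            total, used, free = (int(value) for value in fields[1:4])
            return total, used, free
    raise ValueError("no Swap: row found")


total, used, free = swap_usage(FREE_OUTPUT)
# Share of swap in use before the test starts.
print(f"swap used: {used} of {total} kB ({100.0 * used / total:.3f}%)")
```

On the numbers above, well under one percent of the swap space is in use before the test starts.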

Testcase

When I start iozone on the /mnt filesystem, I can observe with top that the Mem: free value gets lower and lower while the Swap: cached value gets higher and higher. After a few seconds the system "hangs". If the sshd process is not terminated, the system recovers after a few seconds. In the messages log file, I found the following:

        messages:

        Jul 27 14:46:54 localhost kernel: oom-killer: gfp_mask=0xd0
        Jul 27 14:46:54 localhost kernel: DMA per-cpu:
        Jul 27 14:46:54 localhost kernel: cpu 0 hot: low 2, high 6, batch 1
        Jul 27 14:46:54 localhost kernel: cpu 0 cold: low 0, high 2, batch 1
        Jul 27 14:46:54 localhost kernel: cpu 1 hot: low 2, high 6, batch 1
        Jul 27 14:46:54 localhost kernel: cpu 1 cold: low 0, high 2, batch 1
        Jul 27 14:46:54 localhost kernel: cpu 2 hot: low 2, high 6, batch 1
        Jul 27 14:46:54 localhost kernel: cpu 2 cold: low 0, high 2, batch 1
        Jul 27 14:46:54 localhost kernel: cpu 3 hot: low 2, high 6, batch 1
        Jul 27 14:46:54 localhost kernel: cpu 3 cold: low 0, high 2, batch 1
        Jul 27 14:46:54 localhost kernel: Normal per-cpu:
        Jul 27 14:46:54 localhost kernel: cpu 0 hot: low 32, high 96, batch 16
        Jul 27 14:46:54 localhost kernel: cpu 0 cold: low 0, high 32, batch 16
        Jul 27 14:46:54 localhost kernel: cpu 1 hot: low 32, high 96, batch 16
        Jul 27 14:46:54 localhost kernel: cpu 1 cold: low 0, high 32, batch 16
        Jul 27 14:46:54 localhost kernel: cpu 2 hot: low 32, high 96, batch 16
        Jul 27 14:46:54 localhost kernel: cpu 2 cold: low 0, high 32, batch 16
        Jul 27 14:46:54 localhost kernel: cpu 3 hot: low 32, high 96, batch 16
        Jul 27 14:46:54 localhost kernel: cpu 3 cold: low 0, high 32, batch 16
        Jul 27 14:46:54 localhost kernel: HighMem per-cpu:
        Jul 27 14:46:54 localhost kernel: cpu 0 hot: low 32, high 96, batch 16
        Jul 27 14:46:54 localhost kernel: cpu 0 cold: low 0, high 32, batch 16
        Jul 27 14:46:54 localhost kernel: cpu 1 hot: low 32, high 96, batch 16
        Jul 27 14:46:54 localhost kernel: cpu 1 cold: low 0, high 32, batch 16
        Jul 27 14:46:54 localhost kernel: cpu 2 hot: low 32, high 96, batch 16
        Jul 27 14:46:54 localhost kernel: cpu 2 cold: low 0, high 32, batch 16
        Jul 27 14:46:54 localhost kernel: cpu 3 hot: low 32, high 96, batch 16
        Jul 27 14:46:54 localhost kernel: cpu 3 cold: low 0, high 32, batch 16
        Jul 27 14:46:54 localhost kernel:
        Jul 27 14:46:54 localhost kernel: Free pages:        4668kB (896kB HighMem)
        Jul 27 14:46:54 localhost kernel: Active:2955 inactive:863904 dirty:0 writeback:367192 unstable:0 free:1167 slab:37359 mapped:2718 pagetables:116
        Jul 27 14:46:54 localhost kernel: DMA free:68kB min:68kB low:84kB high:100kB active:0kB inactive:12652kB present:16384kB pages_scanned:0 all_unreclaimable? no
        Jul 27 14:46:54 localhost kernel: protections[]: 0 0 0
        Jul 27 14:46:54 localhost kernel: Normal free:3704kB min:3756kB low:4692kB high:5632kB active:68kB inactive:710704kB present:901120kB pages_scanned:757 all_unreclaimable? no
        Jul 27 14:46:54 localhost kernel: protections[]: 0 0 0
        Jul 27 14:46:54 localhost kernel: HighMem free:896kB min:512kB low:640kB high:768kB active:11752kB inactive:2732316kB present:2752460kB pages_scanned:0 all_unreclaimable? no
        Jul 27 14:46:54 localhost kernel: protections[]: 0 0 0
        Jul 27 14:46:54 localhost kernel: DMA: 1*4kB 0*8kB 0*16kB 0*32kB 1*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 68kB
        Jul 27 14:46:54 localhost kernel: Normal: 0*4kB 1*8kB 1*16kB 1*32kB 1*64kB 0*128kB 0*256kB 1*512kB 1*1024kB 1*2048kB 0*4096kB = 3704kB
        Jul 27 14:46:54 localhost kernel: HighMem: 96*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 1*512kB 0*1024kB 0*2048kB 0*4096kB = 896kB
        Jul 27 14:46:54 localhost kernel: Swap cache: add 686, delete 1, find 0/0, race 0+0
        Jul 27 14:46:54 localhost kernel: Swap cache: add 686, delete 1, find 0/0, race 0+0
        Jul 27 14:46:55 localhost kernel: oom-killer: gfp_mask=0xd0
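
The per-zone lines in the log above are easier to read when free is compared against the min watermark of each zone. The following sketch does that for the three zone lines of situation a). The helper names parse_zone and zones_at_or_below_min are made up for illustration, the log text is copied from this report with wrapped lines rejoined, and the 4 kB page size is an assumption based on the smallest block size in the free lists. It only restates what the log says and does not establish the root cause.

```python
import re

# Zone lines from the situation a) log, wrapped lines rejoined, trailing fields dropped.
ZONE_LINES = [
    "DMA free:68kB min:68kB low:84kB high:100kB active:0kB inactive:12652kB present:16384kB",
    "Normal free:3704kB min:3756kB low:4692kB high:5632kB active:68kB inactive:710704kB present:901120kB",
    "HighMem free:896kB min:512kB low:640kB high:768kB active:11752kB inactive:2732316kB present:2752460kB",
]

FIELD = re.compile(r"(\w+):(\d+)kB")
PAGE_KB = 4  # assumption: 4 kB pages, as suggested by the "N*4kB" free-list entries


def parse_zone(line):
    """Split a zone line into its name and a dict of kB counters."""
    name, rest = line.split(None, 1)
    return name, {key: int(value) for key, value in FIELD.findall(rest)}


def zones_at_or_below_min(lines):
    """Return the names of zones whose free memory is not above the min watermark."""
    result = []
    for line in lines:
        name, stats = parse_zone(line)
        if stats["free"] <= stats["min"]:
            result.append(name)
    return result


print(zones_at_or_below_min(ZONE_LINES))  # ['DMA', 'Normal']

# Page counters from the "Active:... writeback:... free:..." line, converted to kB.
print("free:", 1167 * PAGE_KB, "kB")         # matches "Free pages: 4668kB"
print("writeback:", 367192 * PAGE_KB, "kB")  # pages under writeback at that moment
```

In this snapshot the DMA and Normal zones sit at or below their min watermark while roughly 1.4 GB of pages are under writeback, which fits the observation that the problem shows up during heavy I/O to the NFS mount.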

References in the net

I found the same problem described in the RedHat Bugzilla.

Workaround

With kernel 2.6.11-1-686-smp, I cannot reproduce the error.

Details for situation b:

    $ free
                 total       used       free     shared    buffers     cached
    Mem:         61124      59504       1620          0       9252      25920
    -/+ buffers/cache:      24332      36792
    Swap:      1943736          0    1943736

    messages:

    Aug  1 22:42:05 localhost kernel: oom-killer: gfp_mask=0xd2
    Aug  1 22:42:05 localhost kernel: DMA per-cpu:
    Aug  1 22:42:05 localhost kernel: cpu 0 hot: low 2, high 6, batch 1
    Aug  1 22:42:05 localhost kernel: cpu 0 cold: low 0, high 2, batch 1
    Aug  1 22:42:05 localhost kernel: Normal per-cpu:
    Aug  1 22:42:05 localhost kernel: cpu 0 hot: low 4, high 12, batch 2
    Aug  1 22:42:05 localhost kernel: cpu 0 cold: low 0, high 4, batch 2
    Aug  1 22:42:05 localhost kernel: HighMem per-cpu: empty
    Aug  1 22:42:05 localhost kernel:
    Aug  1 22:42:05 localhost kernel: Free pages:        1604kB (0kB HighMem)
    Aug  1 22:42:05 localhost kernel: Active:749 inactive:81 dirty:0 writeback:3 unstable:0 free:401 slab:12926 mapped:807 pagetables:109
    Aug  1 22:42:05 localhost kernel: DMA free:316kB min:256kB low:320kB high:384kB active:0kB inactive:16kB present:16384kB pages_scanned:18 all_unreclaimable? yes
    Aug  1 22:42:05 localhost kernel: protections[]: 0 0 0
    Aug  1 22:42:05 localhost kernel: Normal free:1288kB min:752kB low:940kB high:1128kB active:2996kB inactive:308kB present:48128kB pages_scanned:899 all_unreclaimable? no
    Aug  1 22:42:05 localhost kernel: protections[]: 0 0 0
    Aug  1 22:42:05 localhost kernel: HighMem free:0kB min:128kB low:160kB high:192kB active:0kB inactive:0kB present:0kB pages_scanned:0 all_unreclaimable? no
    Aug  1 22:42:05 localhost kernel: protections[]: 0 0 0
    Aug  1 22:42:05 localhost kernel: DMA: 13*4kB 1*8kB 2*16kB 3*32kB 2*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 316kB
    Aug  1 22:42:05 localhost kernel: Normal: 124*4kB 5*8kB 1*16kB 1*32kB 1*64kB 1*128kB 0*256kB 1*512kB 0*1024kB 0*2048kB 0*4096kB = 1288kB
    Aug  1 22:42:05 localhost kernel: HighMem: empty
    Aug  1 22:42:05 localhost kernel: Swap cache: add 109074, delete 108905, find 21539/46436, race 0+8
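
The two logs report different allocation masks, gfp_mask=0xd0 in situation a) and gfp_mask=0xd2 in situation b). The sketch below decodes them into flag names. The bit values are an assumption: they are the low-order flags as I recall them from include/linux/gfp.h in kernels of the 2.6.10 era and should be checked against the actual kernel source. The function name decode_gfp_mask is made up for illustration.

```python
# Assumed flag bits for 2.6.10-era kernels (verify against include/linux/gfp.h).
GFP_FLAGS = {
    0x01: "__GFP_DMA",
    0x02: "__GFP_HIGHMEM",
    0x10: "__GFP_WAIT",
    0x20: "__GFP_HIGH",
    0x40: "__GFP_IO",
    0x80: "__GFP_FS",
}


def decode_gfp_mask(mask):
    """Return the names of the known flags set in mask, plus any leftover bits as hex."""
    names = []
    remaining = mask
    for bit, name in sorted(GFP_FLAGS.items()):
        if mask & bit:
            names.append(name)
            remaining &= ~bit
    if remaining:
        names.append(hex(remaining))  # bits not covered by the table above
    return names


for mask in (0xD0, 0xD2):
    print(hex(mask), " | ".join(decode_gfp_mask(mask)))
```

Under these assumed values, 0xd0 is an allocation that may wait and perform I/O and filesystem operations, and 0xd2 is the same request with the highmem bit added.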

Question:

Will this error be fixed in Hoary with the 2.6.10 kernel? I tried to apply the RedHat patches, but I could not get them to work. With the 2.6.11 kernel, it seems to work.
