Linux 2.6.35 hanging/oopsing on large memory allocations
Short summary
The 2.6.35.4 kernel on x86_64 doesn’t seem to handle large memory allocations well. In particular,
- Running malloc() with Gigabyte chunks can cause a kernel oops
- Quickly allocating all memory will make the system hang (but not oops)
If anyone has an idea why this is, some helpful clues or whatever, please comment below.
I should also mention that I applied the patch, which fixes the general system freeze on rapid disk I/O, mentioned in another post of mine. I didn’t feel like going on crashing my system over and over again, so I skipped the test without this patch.
Allocating a huge chunk
Running a vanilla 2.6.35.4 kernel on an x86_64 machine (as 64-bit Linux), I wanted to see how well my 16 GB of RAM worked. So I decided to allocate a lot of memory and see what happens. More precisely, I wrote the following somewhat dirty program, and ran it (calling it memeater):
#include <stdlib.h>
#include <unistd.h>
#include <stdio.h>

int main() {
  long int size = 1024*1024*1024;
  long int i;

  size *= 4; /* 4 GB in total */

  char *p = malloc(size);

  printf("Size is %ld, Pointer is %08lx\n", size, (unsigned long int) p);

  if (p)
    for (i=0; i<size/1024; i++) {
      *p = 0; /* Touch the memory so it is really allocated */
      p += 1024;
    }

  getc(stdin); /* Hold the memory until RETURN is pressed */

  return 0;
}
A short explanation: This program simply requests 4 gigabytes of memory in a single malloc(). The loop writes a byte every 1024 bytes, so every memory page gets touched (a 4096 jump rather than 1024 would be OK as well, I believe). This is necessary, since allocated memory isn’t really consumed until it’s used.
It then expects the user to press RETURN on console, so that the memory is held until deliberately released.
In theory, this shouldn’t be a problem. Allocating 4 Gigs of memory should either return a pointer or a NULL. Definitely not oops as follows:
May 17 18:11:14 kernel: general protection fault: 0000 [#3] SMP
May 17 18:11:14 kernel: last sysfs file: /sys/devices/virtual/sound/timer/uevent
May 17 18:11:14 kernel: CPU 4
May 17 18:11:14 kernel: Modules linked in: nfsd exportfs it87 hwmon_vid vmnet vmblock vmci vmmon cpufreq_ondemand acpi_cpufreq freq_table mperf ipv6 dm_multipath kvm_intel kvm uinput snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_intel snd_hda_codec snd_hwdep snd_seq snd_seq_device snd_pcm snd_timer snd iTCO_wdt ppdev 8139too soundcore 8139cp tulip parport_pc r8169 iTCO_vendor_support pcspkr snd_page_alloc parport i2c_i801 mii sha256_generic cryptd aes_x86_64 aes_generic dm_crypt raid456 async_raid6_recov async_pq raid6_pq async_xor xor async_memcpy async_tx ata_generic pata_acpi pata_jmicron radeon ttm drm_kms_helper drm i2c_algo_bit i2c_core [last unloaded: microcode]
May 17 18:11:14 kernel:
May 17 18:11:14 kernel: Pid: 3056, comm: memeater Tainted: G D 2.6.35.4-OCHO3 #1 P55-UD3R/P55-UD3R
May 17 18:11:14 kernel: RIP: 0010:[<ffffffff811024ae>] [<ffffffff811024ae>] mem_cgroup_charge_statistics+0x9/0x50
May 17 18:11:14 kernel: RSP: 0018:ffff88040a85fa48 EFLAGS: 00010246
May 17 18:11:14 kernel: RAX: 00000000ffffff01 RBX: ffffea000a918e00 RCX: 0000000000000060
May 17 18:11:14 kernel: RDX: 0000000000000000 RSI: ffff8804142c8a00 RDI: ffbfc90001817000
May 17 18:11:14 kernel: RBP: ffff88040a85fa48 R08: ffff880409eba958 R09: 00000000ffffffc0
May 17 18:11:14 kernel: R10: 0000000000400000 R11: ffffea000a9d0df0 R12: 0000000000000001
May 17 18:11:14 kernel: R13: ffff8804142c8a00 R14: ffbfc90001817000 R15: ffff880409fadcc0
May 17 18:11:14 kernel: FS: 00007f36017db700(0000) GS:ffff880002100000(0000) knlGS:0000000000000000
May 17 18:11:14 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
May 17 18:11:14 kernel: CR2: 0000000001861000 CR3: 0000000001a42000 CR4: 00000000000006e0
May 17 18:11:14 kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
May 17 18:11:14 kernel: DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
May 17 18:11:14 kernel: Process memeater (pid: 3056, threadinfo ffff88040a85e000, task ffff880409fadcc0)
May 17 18:11:14 kernel: Stack:
May 17 18:11:14 kernel: ffff88040a85fa98 ffffffff8110579c ffff88040a85faa8 ffffffff810ec2d1
May 17 18:11:14 kernel: <0> ffff88040a85fa88 ffffea000a918e00 00007f35e72e1000 8000000305040067
May 17 18:11:14 kernel: <0> 0000000000099000 ffff880409eba958 ffff88040a85faa8 ffffffff81105826
May 17 18:11:14 kernel: Call Trace:
May 17 18:11:14 kernel: [<ffffffff8110579c>] __mem_cgroup_uncharge_common+0x194/0x1e5
May 17 18:11:14 kernel: [<ffffffff810ec2d1>] ? free_pages_and_swap_cache+0x63/0x80
May 17 18:11:14 kernel: [<ffffffff81105826>] mem_cgroup_uncharge_page+0x27/0x29
May 17 18:11:14 kernel: [<ffffffff810e7267>] page_remove_rmap+0x28/0x50
May 17 18:11:14 kernel: [<ffffffff810dd7b3>] unmap_vmas+0x5c5/0x928
May 17 18:11:14 kernel: [<ffffffff810e2ec1>] exit_mmap+0xce/0x132
May 17 18:11:14 kernel: [<ffffffff8104ad9b>] mmput+0x5e/0xca
May 17 18:11:14 kernel: [<ffffffff8104f1cf>] exit_mm+0x114/0x121
May 17 18:11:14 kernel: [<ffffffff81050b7b>] do_exit+0x226/0x726
May 17 18:11:14 kernel: [<ffffffff8105a08e>] ? try_to_del_timer_sync+0x7b/0x89
May 17 18:11:14 kernel: [<ffffffff810510f8>] do_group_exit+0x7d/0xa5
May 17 18:11:14 kernel: [<ffffffff8105ee27>] get_signal_to_deliver+0x373/0x395
May 17 18:11:14 kernel: [<ffffffff812c4dce>] ? n_tty_read+0x6b3/0x786
May 17 18:11:14 kernel: [<ffffffff81009010>] do_signal+0x72/0x68d
May 17 18:11:14 kernel: [<ffffffff812c758d>] ? tty_ldisc_deref+0xe/0x10
May 17 18:11:14 kernel: [<ffffffff812c025f>] ? tty_read+0x8c/0xc5
May 17 18:11:14 kernel: [<ffffffff81009657>] do_notify_resume+0x2c/0x6e
May 17 18:11:14 kernel: [<ffffffff81009f00>] int_signal+0x12/0x17
May 17 18:11:14 kernel: Code: ff 4c 89 e3 4d 8b 24 24 4c 39 eb 75 de 48 c7 c7 60 29 a6 81 e8 b5 eb 39 00 5b 41 5c 41 5d 41 5e c9 c3 55 48 89 e5 0f 1f 44 00 00 <48> 8b 87 10 11 00 00 80 fa 01 19 c9 83 c9 01 f6 06 02 48 63 c9
May 17 18:11:14 kernel: RIP [<ffffffff811024ae>] mem_cgroup_charge_statistics+0x9/0x50
May 17 18:11:14 kernel: RSP <ffff88040a85fa48>
May 17 18:11:14 kernel: ---[ end trace 4d26f08f6051ed51 ]---
May 17 18:11:14 kernel: Fixing recursive fault but reboot is needed
And I should mention that when I tried this for 16 GB, the system just hung. But that’s explained next.
Allocating all RAM
So I said, OK, that must be because nobody is really expected to allocate those huge chunks in one go. So what happens when the memory just ends? I have some swap space, so this should work…
#include <stdlib.h>
#include <unistd.h>
#include <stdio.h>

int main() {
  long int size = 1024*1024;
  long int i,k;

  size *= 16; /* 16 MB per chunk */

  for (k=0; k<1024; k++) {
    char *p = malloc(size); /* Never freed on purpose */

    printf("Size is %ld, Pointer is %08lx\n", size, (unsigned long int) p);

    if (p)
      for (i=0; i<size/1024; i++) {
        *p = 0; /* Touch the memory so it is really allocated */
        p += 1024;
      }
  }

  getc(stdin); /* Hold the memory until RETURN is pressed */

  return 0;
}
The idea here is simple: Loop 1024 times, allocating 16MB at a time. This is sane and eventually takes 16 GB.
But no, the system just hung. No oops, no drama. Just nothing happened. Processes were stalled, although typing and switching consoles with Shift-Alt-Fx still worked. Ctrl-Alt-Delete was ignored. No reboot. Only a reset got me out of this.
For the record, running the program with chunks of 14 MB each, so almost all memory was allocated (14 GB out of 16 GB), worked cleanly, and the system remained stable.
Of course I have swap
And this is the proof: This is my /proc/meminfo with a system barely doing anything:
MemTotal: 16463436 kB
MemFree: 14909616 kB
Buffers: 75428 kB
Cached: 664284 kB
SwapCached: 0 kB
Active: 459044 kB
Inactive: 604272 kB
Active(anon): 324088 kB
Inactive(anon): 115208 kB
Active(file): 134956 kB
Inactive(file): 489064 kB
Unevictable: 0 kB
Mlocked: 0 kB
SwapTotal: 4194300 kB
SwapFree: 4194300 kB
Dirty: 1348 kB
Writeback: 0 kB
AnonPages: 323592 kB
Mapped: 116232 kB
Shmem: 115704 kB
Slab: 95920 kB
SReclaimable: 46340 kB
SUnreclaim: 49580 kB
KernelStack: 4008 kB
PageTables: 36940 kB
NFS_Unstable: 0 kB
Bounce: 0 kB
WritebackTmp: 0 kB
CommitLimit: 12426016 kB
Committed_AS: 1576024 kB
VmallocTotal: 34359738367 kB
VmallocUsed: 123416 kB
VmallocChunk: 34359587772 kB
HardwareCorrupted: 0 kB
HugePages_Total: 0
HugePages_Free: 0
HugePages_Rsvd: 0
HugePages_Surp: 0
Hugepagesize: 2048 kB
DirectMap4k: 8064 kB
DirectMap2M: 16764928 kB
Sometimes I’ve seen some of the swap actually used. It’s not like it doesn’t work.
Reader Comments
You can allocate huge consistent mem when machine boots.