Linux: Out-of-Memory (OOM) Killer (Doc ID 452000.1)

What Is the OOM Killer?
The OOM killer, a feature enabled by default, is a self-protection mechanism employed by the Linux kernel when under severe memory pressure.

If the kernel cannot find free memory when an allocation is needed, it puts in-use user data pages on the swap-out queue so that they can be swapped out. If the virtual memory (VM) subsystem can neither allocate memory nor swap out in-use memory, the OOM killer may begin killing current userspace processes. When all else fails, it sacrifices one or more processes to free up memory for the system.

Suppose /var/log/messages contains a line like the following:

Apr 1 00:01:02 kernel: Out of Memory: Killed process 2592 (oracle).

This means that the OOM killer has killed the Oracle dedicated server process with PID 2592.
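
To find such events after the fact, the kill messages can be extracted from the log. The following is a minimal POSIX shell sketch, not an Oracle-supplied tool: the function name oom_victims is hypothetical, and the matching pattern is an assumption that covers the message shown above plus the "Out of memory: Kill process" / "Killed process" wording used by other kernel versions. Adjust it to your own log format.

```shell
# Hypothetical helper: print "<pid> <name>" for every OOM-killer kill in a syslog file.
# ASSUMPTION: lines look like "Out of Memory: Killed process 2592 (oracle)."
# (matched case-insensitively, "Kill" or "Killed"); other formats are skipped.
oom_victims() {
    awk '
        tolower($0) ~ /out of memory: kill(ed)? process/ {
            for (i = 1; i < NF; i++) {
                if ($i == "process") {
                    pid = $(i + 1)
                    name = $(i + 2)
                    sub(/^[(]/, "", name)        # drop the leading "("
                    sub(/[)][.,]*$/, "", name)   # drop ")" and trailing punctuation
                    print pid, name
                    break
                }
            }
        }
    ' "${1:-/var/log/messages}"
}

# Usage (reading /var/log/messages usually requires root):
#   oom_victims                      # scans /var/log/messages
#   oom_victims /var/log/messages.1  # scans a rotated log
```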

How Is the OOM Killer Implemented in the 2.6 Kernel?
In the 2.6 kernel, the OOM killer is implemented in the file mm/oom_kill.c. Its entry point is the out_of_memory() function, which is reached through the following call sequence:

mm/page_alloc.c::__alloc_pages() --> mm/vmscan.c::try_to_free_pages() --> mm/oom_kill.c::out_of_memory() --> select_bad_process() --> badness()

out_of_memory() is ultimately triggered by the __alloc_pages() function when free memory is very low. __alloc_pages() is the heart of the zoned buddy system allocator: all higher-level functions that request free page frames, such as alloc_pages() and __get_free_pages(), eventually call it. __alloc_pages() tries multiple methods and walks the zone list to allocate the requested page frames. If it still cannot get them, out_of_memory() is called to invoke the OOM killer, which either kills a selected process to free memory or panics the system.
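
Whether out_of_memory() kills a process or panics is configurable on later kernels through the vm.panic_on_oom sysctl. It is not present in every 2.6 release, so treat its availability on your kernel as an assumption. A small sketch that reports the configured action; the function name oom_action is hypothetical:

```shell
# Hypothetical helper: report what the kernel does when out_of_memory() is reached.
# vm.panic_on_oom: 0 = kill a process, 1 = panic unless the OOM is constrained
# (e.g. by cpuset/mempolicy), 2 = always panic.
oom_action() {
    f=${1:-/proc/sys/vm/panic_on_oom}
    if [ ! -r "$f" ]; then
        echo "panic_on_oom not available"   # older kernel: always the OOM killer
        return 0
    fi
    case $(cat "$f") in
        0) echo "kill" ;;
        1) echo "panic (unless the OOM is constrained)" ;;
        2) echo "panic" ;;
        *) echo "unknown" ;;
    esac
}

# Usage: oom_action
```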

In principle, the OOM killer behaves as follows:

- Lose the minimum amount of work done
- Recover as much memory as it can
- Do not kill anything that is not itself using a lot of memory
- Kill the minimum number of processes (one)
- Try to kill the process the user would expect to be killed
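
On kernels that provide /proc/<pid>/oom_score, the result of the badness calculation can be inspected per process, which approximates the choice select_bad_process() would make at that moment (scores change as processes grow and shrink). A minimal sketch, assuming that file exists; the function name and the PROC_ROOT override (used only to point it at a test directory) are hypothetical:

```shell
# Hypothetical helper: list the processes with the highest oom_score, i.e. the
# ones the OOM killer would currently prefer. Output: "<score> <pid> <name>".
# PROC_ROOT is a hypothetical override (default /proc).
top_oom_candidates() {
    proc_root=${PROC_ROOT:-/proc}
    for f in "$proc_root"/[0-9]*/oom_score; do
        [ -r "$f" ] || continue                  # process may have exited
        pid=${f%/oom_score}
        pid=${pid##*/}
        score=$(cat "$f" 2>/dev/null) || continue
        name=$(sed -n 's/^Name:[[:space:]]*//p' "$proc_root/$pid/status" 2>/dev/null)
        echo "$score $pid ${name:-?}"
    done | sort -nr -k1,1 | head -n "${1:-5}"
}

# Usage: top_oom_candidates 10   # ten most likely victims
```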

What Causes an OOM Killer Event?
The Kernel is Really out of Memory

The workload uses more memory than the system has in RAM and swap space combined. If both "SwapFree" and "MemFree" in /proc/meminfo are very low (less than 1% of their respective totals), the workload may be unhealthy.
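
The 1% check above can be scripted against /proc/meminfo. A minimal sketch; the function name is hypothetical, and treating a system with no swap as 0% SwapFree is an assumption of this sketch:

```shell
# Hypothetical helper: print MemFree and SwapFree as a percentage of their totals,
# warn and return non-zero when both are below 1%.
check_mem_exhaustion() {
    awk '
        $1 == "MemTotal:"  { mt = $2 }
        $1 == "MemFree:"   { mf = $2 }
        $1 == "SwapTotal:" { st = $2 }
        $1 == "SwapFree:"  { sf = $2 }
        END {
            mp = (mt > 0) ? 100 * mf / mt : 0
            sp = (st > 0) ? 100 * sf / st : 0   # no swap configured -> 0%
            printf "MemFree %.1f%% SwapFree %.1f%%\n", mp, sp
            if (mp < 1 && sp < 1) {
                print "WARNING: RAM and swap are both nearly exhausted"
                exit 1
            }
        }
    ' "${1:-/proc/meminfo}"
}

# Usage: check_mem_exhaustion || echo "system is close to an OOM condition"
```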

The Kernel is out of LowMem (32-bit architectures only)

"LowFree" in /proc/meminfo is very low, but HighFree is much higher. OOM killer will take action under that situation. The specific workload may benefit from being run on a 64-bit architectures or other methods (See Document 452326.1) .

The Kernel Data Structures Take up Too Much Space or There is Non-Freed Memory

Kernel data structures appear as named objects in /proc/slabinfo. Refer to Document 434351.1 for more information about the SLAB allocator.

If one kind of object is taking up a large portion of the system's total memory, that object may be responsible for the memory shortage. Look for the subsystem (for example, ocfs2) that allocates that kernel structure. To see the memory used by each object type, run this on the command line:

awk '{printf "%s %d MB\n", $1, $3*$4/(1024*1024)}' /proc/slabinfo | sort -nr -k2,2

You can also see Document 434351.1 for other tools you can use for analysing the SLABs and objects.
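
To go from the sorted list to "is one object taking a vast portion of memory", each cache's footprint can be compared with MemTotal. A minimal sketch, assuming the slabinfo 2.x column layout used by the command above (column 3 is num_objs, column 4 is objsize); the function name and the 10% default threshold are assumptions, not values from this note. Reading /proc/slabinfo normally requires root.

```shell
# Hypothetical helper: print slab caches whose num_objs * objsize exceeds a
# percentage of MemTotal.
# Args: $1 = threshold percent (default 10), $2 = slabinfo file, $3 = meminfo file.
big_slabs() {
    awk -v pct="${1:-10}" '
        FNR == NR { if ($1 == "MemTotal:") total_kb = $2; next }   # first file: meminfo
        /^slabinfo|^#/ { next }                                   # skip header lines
        {
            kb = $3 * $4 / 1024
            if (total_kb > 0 && kb * 100 / total_kb > pct)
                printf "%s %d MB (%.1f%% of MemTotal)\n", $1, kb / 1024, kb * 100 / total_kb
        }
    ' "${3:-/proc/meminfo}" "${2:-/proc/slabinfo}"
}

# Usage (as root): big_slabs 5
```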

The Kernel is Not Using The Swap Space Properly

If an application uses mlock() or HugeTLB pages (HugePages), that memory cannot be swapped out, because locked pages and HugePages are not swappable. In that case, SwapFree may still show a very large value when the OOM occurs. Overusing them can exhaust system memory and leave the system with no other recourse.
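
To see how much memory is pinned this way, the HugePages pool can be read from /proc/meminfo and mlock()ed memory summed from the VmLck field of /proc/<pid>/status. A rough sketch; the function names and the PROC_ROOT test override are hypothetical, and the VmLck sum is only an approximation of locked memory:

```shell
# Hypothetical helper: MB reserved for the HugePages pool (HugePages_Total * Hugepagesize).
hugepages_reserved_mb() {
    awk '
        $1 == "HugePages_Total:" { n = $2 }
        $1 == "Hugepagesize:"    { sz = $2 }      # in kB
        END { printf "%d\n", n * sz / 1024 }
    ' "${1:-/proc/meminfo}"
}

# Hypothetical helper: total kB locked with mlock() across all processes (sum of VmLck).
locked_kb() {
    cat "${PROC_ROOT:-/proc}"/[0-9]*/status 2>/dev/null |
        awk '$1 == "VmLck:" { s += $2 } END { print s + 0 }'
}

# Usage: echo "HugePages: $(hugepages_reserved_mb) MB, mlock: $(locked_kb) kB"
```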

Other Potential Reasons

It is also possible for the system to end up in a kind of deadlock. Writing data out to disk may itself require allocating memory for various I/O data structures. If the system cannot find even that memory, the very functions used to free memory will be hamstrung, and the system will likely run out of memory.

It is possible to do some minor tuning to start paging earlier, but if the system cannot write dirty pages out fast enough to free memory, one can only conclude that the workload is mis-sized for the installed memory and there is little to be done. Raising the value in /proc/sys/vm/min_free_kbytes will cause the system to start reclaiming memory at an earlier time than it would have before. This makes it harder to get into these kinds of deadlocks. If you get these deadlocks, this is a good value to tune.

Alternatively, the kernel may have made a bad decision and misread its statistics, going OOM while it still had usable RAM. That would be a kernel bug that needs to be fixed.

How to Avoid an OOM Situation?
An OOM event is generally not an OS problem in itself; it is a tactic the kernel uses to keep the system running. If physical memory or swap space is too low on the affected system, add more memory or increase swap. Another solution is to move some of the applications off the problematic system.

There are also several other ways to avoid OOM killer problems, such as:

- Use the hugemem kernel.
- Set the kernel parameter vm.lower_zone_protection (this applies only to x86 32-bit environments, i.e. not to x86-64 etc.); see also Document 842886.1.
- If vm.lower_zone_protection is not available, set vm.min_free_kbytes instead. The current recommended setting is a value x with x > 4 * sqrt(MemTotal), where MemTotal is the value in kB from /proc/meminfo.
For more information, refer to the section "Troubleshooting LowMem Pressure Problems" in Document 452326.1.
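
The vm.min_free_kbytes rule above (x > 4 * sqrt(MemTotal)) can be computed directly from /proc/meminfo. A minimal sketch; the function name is hypothetical, and it returns the smallest integer that satisfies the strict inequality:

```shell
# Hypothetical helper: smallest integer x with x > 4 * sqrt(MemTotal in kB).
min_free_floor() {
    awk '$1 == "MemTotal:" { printf "%d\n", 4 * sqrt($2) + 1 }' "${1:-/proc/meminfo}"
}

# Usage:
#   cat /proc/sys/vm/min_free_kbytes                      # current value
#   sysctl -w vm.min_free_kbytes="$(min_free_floor)"      # as root, if the current value is lower
```

Note that the current value may already be above this floor; in that case there is nothing to raise.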

Tuning the OOM Killer
With OL/RHEL5, the oom_adj value in the /proc pseudo-filesystem can be used for every process to tune the likelihood that the OOM killer terminates that process. Higher values increase the likelihood of the process being killed. Valid values are in the range -16 to 15, plus the special value -17, which disables OOM killing for that process altogether. The default is 0.

e.g. for process 2592:

echo 10 > /proc/2592/oom_adj

makes process 2592 a preferred candidate for OOM killing, and

echo -15 > /proc/2592/oom_adj

makes OOM killing of the process less likely. To disable the OOM killer for that specific process, use the special value -17:

echo -17 > /proc/2592/oom_adj

The process remains protected from the OOM killer until that value is reset. This can be set up for critical Oracle database background processes.
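
Applying the setting to several processes at once can be wrapped in a small function that validates the value against the -17..15 range described above. A minimal sketch; the function name and the PROC_ROOT test override are hypothetical, lowering the value requires root, and the setting is lost when the process restarts. On kernels from 2.6.36 onward, oom_adj is deprecated in favour of /proc/<pid>/oom_score_adj (range -1000 to 1000).

```shell
# Hypothetical helper: write an oom_adj value for each PID given.
# Args: $1 = value (-17, or -16..15), remaining args = PIDs.
set_oom_adj() {
    val=$1
    shift
    case $val in
        -17|-1[0-6]|-[1-9]|[0-9]|1[0-5]) ;;              # valid oom_adj values
        *) echo "invalid oom_adj value: $val" >&2; return 1 ;;
    esac
    for p in "$@"; do
        printf '%s\n' "$val" > "${PROC_ROOT:-/proc}/$p/oom_adj" || return 1
    done
}

# Usage (as root). The process-name pattern is an assumption; adjust it to your instance:
#   set_oom_adj -17 $(pgrep -f '^ora_(pmon|smon)_')
```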
