Linux Kernel Lowmem Pressure Issues and Related Kernel Structures (Doc ID 452326.1)

What is LowMem?
An ideal computer architecture would allow page frames to map or address any location in the architecture's address space directly, providing a uniform and virtually unlimited range.

Unfortunately, real architecture implementations have constraints that limit the way page frames can be employed. Specifically, on 80x86 architectures (for which the Linux kernel was initially designed), the Linux kernel has to deal with two major constraints:

The Direct Memory Access (DMA): The very old ISA bus systems can only address the first 16MB of RAM
The 32-bit systems with large RAM: The CPU cannot directly address all physical memory because the address space is limited
The Linux kernel therefore defines three "zones". For the 80x86 (Intel 32-bit) architecture these are:

DMA: 0x00000000 - 0x00FFFFFF (0 - 16 MB)
LowMem: 0x01000000 - 0x37FFFFFF (16 - 896 MB) - size: 880 MB
HighMem: 0x38000000 - end of physical RAM
The LowMem zone is also known internally as the NORMAL zone, so in this document both terms are used to mean exactly the same thing.

On 32-bit Linux the size of the LowMem zone is 880 MB and it cannot be resized: with the default 3:1 split the kernel directly maps only 1 GB of address space, of which 128 MB is reserved for vmalloc and fixed mappings, leaving 896 MB directly mapped (the first 16 MB of which form the DMA zone). You can use a hugemem kernel or a 64-bit architecture to get a larger LowMem zone (see Note 264236.1).

Kernel Structure Types
Below are the main categories of structures in the Linux kernel.

IMPORTANT: A LowMem pressure / shortage is not likely to be caused by a single specific kernel structure. The issues almost always involve an aggregate of more than one kernel structure.

Lists: Most of the data structures in the Linux kernel are linked lists; specifically, almost all of them are circular doubly-linked lists, which are flexible and easy to traverse (see the sketch after this list). Examples:
Process lists
Resource lists
Run queues
Tables: These are contiguous memory areas in LowMem; thanks to their hierarchical structure they can grow or shrink. Examples:
Page Tables (implemented via PTE, PMD, PGD etc.)
Exception Tables
Buffers / Caches: These are fixed-size data structures allocated for a specific task, either to buffer a data transfer or to cache data for faster access and manipulation. Examples:
Block buffers
Remote procedure call buffers
Index node cache
Directory entry cache
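
For illustration, here is a minimal user-space sketch of the circular doubly-linked list pattern used throughout the kernel (modeled on struct list_head from include/linux/list.h; the task structure and its fields below are simplified assumptions for the example, not the kernel's actual definitions):

#include <stdio.h>
#include <stddef.h>

/* Minimal circular doubly-linked list in the style of the kernel's
 * struct list_head: the links are embedded in the object itself. */
struct list_head {
    struct list_head *next, *prev;
};

static void list_init(struct list_head *h) { h->next = h->prev = h; }

static void list_add_tail(struct list_head *n, struct list_head *h)
{
    n->prev = h->prev;
    n->next = h;
    h->prev->next = n;
    h->prev = n;
}

/* container_of(): recover the enclosing object from the embedded link */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

struct task {                /* hypothetical "process descriptor" */
    int pid;
    struct list_head node;   /* embedded link, as in task_struct */
};

int main(void)
{
    struct list_head run_queue;
    struct task a = { 1 }, b = { 2 };

    list_init(&run_queue);
    list_add_tail(&a.node, &run_queue);
    list_add_tail(&b.node, &run_queue);

    /* Traversal wraps naturally because the list is circular */
    for (struct list_head *p = run_queue.next; p != &run_queue; p = p->next)
        printf("pid %d\n", container_of(p, struct task, node)->pid);
    return 0;
}

Because the links are embedded in the objects themselves, the same list machinery can serve process lists, resource lists and run queues alike.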
With respect to the structure categories above, the Lists and Tables are not known to cause LowMem problems. Therefore the main focus of this document is the Buffers / Caches category.

The Buddy System Algorithm
The internal kernel services and add-on modules need to allocate groups of contiguous page frames as temporary memory areas used to implement kernel functions. These allocations are handled by the Buddy System Algorithm. The buddy system treats page frames as "generic" areas and, to limit external fragmentation, always allocates chunks of 2^n contiguous page frames (geometrically distributed sizes). The free chunks can be seen in /proc/buddyinfo.
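
As a rough illustration of the 2^n sizing (a simplified sketch of the order rounding, not the kernel's actual allocator code), assuming a 4 kB PAGE_SIZE:

#include <stdio.h>

#define PAGE_SIZE 4096UL

/* Smallest buddy order whose chunk (2^order pages) covers 'bytes'. */
static int buddy_order(unsigned long bytes)
{
    unsigned long pages = (bytes + PAGE_SIZE - 1) / PAGE_SIZE;
    int order = 0;

    while ((1UL << order) < pages)
        order++;
    return order;
}

int main(void)
{
    unsigned long requests[] = { 4096, 8192, 20000, 131072 };

    for (int i = 0; i < 4; i++)
        printf("%7lu bytes -> order %d (%lu kB chunk)\n",
               requests[i], buddy_order(requests[i]),
               (PAGE_SIZE << buddy_order(requests[i])) / 1024);
    return 0;
}

A request of 20000 bytes, for example, rounds up to order 3, i.e. a 32 kB chunk; this rounding is the price paid for keeping external fragmentation low.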

The SLAB Allocator
This is a specific organization and algorithm for allocating the different types of kernel objects that are used for different purposes.

With the implementation of the SLAB allocator in the Linux kernel, there are predefined object types, each served by a specific cache. For example:

rpc_buffers
journal_head
ext3_inode_cache
arp_cache
kiocb
bio
inode_cache
dentry_cache
Note that these also cover the Buffers / Caches allocated in the kernel. In principle, any cache or buffer in the kernel is allocated via the SLAB allocator.

The types of objects for the SLAB allocator can vary between different kernels and versions. Moreover, loaded kernel modules (like the OCFS2 modules) can introduce additional caches, e.g.:

ocfs2_inode_cache
ocfs2_lock
...
For further information about SLABs and SLAB allocator please see Note 434351.1.
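
On a live system, per-cache statistics can be read from /proc/slabinfo (or watched with slabtop). The following minimal sketch prints an approximate memory footprint per cache; it assumes the "slabinfo - version: 2.x" column layout and typically needs root privileges:

#include <stdio.h>

/* Print approximate memory used per slab cache from /proc/slabinfo.
 * Assumes the 2.x layout:
 *   name active_objs num_objs objsize objperslab pagesperslab ... */
int main(void)
{
    FILE *f = fopen("/proc/slabinfo", "r");
    char line[512], name[64];
    unsigned long active, num, objsize;

    if (!f) { perror("/proc/slabinfo"); return 1; }
    while (fgets(line, sizeof line, f)) {
        /* Header lines fail the numeric conversions and are skipped. */
        if (sscanf(line, "%63s %lu %lu %lu",
                   name, &active, &num, &objsize) == 4)
            printf("%-24s %8lu kB\n", name, num * objsize / 1024);
    }
    fclose(f);
    return 0;
}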

Kernel Structures per Purpose
The following is an exemplary list, grouped by purpose, of structures that are allocated within LowMem. See Note 434351.1 for further details.

Scheduler
Run Queue
Process descriptors
Interrupts / Exceptions / Signals
Signal handling queue
Timers
POSIX timer cache
Networking
Generic flow cache
RPC buffers, tasks and i-node caches
TCP/IP hash tables, caches, buckets
BSD Unix domain sockets, tables and caches
Storage
Device Mapper transactions, I/O vectors
SCSI command caches, queuing pools
Block Device caches, controls, vectors, queues
Asynchronous I/O controls and context
I/O Scheduling pools and queues
Filesystems
Buffers
Auditing
I-node caches
File Lock Caches
Directory entry caches
Filesystem specific (ext2,3 etc.) caches and attributes
Journalling structures
Processes
Per process structures
Memory management structure data allocated by each new process
Virtual memory area data allocated by each new process
File and filesystem related data allocated by each new process
Signal cache and handlers allocated by each new process
Task information for every new process forked
Per user structure to keep track of user processes
Memory Management
Internal cache of cache description objects
Anonymous virtual memory addresses for physical to virtual reverse mappings
hugetlbpage-backed filesystem cache
Resizable virtual memory filesystems
Page middle directory
Page global directory
Other
Generic caches of size N
Generic caches of size N for direct memory access use
Common Symptoms of LowMem Pressure
There are two groups of issues that occur in the LowMem area:

Memory shortage: The total amount of free memory gets so low that almost no large allocation is possible in LowMem
Memory fragmentation: There are no large contiguous free chunks; all free memory is in small chunks like 4 kB or 8 kB

Some of the symptoms below stem from either or both of these situations. Memory shortage and fragmentation can be reliably diagnosed by checking /proc/meminfo, /proc/buddyinfo and the SysRq-M output, as described below.

Processes are Getting Killed by the OOM Killer
The Out-of-Memory (OOM) killer terminates processes that appear idle and have a large memory working set, in order to free memory when there is a shortage. This generally implies a LowMem shortage, but it should be verified against /proc/meminfo data.

In the messages logfile, you would see something like:

kernel: Out of Memory: Killed process NNNN

OOM Killer Strikes when There is No Swap Usage
The Linux system generally does not start to use the swap area (due to lazy swapping) unless total free memory is really low. So if the OOM killer kills some processes but the system was not swapping, it means that there are still available pages in the HighMem zone but LowMem is short.

In the messages log file, you would see something like:

kernel: Out of Memory: Killed process NNNN
and

free

             total       used       free     shared    buffers     cached
...
...
Swap:      2048248          0    2048248

Note that the OOM killer will take action if there is a request for the LowMem area and LowMem is short.

OOM Killer Strikes when There is Free Memory
If the OOM killer is taking action even when there is free memory:

free

             total       used       free     shared    buffers     cached
Mem:       2067508    1507748     559760          0      38044    1072244
-/+ buffers/cache:     397460    1670048
Swap:      2048248          4    2048244
It would generally mean that we might be out of LowMem free pages.

Kernel is Unable to Spawn Processes
The symptoms differ depending on the running application, but specifically the fork() system call fails with either EAGAIN or ENOMEM. From the fork() man page:

EAGAIN fork() cannot allocate sufficient memory to copy the parent process page tables and allocate a task structure for the child.
ENOMEM fork() failed to allocate the necessary kernel structures because memory is tight.
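
A minimal sketch of how an application observes this failure mode (plain fork()/errno handling; nothing beyond the standard API is assumed):

#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void)
{
    pid_t pid = fork();

    if (pid < 0) {
        /* Under LowMem pressure this is where EAGAIN/ENOMEM show up:
         * the kernel could not allocate the child's task structure
         * or copy the parent's page tables. */
        fprintf(stderr, "fork: %s\n", strerror(errno));
        return 1;
    }
    if (pid == 0)
        _exit(0);            /* child */
    waitpid(pid, NULL, 0);   /* parent */
    puts("fork succeeded");
    return 0;
}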

System Calls Failing with ENOBUFS / ENOMEM
While diagnosing application failures or performance issues with strace (see man strace), you may see system calls failing with ENOBUFS, ENOMEM or EAGAIN, most likely repeated again and again if you have performance issues. This may be due to LowMem fragmentation or shortage.

Newer kernel versions include enhancements that help avoid fragmentation of LowMem.

LowMem Shortage Shown in /proc/meminfo
If there is a shortage in the LowMem area, /proc/meminfo reflects it in the 'LowFree' line. Note that even on a healthy system you might see low LowFree values, which does not by itself mean a LowMem shortage. For example, a system with 2 GB of memory and the hugemem kernel:

MemTotal: 2061160 kB
MemFree: 10228 kB
Buffers: 119840 kB
Cached: 1307784 kB
Active: 587724 kB
Inactive: 1236924 kB
...
LowTotal: 2061160 kB
LowFree: 10228 kB
Here the system seems to be short of memory, but buffers are high (and can be released if needed), along with 1.24 GB of cached pages, of which 1.17 GB are inactive and can also be released if needed. Whether that happens depends on the workload.

Conversely, even if you have an output as below:

MemTotal: 4026240 kB
MemFree: 758396 kB
...
HighTotal: 3145720 kB
HighFree: 608768 kB
LowTotal: 880520 kB
LowFree: 149628 kB
...
you might still have LowMem problems. Here about 146 MB of LowMem is available, but it might be fragmented. So the /proc/meminfo output alone is not conclusive in many situations. To check fragmentation, see the next section.

Clear Fragmentation Detected in /proc/buddyinfo & SysRq-M Output
This is the most reliable way to diagnose LowMem fragmentation. Please see Note 228203.1 about the use of the magic (SysRq) keys. When you request the memory dump (SysRq-M) you might get something like:

SysRq : Show Memory
...
DMA: 221*4kB 20*8kB 2*16kB 19*32kB 12*64kB 2*128kB 1*256kB 0*512kB 1*1024kB 0*2048kB 0*4096kB = 3988kB
Normal: 25536*4kB 3518*8kB 959*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 145632kB
HighMem: 61454*4kB 18219*8kB 4135*16kB 1558*32kB 1472*64kB 48*128kB 1*256kB 1*512kB 0*1024kB 0*2048kB 0*4096kB = 608704kB
In the layout above we see that there is about 142 MB of free space in the LowMem area (Normal zone), but it is highly fragmented into 4, 8 and 16 kB chunks. There are no free 32 kB or 64 kB (or larger) contiguous chunks, which many applications would need. In that case the applications would fail with ENOMEM / ENOBUFS.

Similar information can be gathered live from /proc/buddyinfo too.

cat /proc/buddyinfo

Node 0, zone      DMA    221     20      2     19     12      2      1      0      1      0      0
Node 0, zone   Normal  25536   3518    959      0      0      0      0      0      0      0      0
Node 0, zone  HighMem  61454  18219   4135   1558   1472     48      1      1      0      0      0

Each column of numbers represents the number of free chunks (221, 20, ...) of the corresponding order (0, 1, 2, ..., meaning 2^0 * PAGE_SIZE, 2^1 * PAGE_SIZE, ...).
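
For monitoring over time, those counts can be converted to sizes programmatically. Below is a minimal sketch (assuming a 4 kB PAGE_SIZE and the 11-order layout shown above) that prints each zone in the same count*sizekB form as the SysRq-M output:

#include <stdio.h>

/* Summarize /proc/buddyinfo: print each zone's free chunks and
 * the zone's total free memory. */
int main(void)
{
    FILE *f = fopen("/proc/buddyinfo", "r");
    char zone[16];
    unsigned long n[11];

    if (!f) { perror("/proc/buddyinfo"); return 1; }
    /* Each line looks like: "Node 0, zone   Normal  25536 3518 ..." */
    while (fscanf(f, "%*s %*s %*s %15s", zone) == 1) {
        unsigned long total = 0;

        for (int order = 0; order < 11; order++) {
            if (fscanf(f, "%lu", &n[order]) != 1)
                n[order] = 0;
            total += n[order] * (4UL << order);   /* chunk size in kB */
        }
        printf("%-8s:", zone);
        for (int order = 0; order < 11; order++)
            printf(" %lu*%lukB", n[order], 4UL << order);
        printf(" = %lukB\n", total);
    }
    fclose(f);
    return 0;
}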

Troubleshooting LowMem Pressure Problems
The following are some initial actions that you may take to troubleshoot LowMem pressure issues.

Determine whether you have LowMem shortage only or fragmentation too.
Examine your SLAB cache entries according to Note 434351.1, using the tools described there, looking for continuously growing entries (LowMem shortage) or very active ones (which may cause fragmentation). If the applications being run behave as expected, consider making changes on the system itself as below; if not, the application needs to be fixed.
If running Enterprise Linux 3, consider:
Switching to hugemem kernel
Upgrading to the U8 errata patch (2.4.21-47) and setting vm.vm-defragment appropriately
Upgrading to Enterprise Linux 4 or higher and setting vm.lower_zone_protection to 100 (This applies to x86 32-bit architectures only - i.e. not valid for x86-64)
On Linux x86-64 environments (and recent i686 2.6.X kernels), increasing the value of vm.min_free_kbytes causes the system to start reclaiming memory earlier than it otherwise would, which can help decrease LowMem pressure.
For the case of fragmentation, if you find that a running application is demanding a contiguous area that is too large, the application might need to be fixed (see Note 419871.1).
If the running application can employ HugePages (see Note 361323.1) and performs a lot of LowMem operations that it could do with HugePages instead, enabling HugePages will help with the LowMem pressure. Note that the Oracle RDBMS is not an example of such an application, as the portions of the SGA that can use HugePages are already allocated from HighMem.
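
As an illustration of applying one of these tunables programmatically, the corresponding /proc/sys file can be written directly, which is equivalent to sysctl -w vm.lower_zone_protection=100. A minimal sketch, assuming an EL4 x86 kernel where this tunable exists:

#include <stdio.h>

int main(void)
{
    /* vm.lower_zone_protection maps to this /proc/sys path.
     * The file only exists on kernels that support the tunable
     * (EL4 x86 32-bit); writing requires root. */
    FILE *f = fopen("/proc/sys/vm/lower_zone_protection", "w");

    if (!f) { perror("vm.lower_zone_protection"); return 1; }
    fprintf(f, "100\n");
    fclose(f);
    return 0;
}

For a persistent setting, the usual approach is an entry in /etc/sysctl.conf instead.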
REFERENCES
NOTE:264236.1 - Considerations on using "hugemem" Kernel vs SMP kernel
NOTE:275318.1 - The Bigpages Feature on Linux
NOTE:360402.1 - Aggressive swapping and low resources during RMAN backup on Linux X86
NOTE:419871.1 - Failures due to "skgxpvfymmtu: process failed because of a resource problem in the OS" on 32-bit Linux
NOTE:452000.1 - Linux: Out-of-Memory (OOM) Killer

NOTE:396038.1 - Common Misconceptions About Linux Kernel Structures
NOTE:405720.1 - o2net Using High CPU and Cluster Node Evictions
NOTE:434351.1 - Linux Kernel: The SLAB Allocator
