Why does my Red Hat Enterprise Linux (RHEL) system swap despite having free RAM?
Swapping in a Linux system happens under two conditions:
1) Anonymous mapped memory.
2) Oversized workload abusing memory overcommit.
In this case only point 1 applies, so that is what we discuss here.
1) Anonymous mapped memory
Only pages that are not filesystem backed, i.e. that have no backing location on disk, get swapped. Such pages are called anonymous pages, and their total can be checked in the AnonPages field of the /proc/meminfo file. Unmapped pages are those that are not present in the address space of any process; they do not appear in any page table. Anonymous mapped memory, by contrast, is present in the page tables and exists in a process address space. Because these pages are anonymous, they are the candidates for swapping.
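As a quick way to look at the AnonPages field, the following is a minimal Python sketch that parses /proc/meminfo. The function names parse_meminfo and anon_summary are hypothetical helpers introduced here for illustration, and the percentage of MemTotal is only a rough indicator, not a threshold the kernel uses.

```python
from pathlib import Path


def parse_meminfo(text):
    """Map /proc/meminfo field names to integers (most fields are in kB)."""
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue
        parts = rest.split()
        if parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def anon_summary(fields):
    """Return (AnonPages in kB, AnonPages as a percentage of MemTotal)."""
    anon = fields.get("AnonPages", 0)
    total = fields.get("MemTotal", 0)
    share = 100.0 * anon / total if total else 0.0
    return anon, share


if __name__ == "__main__":
    meminfo = Path("/proc/meminfo")
    if meminfo.exists():  # only present on Linux
        anon_kb, share = anon_summary(parse_meminfo(meminfo.read_text()))
        print(f"AnonPages: {anon_kb} kB ({share:.1f}% of MemTotal)")
```

A large AnonPages value means a large pool of pages that can only be reclaimed by writing them to swap.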
The page frame reclaim algorithm (PFRA) performs periodic reclaiming in a Linux system. It does this in two ways. One is the kswapd kernel thread, which invokes shrink_zone() and shrink_slab() to reclaim pages from the LRU (Least Recently Used) lists. The other is the cache_reap() function, which reclaims unused slabs from the slab allocator.
The entire RAM is divided into zones, as can be seen in /proc/zoneinfo. Each zone has a field called pages(low). When the function __alloc_pages() discovers zones where pages(free) is less than pages(low), it wakes the kswapd kernel thread. In other words, the kernel starts reclaiming pages early to avoid reaching the more aggressive 'low on memory' or 'out of memory' conditions.
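The comparison of free pages against the low watermark can be checked from user space. The sketch below parses /proc/zoneinfo and lists zones whose free count is below low. The helper names parse_zoneinfo and zones_below_low are hypothetical, and the parser assumes the usual layout in which each zone starts with a "Node N, zone NAME" line followed by "pages free", "min", "low" and "high" lines; other fields are ignored.

```python
import re
from pathlib import Path

ZONE_HEADER = re.compile(r"Node\s+(\d+),\s+zone\s+(\S+)")


def parse_zoneinfo(text):
    """Return {(node, zone): {"free", "min", "low", "high"}} in pages."""
    zones = {}
    current = None
    for line in text.splitlines():
        header = ZONE_HEADER.match(line)
        if header:
            key = (int(header.group(1)), header.group(2))
            current = zones.setdefault(key, {})
            continue
        if current is None:
            continue
        tokens = line.split()
        if len(tokens) == 3 and tokens[:2] == ["pages", "free"] and tokens[2].isdigit():
            current.setdefault("free", int(tokens[2]))
        elif len(tokens) == 2 and tokens[0] in ("min", "low", "high") and tokens[1].isdigit():
            # Keep only the first occurrence per zone (the watermark lines).
            current.setdefault(tokens[0], int(tokens[1]))
    return zones


def zones_below_low(zones):
    """List the zones whose free page count is below the low watermark."""
    return [key for key, z in zones.items()
            if "free" in z and "low" in z and z["free"] < z["low"]]


if __name__ == "__main__":
    zoneinfo = Path("/proc/zoneinfo")
    if zoneinfo.exists():  # only present on Linux
        for node, zone in zones_below_low(parse_zoneinfo(zoneinfo.read_text())):
            print(f"Node {node}, zone {zone}: free pages below low watermark")
```

A zone that shows up here is one where kswapd can be expected to be active, even if the system as a whole still reports free RAM.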
But why do it? We have try_to_free_pages(), which does direct reclaim in the allocation path. So why can't we just sit around, wait for a low on memory condition and only then do the work? Because that would be inefficient. Apart from user space processes, the kernel can also request pages in interrupt context, where it cannot wait until free pages are found. And if some critical code has already acquired an exclusive lock and needs memory, we have to satisfy it. So, to be on the safe side, we have kswapd doing the reclaim in the background. This reclaim is also done from the LRU lists, meaning that among anonymous memory only the inactive pages are reclaimed.
Summing up, every time we hit the page allocator, i.e. whenever some memory allocation is being done, the free page count of a memory zone is checked against the zone's low and high watermarks. If the zone is found not to be balanced, kswapd wakes up and balances it. The logic is: why would the kernel want modified anonymous pages to lie around for a long time when it can sort out the allocation in a better way?
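The watermark logic summarised above can be illustrated with a toy model. This is a deliberately simplified sketch, not kernel code: the Zone class, allocation_path and kswapd_balance are hypothetical names, the page counts are invented, and the model assumes the commonly described behaviour that kswapd is woken below the low watermark, keeps reclaiming until the high watermark is reached, and that direct reclaim happens when free pages drop below the min watermark.

```python
from dataclasses import dataclass


@dataclass
class Zone:
    free: int        # free pages in the zone
    wmark_min: int   # below this: the allocating task reclaims directly
    wmark_low: int   # below this: kswapd is woken
    wmark_high: int  # kswapd stops once free pages reach this


def allocation_path(zone):
    """Classify what this toy allocator would do for the zone."""
    if zone.free < zone.wmark_min:
        return "direct reclaim"
    if zone.free < zone.wmark_low:
        return "wake kswapd"
    return "allocate"


def kswapd_balance(zone, reclaim_batch=32):
    """Reclaim in batches until the zone is balanced; return pages reclaimed."""
    reclaimed = 0
    while zone.free < zone.wmark_high:
        zone.free += reclaim_batch
        reclaimed += reclaim_batch
    return reclaimed


if __name__ == "__main__":
    zone = Zone(free=70, wmark_min=67, wmark_low=83, wmark_high=99)
    print(allocation_path(zone))
    print(kswapd_balance(zone, reclaim_batch=10), "pages reclaimed")
    print(allocation_path(zone))
```

The point of the model is that reclaim, and therefore swapping of inactive anonymous pages, starts while the zone still has free pages above the min watermark.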
Other notes
There are two other properties of virtual memory which you should consider in these circumstances:
• When the kernel needs contiguous memory for some reason, and memory is fragmented, this is handled as if the system were low on memory, and swapping can occur.
• The kernel strongly prefers to make NUMA-local allocations, often to the point of swapping rather than making an allocation on another node.
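To get a rough feel for fragmentation, one option is to look at /proc/buddyinfo, which lists, per node and zone, the number of free blocks of each order (order 0 is a single page, each higher order doubles the block size). The sketch below is an assumption-laden illustration: parse_buddyinfo and free_blocks_at_or_above are hypothetical helper names, and the order used in the example is arbitrary.

```python
import re
from pathlib import Path

BUDDY_LINE = re.compile(r"Node\s+(\d+),\s+zone\s+(\S+)\s+(.*)")


def parse_buddyinfo(text):
    """Return {(node, zone): [free block count for order 0, 1, 2, ...]}."""
    zones = {}
    for line in text.splitlines():
        match = BUDDY_LINE.match(line)
        if match:
            key = (int(match.group(1)), match.group(2))
            zones[key] = [int(n) for n in match.group(3).split()]
    return zones


def free_blocks_at_or_above(counts, order):
    """Count free blocks that can satisfy a contiguous request of this order."""
    return sum(counts[order:])


if __name__ == "__main__":
    buddyinfo = Path("/proc/buddyinfo")
    if buddyinfo.exists():  # only present on Linux
        for (node, zone), counts in parse_buddyinfo(buddyinfo.read_text()).items():
            big = free_blocks_at_or_above(counts, 4)  # order 4 chosen arbitrarily
            print(f"Node {node}, zone {zone}: {big} free blocks of order >= 4")
```

A zone with many low-order blocks but few or no high-order blocks has free memory that is fragmented, and the per-node breakdown also shows when one NUMA node is much tighter than another.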
This is the default, expected behaviour on a Linux system. This is how the PFRA works.
Check the /proc/meminfo file and note the SwapTotal and SwapFree counters.
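The two counters can be combined to see how much swap is actually in use. The following is a minimal sketch; swap_usage_kb is a hypothetical helper name, and it simply computes SwapTotal minus SwapFree from the text of /proc/meminfo.

```python
from pathlib import Path


def swap_usage_kb(text):
    """Return (SwapTotal, swap in use) in kB from /proc/meminfo text."""
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if sep and name in ("SwapTotal", "SwapFree"):
            values[name] = int(rest.split()[0])
    if "SwapTotal" not in values or "SwapFree" not in values:
        raise ValueError("SwapTotal/SwapFree not found in input")
    return values["SwapTotal"], values["SwapTotal"] - values["SwapFree"]


if __name__ == "__main__":
    meminfo = Path("/proc/meminfo")
    if meminfo.exists():  # only present on Linux
        total, used = swap_usage_kb(meminfo.read_text())
        print(f"Swap in use: {used} kB of {total} kB")
```

A modest amount of swap in use alongside free RAM is consistent with the background reclaim described above.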