Delaying memory allocation until a process actually touches the memory it requested is good for performance. Due to locality of reference, most programs that request large allocations never use all of them at once, so the kernel satisfies such requests gradually to avoid consuming more memory than necessary.
It’s also important to understand that requests are prioritized according to who makes them. For virtual memory allocation, when the kernel makes a request, the memory is allocated immediately, whereas a user-space request is satisfied gradually, as the pages are actually used. There are good reasons for these choices: many RAM-intensive programs contain sections that are rarely executed, so loading everything into memory when only part of it is used would be wasteful. This technique of deferring allocation until a page is first accessed is known as demand paging.
It’s possible to tune this behavior a bit for applications that typically allocate large blocks and then free that same memory, and for applications that allocate a lot at once and then exit. Adjust this sysctl setting:
vm.min_free_kbytes=<value>
This helps reduce the time spent servicing page faults; memory is only used for what is really needed, though it can put pressure on ZONE_NORMAL.
It can be advantageous for certain applications to let the kernel commit more memory than the system can actually back; this is possible thanks to virtual memory. With the vm.overcommit_memory sysctl parameter, you can tell the kernel to always grant allocation requests, no matter how large:
vm.overcommit_memory=1
To return to the default heuristic overcommit behavior:
vm.overcommit_memory=0
It’s also possible to use value 2, which enforces strict accounting: the kernel allows committing up to the swap size plus 50% of physical memory. The 50% can be changed via the ratio parameter:
vm.overcommit_memory=2
vm.overcommit_ratio=50
To estimate the RAM size needed to avoid an OOM (Out Of Memory) condition for the current system workload:
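The source doesn’t show a command here; one common way (my assumption) to gauge this is to compare the memory all processes have committed (Committed_AS) against the kernel’s commit limit, both reported in /proc/meminfo:

```shell
# Committed_AS is the total memory committed by all processes;
# CommitLimit is the maximum the kernel allows under strict overcommit.
# If Committed_AS approaches CommitLimit, the workload is at risk of OOM.
grep -E '^(CommitLimit|Committed_AS)' /proc/meminfo
```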
The slab cache contains pre-allocated memory pools that the kernel draws on when it needs space for its various data structures. When these structures map only very small pages, or are so small that several of them fit into a single page, it’s more efficient for the kernel to serve them from the slab memory space than to dedicate whole pages to them. To get this information:
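The original doesn’t show the command; slab usage is typically inspected via /proc/slabinfo or the slabtop tool (both standard on Linux):

```shell
# Per-cache object counts and sizes (reading usually requires root)
head -n 10 /proc/slabinfo
# Or a live, sorted view:
# slabtop -o
```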
When a process references a file, the kernel creates and associates a ‘dentry object’ for each element in its pathname. For example, for /home/pmavro/.zshrc, the kernel will create 4 ‘dentry objects’:
/
home
pmavro
.zshrc
Each dentry object points to the inode associated with its file. To avoid reading from disk each time these same paths are used, the kernel uses the dentry cache where dentry objects are stored. For the same reasons, the kernel also caches information about inodes, which are therefore contained in the slab.
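As a quick sketch, the current state of the dentry cache can be read from /proc/sys/fs/dentry-state (field meanings are described in proc(5)):

```shell
# Fields include nr_dentry (allocated dentries) and nr_unused
# (dentries kept cached but not currently in use)
cat /proc/sys/fs/dentry-state
```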
Many network performance problems can be due to an ARP cache that is too small. By default, it’s limited to 512 entries (soft limit) and 1024 entries (hard limit) via the kernel’s neighbor-table thresholds. The soft limit becomes a hard limit after 5 seconds. When this limit is exceeded, the kernel runs a garbage collector that scans the cache and purges entries to get back below the limit. This garbage collection can also end up emptying the cache completely. Say your cache is limited to 1 entry but you’re connecting from 2 remote machines: every incoming and outgoing packet then causes a garbage collection and a re-insertion into the ARP cache, so the cache churns permanently. To give you an idea of what can happen on a system:
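The source omits the example; on current kernels these limits correspond (to the best of my knowledge) to the neighbor-table gc_thresh sysctls, and the cache itself can be listed with ip from iproute2:

```shell
# Soft limit (gc_thresh2, 512 by default) and hard limit (gc_thresh3, 1024)
cat /proc/sys/net/ipv4/neigh/default/gc_thresh2
cat /proc/sys/net/ipv4/neigh/default/gc_thresh3
# Current ARP cache contents with per-entry statistics:
ip -s neigh show
```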
There is also another option that lets you set the minimum time, in jiffies, that a cached entry is kept before it can be replaced. There are 100 jiffies per second in user space:
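Assuming this refers to the standard neighbor-table locktime parameter (an assumption on my part; it is expressed in jiffies), keeping entries for at least one second would look like:

```shell
# Keep an ARP cache entry for at least 1 second (100 jiffies)
# before it may be replaced by a different mapping (root required)
sysctl -w net.ipv4.neigh.default.locktime=100
```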
A very large percentage of paging activity is due to IO: reading from disk into memory, for example, populates the page cache. The kernel checks the page cache for the following kinds of IO request:
Reading and writing files
Reading and writing via block device files
Access to memory-mapped files
Access that swaps pages
Reading directories
To see the page cache allocations, just look at the buffer caches:
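For instance (my choice of tools), the buffer and page cache figures are visible with free and in /proc/meminfo:

```shell
# The "buff/cache" column aggregates buffer and page cache
free -m
# Raw figures from the kernel:
grep -E '^(Buffers|Cached)' /proc/meminfo
```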
In Linux, only certain types of pages are swapped. There’s no need to swap program text (code) pages, because they already exist on disk and can simply be re-read from there. Likewise, for memory used to cache a file whose contents have been modified, the kernel writes the data back to the file it belongs to rather than to swap. Only pages that have no association with a file are written to swap.
The swap cache keeps track of pages that have previously been brought in from swap and haven’t been modified since. If the kernel later needs to swap such a page out again and finds an entry for it in the swap cache, it can evict the page without writing it to disk, since a valid copy already exists in the swap area.
The statm file for each PID allows you to see anonymous pages (here PID 1):
> cat /proc/1/statm
2659 209 174 9 0 81 0
2659: total program size
209: resident set size (RSS)
174: shared pages (from shared mappings)
9: text (code)
0: library (unused since Linux 2.6)
81: data + stack
0: dirty pages (unused since Linux 2.6)
This therefore gives you the RSS and the shared memory used by a process. Note that the RSS reported by the kernel consists of anonymous plus shared pages, hence: anonymous pages = RSS - shared pages (here, 209 - 174 = 35 pages).
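That subtraction can be sketched with awk (reading the calling shell’s own statm to stay permission-safe):

```shell
# Field 2 = resident pages, field 3 = shared pages;
# anonymous pages = resident - shared
awk '{ print "anonymous pages:", $2 - $3 }' /proc/self/statm
```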
Another thing that consumes memory is the memory for IPC communications. Semaphores allow 2 or more processes to coordinate access to shared resources. Message Queues allow processes to coordinate for message exchanges. Shared memory regions allow processes to communicate by reading and writing to the same memory regions.
A process that wishes to use one of these mechanisms must make the appropriate system calls to access the desired resources. On SysV systems, it’s possible to place limits on these IPC resources. To see the current limits:
> ipcs -l

------ Shared Memory Limits --------
max number of segments = 4096
max seg size (kbytes) = 32768
max total shared memory (kbytes) = 8388608
min seg size (bytes) = 1

------ Semaphore Limits --------
max number of arrays = 128
max semaphores per array = 250
max semaphores system wide = 32000
max ops per semop call = 32
semaphore max value = 32767

------ Messages Limits --------
max queues system wide = 7599
max size of message (bytes) = 8192
default max size of queue (bytes) = 16384
Using /dev/shm can be a solution to significantly reduce the service time of certain applications. However, be careful when using this system as temporary storage space because it’s in memory. There’s also an ‘ipcrm’ command to force the deletion of shared memory segments. But generally, you’ll never need to use this command.
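To check how much RAM-backed space /dev/shm currently offers (a simple sketch):

```shell
# /dev/shm is a tmpfs mount, so its contents live in memory (and swap)
df -h /dev/shm
```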
It’s possible to tune these values (present in /proc/sys/kernel) via sysctl:
> cat /proc/sys/kernel/sem
250 32000 32 128
250: maximum number of semaphores per semaphore array
32000: maximum number of semaphores system-wide
32: maximum number of operations per semop system call
128: maximum number of semaphore arrays
If you want to modify them:
kernel.sem = 250 256000 32 1024
There are other interesting parameters (with their default values):
# Maximum number of bytes in a message queue
kernel.msgmnb = 16384
# Maximum number of message identifiers in the queue
kernel.msgmni = 16
# Maximum size of a message that can be passed to a process (this memory cannot be swapped)
kernel.msgmax = 8192
# Maximum number of shared memory segments on the system side
kernel.shmmni = 4096
# Maximum size of shared memory segments that can be created. A 32-bit system supports up to 4G - 1 maximum
kernel.shmmax = 33554432
# Total amount of shared memory in pages that can be used at once on the system side. This value must be at least kernel.shmmax/PAGE_SIZE (4KiB on 32-bit)
kernel.shmall = 2097152
For more information, see the man page for proc(5).
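To apply and persist such values (using kernel.shmmax from the list above as an example):

```shell
# Apply immediately (root required):
sysctl -w kernel.shmmax=33554432
# Persist across reboots:
echo 'kernel.shmmax = 33554432' >> /etc/sysctl.conf
sysctl -p
```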
To get an overall view of memory, swap, and paging activity:

> vmstat -s
3892968 K total memory
3585172 K used memory
1991172 K active memory
1348148 K inactive memory
307796 K free memory
230100 K buffer memory
1822744 K swap cache
3903484 K total swap
4140 K used swap
3899344 K free swap
397323 non-nice user cpu ticks
6518 nice user cpu ticks
102540 system cpu ticks
5898943 idle cpu ticks
146534 IO-wait cpu ticks
1 IRQ cpu ticks
1476 softirq cpu ticks
0 stolen cpu ticks
24899538 pages paged in
24575197 pages paged out
43 pages swapped in
1061 pages swapped out
38389133 interrupts
74156999 CPU context switches
1347436271 boot time
171650 forks