Several clock sources let the system keep and manipulate time:
RTC (Real Time Clock): the battery-backed clock on the motherboard that keeps the date and time while the machine is powered off. You can get information about it in the /proc/driver/rtc file.
TSC (Time Stamp Counter): a CPU register that increments at the processor's clock frequency (and can therefore drift when that frequency varies). The kernel uses the TSC together with the RTC to compute the date and time.
PIT (Programmable Interval Timer): sometimes conflated with the PIC (Programmable Interrupt Controller), which delivers its interrupts; the PIT raises an interrupt to the kernel after a programmed interval has elapsed and is generally used to drive process scheduling.
APIC (Advanced Programmable Interrupt Controller): it also runs off the CPU clock, keeps track of the processes running on its processor and delivers local interrupts to it.
On a 2.6 kernel, the timer interrupt frequency is 1000 Hz, i.e. 1 tick per millisecond (a tick is also called a jiffy). This interval can be adjusted when compiling the kernel or, on some distributions, through boot parameters. A shorter tick gives better timer resolution; however, applications may run slightly slower because more time is spent servicing timer interrupts.
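You can check which tick frequency your kernel was built with, assuming the distribution ships the kernel build configuration under /boot (the usual location on Red Hat and Debian systems):

```shell
# Print the CONFIG_HZ settings of the running kernel; guarded because the
# config file location is distro-dependent.
grep 'CONFIG_HZ' "/boot/config-$(uname -r)" 2>/dev/null \
    || echo "no build config found for kernel $(uname -r)"
```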
# /etc/sysconfig/cpuspeed
#
# This configuration file controls the behavior of both the
# cpuspeed daemon and various cpufreq modules.
# For the vast majority of users, there shouldn't be any need to
# alter the contents of this file at all. By and large, frequency
# scaling should Just Work(tm) with the defaults.

### DRIVER ###
# Your CPUFreq driver module
# Note that many drivers are now built-in, rather than built as modules,
# so its usually best not to specify one.
# default value: empty (try to auto-detect/use built-in)
DRIVER=

### GOVERNOR ###
# Which scaling governor to use
# Details on scaling governors for your cpu(s) can be found in
# cpu-freq/governors.txt, part of the kernel-doc package
# NOTES:
# - The GOVERNOR parameter is only valid on centrino, powernow-k8 (amd64)
#   and acpi-cpufreq platforms, other platforms that support frequency
#   scaling always use the 'userspace' governor.
# - Using the 'userspace' governor will trigger the cpuspeed daemon to run,
#   which provides said user-space frequency scaling.
# default value: empty (defaults to ondemand on centrino, powernow-k8,
# and acpi-cpufreq systems, userspace on others)
GOVERNOR=

### FREQUENCIES ###
# NOTE: valid max/min frequencies for your cpu(s) can be found in
# /sys/devices/system/cpu/cpu*/cpufreq/scaling_available_frequencies
# on systems that support frequency scaling (though only after the
# appropriate drivers have been loaded via the cpuspeed initscript).
# maximum speed to scale up to
# default value: empty (use cpu reported maximum)
MAX_SPEED=
# minimum speed to scale down to
# default value: empty (use cpu reported minimum)
MIN_SPEED=

### SCALING THRESHOLDS ###
# Busy percentage threshold over which to scale up to max frequency
# default value: empty (use governor default)
UP_THRESHOLD=
# Busy percentage threshold under which to scale frequency down
# default value: empty (use governor default)
DOWN_THRESHOLD=

### NICE PROCESS HANDLING ###
# Let background (nice) processes speed up the cpu
# default value: 0 (background process usage can speed up cpu)
# alternate value: 1 (background processes will be ignored)
IGNORE_NICE=0

#############################################################
### HISTORICAL CPUSPEED CONFIG BITS ###
#############################################################
VMAJOR=1
VMINOR=1
# Add your favorite options here
#OPTS="$OPTS -s 0 -i 10 -r"
# uncomment and modify this to check the state of the AC adapter
#OPTS="$OPTS -a /proc/acpi/ac_adapter/*/state"
# uncomment and modify this to check the system temperature
#OPTS="$OPTS -t /proc/acpi/thermal_zone/*/temperature 75"
You can get the current information like this:
cpuspeed --help 2>&1 | more
You can also display the values that can be assigned:
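For example, on a system where a cpufreq driver is loaded, sysfs exposes the assignable frequencies and governors (the reads are guarded because these files exist only when frequency scaling is supported):

```shell
# Values CPU 0 will accept; these sysfs files appear once a cpufreq
# driver is loaded.
cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_available_frequencies 2>/dev/null || true
cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_available_governors 2>/dev/null || true
```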
And finally, the governor shows the scaling algorithm in use. Here, for example, it is “ondemand”, which changes the processor speed on the fly according to demand:
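A quick sketch of reading and changing the governor through sysfs (the write needs root, so it is left commented out):

```shell
# Current governor for CPU 0 (e.g. "ondemand"); the file exists only when
# a cpufreq driver is loaded, hence the guard.
cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor 2>/dev/null || true
# Switching is a simple write (root only):
# echo performance > /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
```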
If you want the best performance, disable this daemon. The drawback is of course higher power consumption (think of the environment).
Note also that if you need very low latencies, disabling this daemon is strongly recommended.
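On Red Hat-style systems, disabling the daemon for the current session and at boot looks like this (both commands need root, and are guarded here for systems without these initscripts):

```shell
# Stop cpuspeed now and disable it at boot.
service cpuspeed stop 2>/dev/null || true
chkconfig cpuspeed off 2>/dev/null || true
```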
These options allow the kernel to preempt and reschedule certain tasks. The gain shows up in latency, especially network latency. For example, the kernel can handle disk I/O and simultaneously receive interrupts from the network card: the handler doing disk I/O can be preempted in favor of the network card's interrupt, which improves network latency.
It is nevertheless possible to disable IRQ balancing with a boot parameter (noapic disables the I/O APIC, so all interrupts go through the legacy PIC and land on CPU 0):
GRUB_CMDLINE_LINUX_DEFAULT="quiet noapic"
If IRQs are unevenly distributed across CPUs, performance can become inconsistent, because interrupt handlers preempt whatever processes happen to be running on those CPUs.
Pinning an IRQ to a CPU (IRQ affinity) lets you exploit CPU cache affinity and even out how much each CPU is interrupted. To bind an IRQ to a core, write that core's bitmap, in hexadecimal, to /proc/irq/<irq>/smp_affinity. For example:
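A minimal sketch of building the mask: bit N of the bitmap corresponds to CPU N. The IRQ number 19 below is only an example, and the write requires root:

```shell
# Compute the hex affinity mask for a given CPU number.
cpu=2
mask=$(printf '%x' $((1 << cpu)))
echo "mask for CPU $cpu: 0x$mask"
# Bind the example IRQ 19 to CPU 2 (root only):
# echo "$mask" > /proc/irq/19/smp_affinity
```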
This takes effect for the IRQ's next interrupts and keeps the other CPUs from servicing it. On Red Hat this can be configured permanently in /etc/sysconfig/irqbalance. If you prefer, you can also disable IRQ balancing entirely:
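On Red Hat-style systems, that means stopping the irqbalance daemon (root required; the commands are guarded for systems without these initscripts):

```shell
# Stop irqbalance now and keep it from starting at boot, so manually set
# smp_affinity values are not overwritten.
service irqbalance stop 2>/dev/null || true
chkconfig irqbalance off 2>/dev/null || true
```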
Each core has its own run queue. On HyperThreaded processors, the logical processors share the run queue of their physical core. By default there is a certain affinity: tasks tend to return to the CPU they last ran on, which is good because each CPU has its own cache. However, if one core is more loaded than another, the scheduler inspects the run queues every 100 ms (or every 1 ms if the core is idle) and may rebalance the load. If this rebalancing happens too often, the resulting cache misses can themselves add latency (it all depends on the application), so you have to decide which matters more to you. To see the list of processes and the core each one runs on:
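One way is ps, whose PSR column shows the processor each task last ran on:

```shell
# PID, last-used processor (PSR) and command name for every task; head
# keeps the output short.
ps -eo pid,psr,comm | head -15
```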
If you want to assign specific CPUs to certain processes, that's possible too! The first step is to know the CPU bitmaps. To get an idea of how to compute them:
> awk '/^processor/{printf("CPU %s mask : 0x%08x\n", $3, 2^$3)}' /proc/cpuinfo ; echo 'All CPUs : 0xFFFFFFFF'
CPU 0 mask : 0x00000001
CPU 1 mask : 0x00000002
CPU 2 mask : 0x00000004
CPU 3 mask : 0x00000008
All CPUs : 0xFFFFFFFF
Then we use the taskset command to bind a PID to a specific CPU (here the mask 0x00000001, i.e. CPU 0):
taskset -p 0x00000001 <PID>
You should know that on NUMA systems, RAM is mapped directly to CPUs (each node has local memory) to increase performance. Processors can still use memory that is not local to them, only with higher latency. Here is a small overview of NUMA:
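If the numactl package is installed, you can inspect the topology directly (the call is guarded since the tool is not always present):

```shell
# Nodes, their CPUs and the amount of local memory on each.
command -v numactl >/dev/null && numactl --hardware \
    || echo "numactl not installed"
```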
You can also specify parameters at the grub level to isolate CPUs (isolcpus):
# grub.conf generated by anaconda
#
# Note that you do not have to rerun grub after making changes to this file
# NOTICE: You have a /boot partition. This means that
#     all kernel and initrd paths are relative to /boot/, eg.
#     root (hd0,0)
#     kernel /vmlinuz-version ro root=/dev/mapper/vgos-root
#     initrd /initrd-[generic-]version.img
#boot=/dev/sda
default=0
timeout=5
splashimage=(hd0,0)/grub/splash.xpm.gz
hiddenmenu
title Red Hat Enterprise Linux Server (2.6.32-279.2.1.el6.x86_64)
    root (hd0,0)
    kernel /vmlinuz-2.6.32-279.2.1.el6.x86_64 ro root=/dev/mapper/vgos-root rd_NO_LUKS KEYBOARDTYPE=pc KEYTABLE=fr LANG=en_US.UTF-8 rd_LVM_LV=vgos/root rd_NO_MD rd_LVM_LV=vgos/swap SYSFONT=latarcyrheb-sun16 crashkernel=128M biosdevname=0 rd_NO_DM isolcpus=0
    initrd /initramfs-2.6.32-279.2.1.el6.x86_64.img
Since isolcpus=0 removes CPU 0 from the general scheduler, only tasks explicitly pinned to it (e.g. with taskset) will run there. We therefore get a smaller run queue and improved response times for the tasks assigned to this CPU.
cpuset is a more advanced version of taskset that provides a more elegant, flexible and scalable method for controlling runqueues and latency on tasks. A cpuset is a group of CPUs (scheduler domain/cgroups) on which we will be able to balance tasks:
The implementation of cpuset in the kernel is quite small and has no impact on the process scheduler. It uses a new VFS that introduces no new system calls, and this cpuset VFS can be mounted anywhere on the system; we will, for example, mount it in /mnt/cpusets. Creating folders inside it is enough to define sets and assign CPUs to them. A CPU can belong to multiple cpusets.
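A minimal sketch with example names: the set name "myset", CPUs 1-2 and memory node 0 are arbitrary choices, and every step needs root (hence the guards):

```shell
# Mount the cpuset pseudo-filesystem and create a set inside it.
mkdir -p /mnt/cpusets 2>/dev/null || true
mount -t cpuset none /mnt/cpusets 2>/dev/null || true
mkdir -p /mnt/cpusets/myset 2>/dev/null || true
# Give the set CPUs 1-2 and memory node 0, then move this shell into it.
echo 1-2 > /mnt/cpusets/myset/cpus 2>/dev/null || true
echo 0 > /mnt/cpusets/myset/mems 2>/dev/null || true
echo $$ > /mnt/cpusets/myset/tasks 2>/dev/null || true
```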
#
# Copyright IBM Corporation. 2007
#
# Authors: Balbir Singh <balbir@linux.vnet.ibm.com>
# This program is free software; you can redistribute it and/or modify it
# under the terms of version 2.1 of the GNU Lesser General Public License
# as published by the Free Software Foundation.
#
# This program is distributed in the hope that it would be useful, but
# WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
#
group tomcat_cgroup {
    perm {
        admin {
            uid = tomcat;
        }
        task {
            uid = tomcat;
        }
    }
    cpuset {
        cpuset.mems = 0;
        cpuset.cpus = "1,2";
        cpuset.cpu_exclusive = 1;
    }
}
This example is for the tomcat user, to whom I want to dedicate 2 CPUs. You then need to update the cgrules configuration:
tomcat          cpuset          tomcat_cgroup/
This maps the tomcat user's processes to the tomcat_cgroup group (/sys/fs/cgroup/tomcat_cgroup). Now restart the services:
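On Red Hat-style systems that means restarting the cgroup config and rules services (root required; guarded here for systems without these initscripts):

```shell
# Re-read /etc/cgconfig.conf and restart the rules engine daemon.
service cgconfig restart 2>/dev/null || true
service cgred restart 2>/dev/null || true
```

You can then check where a process landed with cat /proc/<pid>/cgroup, where <pid> is any of the user's processes.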