Problems with the vanilla kernel
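The Linux kernel cannot be preempted while it holds a spinlock or runs in IRQ context, so the time between a high-priority task being woken up and actually getting to run is not fully deterministic. The kernel also does not handle priority inversion for its in-kernel locks. The RT-Preempt Patch adds a set of patches on top of the community Linux kernel so that Linux can meet hard real-time requirements. This article describes our hands-on experience with the patch on a PC. Our test environment is Ubuntu 10.10, which by default runs its own stock kernel: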
barry@barry-VirtualBox:/lib/modules$ uname -a
2.6.35-32-generic #67-Ubuntu SMP Mon Mar 5 19:35:26 UTC 2012 i686 GNU/Linux
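On Ubuntu 10.10, install the RT test suite with apt-get install rt-tests and run its cyclictest tool. With the options below (-p 80 sets the highest priority, -t5 starts 5 threads, -n makes them sleep with clock_nanosleep()), it creates 5 realtime threads using the SCHED_FIFO policy, with priorities 76-80 and periods of 1000, 1500, 2000, 2500 and 3000 microseconds: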
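The priorities and periods follow a simple pattern. The sketch below reproduces it, assuming cyclictest's default base interval of 1000 µs (-i) and thread distance of 500 µs (-d), plus the one-priority-step-per-thread scheme visible in the output below. Newer cyclictest versions give all threads the same priority, as the later runs in this article show (all P:80). `struct ct_thread_param`, `ct_param_for_thread()` and `ct_print_params()` are hypothetical names used only for illustration.

```c
#include <stdio.h>

/* Assumed cyclictest defaults: -i 1000 (base interval) and -d 500 (distance), in us. */
#define CT_BASE_INTERVAL_US 1000
#define CT_DISTANCE_US       500

struct ct_thread_param {
    int priority;    /* SCHED_FIFO priority */
    int interval_us; /* wake-up period in microseconds */
};

/* Thread i runs one priority step below thread i-1 and with a period
 * CT_DISTANCE_US longer (the scheme seen in the old cyclictest output). */
struct ct_thread_param ct_param_for_thread(int base_prio, int i)
{
    struct ct_thread_param p;
    p.priority = base_prio - i;
    p.interval_us = CT_BASE_INTERVAL_US + i * CT_DISTANCE_US;
    return p;
}

/* Print the parameters that "-p <base_prio> -t<nthreads>" would produce. */
void ct_print_params(int base_prio, int nthreads)
{
    for (int i = 0; i < nthreads; i++) {
        struct ct_thread_param p = ct_param_for_thread(base_prio, i);
        printf("T:%d P:%d I:%d\n", i, p.priority, p.interval_us);
    }
}
```

Calling `ct_print_params(80, 5)` lists P:80 down to P:76 with I:1000 up to I:3000, matching the run below.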
barry@barry-VirtualBox:~/development/panda/android$ sudo cyclictest -p 80 -t5 -n
[sudo] password for barry:
policy: fifo: loadavg: 9.22 8.57 6.75 11/374 21385
T: 0 (20606) P:80 I:1000 C: 18973 Min: 26 Act: 76 Avg: 428 Max: 12637
T: 1 (20607) P:79 I:1500 C: 12648 Min: 31 Act: 68 Avg: 447 Max: 10320
T: 2 (20608) P:78 I:2000 C: 9494 Min: 28 Act: 151 Avg: 383 Max: 9481
T: 3 (20609) P:77 I:2500 C: 7589 Min: 29 Act: 889 Avg: 393 Max: 12670
T: 4 (20610) P:76 I:3000 C: 6325 Min: 37 Act: 167 Avg: 553 Max: 13673
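This shows that on standard Linux the jitter before an RT thread starts running is very unstable: the minimum is 26-37 µs, the average 383-553 µs, and the maximum ranges from 9481 to 13673 µs.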
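These numbers are wake-up latencies: the thread sleeps until an absolute deadline and records how late it actually woke up. Below is a minimal sketch of that measurement, assuming CLOCK_MONOTONIC and clock_nanosleep() with TIMER_ABSTIME (the mechanism cyclictest -n uses). It is not cyclictest's own code. `measure_wakeup_latency()` and `struct lat_stats` are hypothetical names. For meaningful numbers, run it under SCHED_FIFO. Without that, it mostly measures normal scheduling.

```c
#define _POSIX_C_SOURCE 200809L
#include <limits.h>
#include <time.h>

#define NSEC_PER_SEC 1000000000L

struct lat_stats {
    long min_us, max_us; /* smallest / largest lateness seen */
    double avg_us;       /* mean lateness */
    long cycles;         /* completed wake-ups */
};

/* a - b in microseconds (computed in ns first to avoid rounding per field). */
static long ts_diff_us(const struct timespec *a, const struct timespec *b)
{
    long long ns = (long long)(a->tv_sec - b->tv_sec) * 1000000000LL
                 + (a->tv_nsec - b->tv_nsec);
    return (long)(ns / 1000);
}

/* Advance t by ns nanoseconds, keeping tv_nsec normalized. */
static void ts_add_ns(struct timespec *t, long ns)
{
    t->tv_nsec += ns;
    while (t->tv_nsec >= NSEC_PER_SEC) {
        t->tv_nsec -= NSEC_PER_SEC;
        t->tv_sec++;
    }
}

/* Sleep 'loops' times with period interval_us until absolute deadlines and
 * record how late each wake-up was. Returns 0 on success, -1 on error. */
int measure_wakeup_latency(long interval_us, long loops, struct lat_stats *st)
{
    struct timespec next, now;
    long long sum = 0;

    st->min_us = LONG_MAX;
    st->max_us = 0;
    st->cycles = 0;

    if (clock_gettime(CLOCK_MONOTONIC, &next) != 0)
        return -1;

    for (long i = 0; i < loops; i++) {
        ts_add_ns(&next, interval_us * 1000L);
        /* Absolute deadline: lateness in one cycle does not shift later ones. */
        if (clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, &next, NULL) != 0)
            return -1;
        clock_gettime(CLOCK_MONOTONIC, &now);

        long lat = ts_diff_us(&now, &next); /* how late this wake-up was */
        if (lat < st->min_us) st->min_us = lat;
        if (lat > st->max_us) st->max_us = lat;
        sum += lat;
        st->cycles++;
    }
    st->avg_us = loops > 0 ? (double)sum / loops : 0.0;
    return 0;
}
```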
We run the same test again, but this time add more interference while it runs, for example mount /dev/sdb1 ~/development. The results become:
barry@barry-VirtualBox:~$ sudo cyclictest -p 80 -t5 -n
policy: fifo: loadavg: 0.14 0.29 0.13 2/308 1908
T: 0 ( 1874) P:80 I:1000 C: 28521 Min: 0 Act: 440 Avg: 2095 Max: 331482
T: 1 ( 1875) P:79 I:1500 C: 19014 Min: 2 Act: 988 Avg: 2099 Max: 330503
T: 2 ( 1876) P:78 I:2000 C: 14261 Min: 7 Act: 534 Avg: 2096 Max: 329989
T: 3 ( 1877) P:77 I:2500 C: 11409 Min: 4 Act: 554 Avg: 2073 Max: 328490
T: 4 ( 1878) P:76 I:3000 C: 9507 Min: 12 Act: 100 Avg: 2081 Max: 328991
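The IRQs, softirqs and spinlocks involved in the mount noticeably increase the maximum jitter, which even reaches 331482 µs (about 331 ms). This clearly shows how unpredictable the time before an RT thread starts running is on a standard Linux kernel (hard real-time means predictable).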
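To compare runs like these, it helps to pull the statistics out programmatically. This is a minimal sketch. It assumes the line layout printed by the cyclictest version used in this article (for example `T: 0 ( 1874) P:80 I:1000 C: 28521 Min: 0 Act: 440 Avg: 2095 Max: 331482`), and other versions may print different fields. `struct ct_line` and `parse_ct_line()` are hypothetical names.

```c
#include <stdio.h>

struct ct_line {
    int thread, tid, prio, interval_us;
    long cycles, min_us, act_us, avg_us, max_us;
};

/* Returns 1 if 'line' matches the expected layout, 0 otherwise.
 * A space in the sscanf format matches any amount of whitespace, including
 * none, so "( 1874)", "(20606)" and "Act:15108679" are all accepted. */
int parse_ct_line(const char *line, struct ct_line *out)
{
    int n = sscanf(line,
                   "T: %d ( %d) P:%d I:%d C: %ld Min: %ld Act: %ld Avg: %ld Max: %ld",
                   &out->thread, &out->tid, &out->prio, &out->interval_us,
                   &out->cycles, &out->min_us, &out->act_us, &out->avg_us,
                   &out->max_us);
    return n == 9;
}
```

Feeding each "T:" line of a run through `parse_ct_line()` and keeping the largest `max_us` gives the worst-case figure discussed above.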
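If we build a kernel with the "Voluntary Kernel Preemption (Desktop)" preemption model, which is similar to a 2.4 kernel without kernel preemption, and run the same case, the timing uncertainty becomes almost unacceptable, with maximum latencies of more than 11 seconds: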
barry@barry-VirtualBox:~$ sudo /usr/local/bin/cyclictest -p 80 -t5 -n
# /dev/cpu_dma_latency set to 0us
policy: fifo: loadavg: 0.23 0.30 0.15 3/247 5086
T: 0 ( 5082) P:80 I:1000 C: 5637 Min: 60 Act:15108679 Avg:11195196 Max:15108679
T: 1 ( 5083) P:80 I:1500 C: 5723 Min: 48 Act:12364955 Avg:6389691 Max:12364955
T: 2 ( 5084) P:80 I:2000 C: 4821 Min: 32 Act:11119979 Avg:8061814 Max:11661123
T: 3 ( 5085) P:80 I:2500 C: 3909 Min: 27 Act:11176854 Avg:4563549 Max:11176854
T: 4 ( 5086) P:80 I:3000 C: 3598 Min: 37 Act:9951432 Avg:8761137 Max:116026155
Enabling the RT-Preempt Patch
The main changes the RT-Preempt Patch makes to the Linux kernel include:
- Making in-kernel locking primitives (using spinlocks) preemptible through reimplementation with rtmutexes:
  - Critical sections protected by e.g. spinlock_t and rwlock_t are now preemptible. Non-preemptible sections can still be created in the kernel with raw_spinlock_t (which has the same API as spinlock_t).
- Implementing priority inheritance for in-kernel spinlocks and semaphores. For more information on priority inversion and priority inheritance, please consult Introduction to Priority Inversion.
- Converting interrupt handlers into preemptible kernel threads: the RT-Preempt patch runs interrupt handlers, as well as soft interrupt handlers, in kernel thread context, where each one is represented by a task_struct like a common userspace process. However, it is also possible to register an IRQ that is handled in kernel (hard interrupt) context.
- Converting the old Linux timer API into separate infrastructures for high-resolution kernel timers plus one for timeouts, which gives userspace high-resolution POSIX timers.
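User space can also use priority inheritance for its own locks, through POSIX mutexes backed by PI futexes, which rely on the kernel's rt_mutex code. The sketch below only shows how an application would request a PI mutex. It is not the in-kernel spinlock conversion described in the list. `make_pi_mutex()` is a hypothetical helper name.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>

/* Initialize 'm' as a priority-inheritance mutex: while a low-priority thread
 * holds it, a higher-priority waiter lends its priority to the holder, so a
 * medium-priority thread cannot keep the holder (and thus the waiter) off the
 * CPU. That is the classic priority-inversion scenario.
 * Returns 0 on success or an error number from the pthread_* calls. */
int make_pi_mutex(pthread_mutex_t *m)
{
    pthread_mutexattr_t attr;
    int err = pthread_mutexattr_init(&attr);
    if (err != 0)
        return err;

    err = pthread_mutexattr_setprotocol(&attr, PTHREAD_PRIO_INHERIT);
    if (err == 0)
        err = pthread_mutex_init(m, &attr);

    pthread_mutexattr_destroy(&attr);
    return err;
}
```

After initialization, the mutex is locked and unlocked with the usual pthread_mutex_lock()/pthread_mutex_unlock() calls. The boosting happens only when a higher-priority thread actually blocks on it.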
For this experiment, we used the RT-Preempt kernel tree at git://git.kernel.org/pub/scm/linux/kernel/git/rt/linux-stable-rt.git on its v3.4-rt-rebase branch, and selected the "Fully Preemptible Kernel" preemption model when configuring the kernel:
Preemption Model
  ( ) No Forced Preemption (Server)
  ( ) Voluntary Kernel Preemption (Desktop)
  ( ) Preemptible Kernel (Low-Latency Desktop)
  ( ) Preemptible Kernel (Basic RT)
  (X) Fully Preemptible Kernel (RT)
In addition, the kernel must be built with tickless support and high-resolution timers:
Processor type and features
  [*] Tickless System (Dynamic Ticks)
  [*] High Resolution Timer Support
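One way to check from user space whether high-resolution timers are active on the running kernel is clock_getres(). With hrtimers active, Linux reports a resolution of 1 ns for CLOCK_MONOTONIC. Without them, it reports one tick (1/HZ). A minimal sketch follows. `monotonic_resolution_ns()` and `hrtimers_active()` are hypothetical helpers, and treating 1 ns as the signal is an assumption based on that behavior.

```c
#define _POSIX_C_SOURCE 200809L
#include <time.h>

/* Resolution of CLOCK_MONOTONIC in nanoseconds, or -1 on error. */
long monotonic_resolution_ns(void)
{
    struct timespec res;
    if (clock_getres(CLOCK_MONOTONIC, &res) != 0)
        return -1;
    return res.tv_sec * 1000000000L + res.tv_nsec;
}

/* Heuristic: with high-resolution timers active, Linux reports 1 ns;
 * otherwise it reports the tick length (e.g. 4000000 ns at HZ=250). */
int hrtimers_active(void)
{
    long ns = monotonic_resolution_ns();
    return ns > 0 && ns <= 1;
}
```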
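After make modules_install, make install and mkinitramfs, we have an RT kernel that can boot in Ubuntu. For detailed build instructions, see http://www.linuxidc.com/Linux/2012-01/50749.htm and adjust the version numbers and similar details from that article. The commands we ran were: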
Install the modules
barry@barry-VirtualBox:~/development/linux-2.6$ sudo make modules_install
....
INSTALL /lib/firmware/whiteheat_loader.fw
INSTALL /lib/firmware/whiteheat.fw
INSTALL /lib/firmware/keyspan_pda/keyspan_pda.fw
INSTALL /lib/firmware/keyspan_pda/xircom_pgs.fw
INSTALL /lib/firmware/cpia2/stv0672_vp4.bin
INSTALL /lib/firmware/yam/1200.bin
INSTALL /lib/firmware/yam/9600.bin
DEPMOD 3.4.11-rt19
Install the kernel
barry@barry-VirtualBox:~/development/linux-2.6$ sudo make install
sh /home/barry/development/linux-2.6/arch/x86/boot/install.sh 3.4.11-rt19 arch/x86/boot/bzImage \
System.map "/boot"
Create the initrd
barry@barry-VirtualBox:~/development/linux-2.6$ sudo mkinitramfs 3.4.11-rt19 -o /boot/initrd.img-3.4.11-rt19
Modify the GRUB configuration
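Add a new boot entry to the GRUB 2 configuration file (/boot/grub/grub.cfg) by copying an existing menuentry and changing every version number in it to 3.4.11-rt19. Our change is: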
menuentry 'Ubuntu, with Linux 3.4.11-rt19' --class ubuntu --class gnu-linux --class gnu --class os {
	recordfail
	insmod part_msdos
	insmod ext2
	set root='(hd0,msdos1)'
	search --no-floppy --fs-uuid --set a0db5cf0-6ce3-404f-9808-88ce18f0177a
	linux /boot/vmlinuz-3.4.11-rt19 root=UUID=a0db5cf0-6ce3-404f-9808-88ce18f0177a ro quiet splash
	initrd /boot/initrd.img-3.4.11-rt19
}
At boot, select the 3.4.11-rt19 entry.
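After booting, it is worth confirming that the RT kernel is the one actually running. Below is a minimal sketch using uname(2). The check is a heuristic. It assumes the kernel version string contains "PREEMPT RT" or "PREEMPT_RT", as PREEMPT_RT builds typically print in `uname -v`. As a fallback, it assumes the release string carries an "-rt" suffix such as 3.4.11-rt19. `looks_like_rt_kernel()` and `running_rt_kernel()` are hypothetical names.

```c
#include <string.h>
#include <sys/utsname.h>

/* Heuristic on the release/version strings reported by uname(2). */
int looks_like_rt_kernel(const char *release, const char *version)
{
    if (strstr(version, "PREEMPT RT") || strstr(version, "PREEMPT_RT"))
        return 1;
    /* Fallback: -rt patch releases usually carry an "-rt" suffix. */
    return strstr(release, "-rt") != NULL;
}

/* 1 if the running kernel looks like an RT kernel, 0 if not, -1 on error. */
int running_rt_kernel(void)
{
    struct utsname u;
    if (uname(&u) != 0)
        return -1;
    return looks_like_rt_kernel(u.release, u.version);
}
```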
Trying out the RT-Preempt Patch
Running the same cyclictest benchmark gives very different results:
barry@barry-VirtualBox:~$ sudo cyclictest -p 80 -t5 -n
WARNING: Most functions require kernel 2.6
policy: fifo: loadavg: 0.71 0.42 0.17 1/289 1926
T: 0 ( 1921) P:80 I:1000 C: 7294 Min: 7 Act: 89 Avg: 197 Max: 3177
T: 1 ( 1922) P:79 I:1500 C: 4863 Min: 10 Act: 85 Avg: 186 Max: 2681
T: 2 ( 1923) P:78 I:2000 C: 3647 Min: 15 Act: 93 Avg: 160 Max: 2504
T: 3 ( 1924) P:77 I:2500 C: 2918 Min: 23 Act: 67 Avg: 171 Max: 2114
T: 4 ( 1925) P:76 I:3000 C: 2432 Min: 19 Act: 134 Avg: 339 Max: 3129
We run the same test again, but this time add more interference while it runs, for example mount /dev/sdb1 ~/development. The results become:
barry@barry-VirtualBox:~$ sudo cyclictest -p 80 -t5 -n
# /dev/cpu_dma_latency set to 0us
policy: fifo: loadavg: 0.11 0.12 0.13 1/263 2860
T: 0 ( 2843) P:80 I:1000 C: 28135 Min: 5 Act: 198 Avg: 200 Max: 7387
T: 1 ( 2844) P:80 I:1500 C: 18756 Min: 22 Act: 169 Avg: 188 Max: 6875
T: 2 ( 2845) P:80 I:2000 C: 14067 Min: 7 Act: 91 Avg: 149 Max: 7288
T: 3 ( 2846) P:80 I:2500 C: 11254 Min: 19 Act: 131 Avg: 155 Max: 6287
T: 4 ( 2847) P:80 I:3000 C: 9378 Min: 25 Act: 58 Avg: 172 Max: 6121
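The timing now stays within a predictable range, and nothing like the 331482 µs jitter of the standard kernel appears. Note, however, that the maximum jitter is larger than we expected, reaching several milliseconds (up to 7387 µs). We believe this is because all of our tests were run inside a VirtualBox virtual machine. According to other documentation, this jitter should be in the range of a few tens of microseconds.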
Running ps aux on this kernel shows the threaded IRQ handlers:
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
root 1 0.8 0.1 2880 1788 ? Ss 18:39 0:03 init
root 2 0.0 0.0 0 0 ? S 18:39 0:00 kthreadd
...
root 45 0.0 0.0 0 0 ? S 18:39 0:00 irq/14-ata_piix
root 46 0.0 0.0 0 0 ? S 18:39 0:00 irq/15-ata_piix
root 50 0.0 0.0 0 0 ? S 18:39 0:00 irq/19-ehci_hcd
root 51 0.0 0.0 0 0 ? S 18:39 0:00 irq/22-ohci_hcd
root 55 0.0 0.0 0 0 ? S 18:39 0:00 irq/12-i8042
root 56 0.0 0.0 0 0 ? S 18:39 0:00 irq/1-i8042
root 57 0.0 0.0 0 0 ? S 18:39 0:00 irq/8-rtc0
root 863 0.0 0.0 0 0 ? S 18:39 0:00 irq/19-eth0
root 864 0.0 0.0 0 0 ? S 18:39 0:00 irq/16-eth1
root 1002 0.5 0.0 0 0 ? S 18:39 0:01 irq/21-snd_inte
...
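The IRQ threads can also be found programmatically. The sketch below scans /proc, picks the tasks whose command name starts with "irq/", and prints their scheduling policy and priority via sched_getscheduler() and sched_getparam(). On a PREEMPT_RT kernel, these threads typically run as SCHED_FIFO at priority 50, which is why the test_rt.c example later in this article uses 49 for the application thread. `is_irq_thread_name()` and `list_irq_threads()` are hypothetical helpers.

```c
#define _POSIX_C_SOURCE 200809L
#include <ctype.h>
#include <dirent.h>
#include <sched.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>

/* Threaded IRQ handlers are named "irq/<nr>-<name>", e.g. "irq/19-eth0". */
int is_irq_thread_name(const char *comm)
{
    return strncmp(comm, "irq/", 4) == 0;
}

/* Print every IRQ thread visible under /proc; returns how many were found,
 * or -1 if /proc cannot be opened. */
int list_irq_threads(void)
{
    DIR *proc = opendir("/proc");
    struct dirent *de;
    int count = 0;

    if (!proc)
        return -1;

    while ((de = readdir(proc)) != NULL) {
        char path[300], comm[64];
        FILE *f;

        if (!isdigit((unsigned char)de->d_name[0]))
            continue;                     /* not a PID directory */

        snprintf(path, sizeof(path), "/proc/%s/comm", de->d_name);
        f = fopen(path, "r");
        if (!f)
            continue;                     /* task exited in the meantime */

        if (fgets(comm, sizeof(comm), f)) {
            comm[strcspn(comm, "\n")] = '\0';
            if (is_irq_thread_name(comm)) {
                pid_t pid = (pid_t)atoi(de->d_name);
                struct sched_param sp = { 0 };
                int policy = sched_getscheduler(pid);
                sched_getparam(pid, &sp);
                printf("%-6d %-20s %s prio %d\n", (int)pid, comm,
                       policy == SCHED_FIFO ? "FIFO" : "other",
                       sp.sched_priority);
                count++;
            }
        }
        fclose(f);
    }
    closedir(proc);
    return count;
}
```

If a device's interrupts need to run above or below other RT work, the priority of its irq/ thread can then be changed like any other SCHED_FIFO task.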
Writing an application with an RT thread on this kernel usually involves the following steps:
- Setting a real time scheduling policy and priority.
- Locking memory so that page faults caused by virtual memory will not undermine deterministic behavior.
- Pre-faulting the stack, so that a future stack fault will not undermine deterministic behavior.
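Example test_rt.c: mlockall() keeps the physical pages backing the process's virtual address space from being swapped out, and stack_prefault() deliberately grows the stack downward by 8 KB up front, so that later function calls and local variables no longer trigger stack growth (which would rely on page faults and memory allocation):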
#include <stdlib.h>
#include <stdio.h>
#include <time.h>
#include <sched.h>
#include <sys/mman.h>
#include <string.h>

#define MY_PRIORITY (49) /* we use 49 because PREEMPT_RT uses 50 as the default
                            priority of kernel tasklets and interrupt handlers */
#define MAX_SAFE_STACK (8*1024) /* The maximum stack size which is guaranteed
                                   safe to access without faulting */
#define NSEC_PER_SEC (1000000000) /* The number of nsecs per sec. */

void stack_prefault(void)
{
        unsigned char dummy[MAX_SAFE_STACK];

        /* Touch every byte so these stack pages are faulted in (and locked by
           mlockall) now, not inside the real-time loop. At high optimization
           levels the compiler may drop this dead store. */
        memset(dummy, 0, MAX_SAFE_STACK);
        return;
}

int main(int argc, char* argv[])
{
        struct timespec t;
        struct sched_param param;
        int interval = 50000; /* 50us */

        /* Declare ourself as a real time task */
        param.sched_priority = MY_PRIORITY;
        if (sched_setscheduler(0, SCHED_FIFO, &param) == -1) {
                perror("sched_setscheduler failed");
                exit(-1);
        }

        /* Lock memory */
        if (mlockall(MCL_CURRENT | MCL_FUTURE) == -1) {
                perror("mlockall failed");
                exit(-2);
        }

        /* Pre-fault our stack */
        stack_prefault();

        clock_gettime(CLOCK_MONOTONIC, &t);
        /* start after one second */
        t.tv_sec++;

        while (1) {
                /* wait until next shot */
                clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, &t, NULL);

                /* do the stuff */

                /* calculate next shot */
                t.tv_nsec += interval;
                while (t.tv_nsec >= NSEC_PER_SEC) {
                        t.tv_nsec -= NSEC_PER_SEC;
                        t.tv_sec++;
                }
        }
}
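Compile it with gcc -o test_rt test_rt.c -lrt, and run it as root (for example with sudo, as with cyclictest), since setting SCHED_FIFO and locking memory require privileges. That is all for this post. A series of follow-up posts will describe the main changes the RT-Preempt Patch makes to the kernel and how they work.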
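The "calculate next shot" step is the only arithmetic in the test_rt.c loop, and it is easy to get wrong when the interval changes. The sketch below factors it into a helper and also reports overruns, that is, cycles where the work took longer than one period so the next deadline is already in the past. `timespec_add_ns()`, `timespec_before()` and `next_shot()` are hypothetical additions, not part of the original example.

```c
#define _POSIX_C_SOURCE 200809L
#include <time.h>

#define NSEC_PER_SEC 1000000000L

/* t += ns, keeping 0 <= tv_nsec < NSEC_PER_SEC (ns must be >= 0). */
void timespec_add_ns(struct timespec *t, long ns)
{
    t->tv_sec  += ns / NSEC_PER_SEC;
    t->tv_nsec += ns % NSEC_PER_SEC;
    if (t->tv_nsec >= NSEC_PER_SEC) {
        t->tv_nsec -= NSEC_PER_SEC;
        t->tv_sec++;
    }
}

/* Nonzero if a is strictly earlier than b. */
int timespec_before(const struct timespec *a, const struct timespec *b)
{
    return a->tv_sec < b->tv_sec ||
           (a->tv_sec == b->tv_sec && a->tv_nsec < b->tv_nsec);
}

/* Advance the absolute deadline *next by interval_ns. Returns 1 if the new
 * deadline has already passed (the work overran its period), else 0. */
int next_shot(struct timespec *next, long interval_ns)
{
    struct timespec now;
    timespec_add_ns(next, interval_ns);
    clock_gettime(CLOCK_MONOTONIC, &now);
    return timespec_before(next, &now);
}
```

In test_rt.c, the "calculate next shot" lines could then become `if (next_shot(&t, interval)) overruns++;`, where `overruns` is a hypothetical counter to report or act on.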
This article was reposted from 21cnbao's 51CTO blog. Original link: http://blog.51cto.com/21cnbao/1011931. Please contact the original author for permission to repost.