Xen Memory Management

Introduction:
  • All low-level memory operations go through Xen.
  • Guest OSes are responsible for allocating and initializing page tables (PTs) for their processes, but have read-only access to them
    • the guest allocates and initializes a page, then registers it with Xen to serve as the new PT
  • Direct page table writes are intercepted, validated and applied by the Xen VMM
    • updates can be batched into a single hypercall (reducing the cost of entering/exiting Xen)
  • page_info struct associated with each machine page frame
    • page type (none, l1, l2, l3, l4, LDT, GDT, RW)
    • reference count – number of references to the page
    • page frame can be reused only when unpinned and its reference count is zero
  • Each domain has a maximum and current memory allocation
    • max allocation is set at domain creation time and cannot be modified
  • PT updates
    • hypercall –> mmu_update()
    • writable page tables –> vm_assist()
  • Xen exists in the top 64MB (0xFC000000 – 0xFFFFFFFF) section of every guest virtual address space (TLB flush avoided when entering/leaving the hypervisor)
    • not accessible or remappable by guest OSes.
  • “fast handler” for system calls - direct access from app into guest OS, without going through Xen
    • must execute outside Ring 0
  • Each guest supports a “balloon” memory management driver, which the VMM uses to dynamically adjust the guest’s memory usage
  • Page fault handling
    • faulting address is written into an extended stack frame on the guest OS stack (normally the faulting address is read from a privileged processor register (CR2))
  • In terms of page protection, Ring1/2 are considered to be part of ‘supervisor mode’. The WP bit in CR0 controls whether read-only restrictions are respected in supervisor mode – if the bit is clear then any mapped page is writable. Xen gets around this by always setting the WP bit and disallowing updates to it. xen/arch/x86/boot/x86_32.S#153
  • Xen provides a domain with a list of machine frames during bootstrapping, and it is the domain’s responsibility to create the pseudo-physical address space from this
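
The per-frame bookkeeping in the list above (page type, reference count, reuse only when unpinned and unreferenced) can be pictured with a small model. This is only a sketch: `PageInfo`, `get`, `put` and `can_reuse` are hypothetical names chosen for illustration, not Xen's actual `page_info` layout or API.

```python
from dataclasses import dataclass
from enum import Enum


class PageType(Enum):
    # The page types listed in the notes above.
    NONE = "none"
    L1 = "l1"
    L2 = "l2"
    L3 = "l3"
    L4 = "l4"
    LDT = "LDT"
    GDT = "GDT"
    RW = "RW"


@dataclass
class PageInfo:
    """Toy stand-in for the record kept per machine page frame (hypothetical)."""
    page_type: PageType = PageType.NONE
    ref_count: int = 0
    pinned: bool = False

    def get(self) -> None:
        # Take one more reference to the frame.
        self.ref_count += 1

    def put(self) -> None:
        # Drop one reference; going below zero would be a bookkeeping bug.
        if self.ref_count == 0:
            raise ValueError("reference count underflow")
        self.ref_count -= 1

    def can_reuse(self) -> bool:
        # A frame may be reused only when unpinned and its count is zero.
        return not self.pinned and self.ref_count == 0


frame = PageInfo(page_type=PageType.L1, pinned=True)
frame.get()
frame.put()
while_pinned = frame.can_reuse()    # count is zero, but the frame is pinned
frame.pinned = False
after_unpin = frame.can_reuse()     # unpinned and unreferenced
```

In this model the frame stays unavailable while pinned even though nothing references it, which is the point of the "unpinned and zero" rule.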

No guarantee that a domain will receive a contiguous stretch of physical memory. Most OSes do not have good support for operating in a fragmented physical address space.

  • Machine memory
    • entire amount of memory installed in the machine (physical memory)
    • 4kB machine page frames numbered consecutively starting from 0.
  • Pseudo-physical memory
    • per-domain abstraction.
    • allows a guest OS to consider its memory allocation to consist of a contiguous range of physical page frames starting at physical frame 0.
  • machine-to-physical table
    • globally readable table maintained by Xen
    • records the mapping from machine addresses to pseudo-physical addresses
    • table size is proportional to the amount of RAM installed in the machine
  • physical-to-machine table
    • per-domain table which performs the inverse (physical-to-machine) mapping.
    • table size is proportional to the memory allocation of the given domain.
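
The relationship between the two tables can be sketched as follows. The frame numbers and the names `Domain` and `build_m2p` are made up for illustration; the only facts taken from the notes are that the physical-to-machine table is per domain, that the machine-to-physical table covers all of RAM, and that a domain's machine frames need not be contiguous.

```python
class Domain:
    def __init__(self, machine_frames):
        # Physical-to-machine (P2M) table: the index is the pseudo-physical
        # frame number, the value is the machine frame number backing it.
        # Its length equals the domain's memory allocation in frames.
        self.p2m = list(machine_frames)


def build_m2p(total_machine_frames, domains):
    # Machine-to-physical (M2P) table: one slot per machine frame in the
    # whole machine, so its size follows the amount of installed RAM.
    m2p = [None] * total_machine_frames
    for dom in domains:
        for pfn, mfn in enumerate(dom.p2m):
            m2p[mfn] = pfn
    return m2p


# The domain was handed scattered machine frames 7, 2, 9 and 4, yet it
# sees them as the contiguous pseudo-physical frames 0..3.
dom = Domain([7, 2, 9, 4])
m2p = build_m2p(12, [dom])
round_trip_ok = all(m2p[dom.p2m[pfn]] == pfn for pfn in range(len(dom.p2m)))
```

Translating a pseudo-physical frame through `p2m` and back through `m2p` returns the original frame number, which is what "inverse mapping" means here.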

(XEN) VIRTUAL MEMORY ARRANGEMENT (for DOM0)
(XEN) Loaded kernel: c0100000→c042e254
(XEN) Init. ramdisk: c042f000→c07fca00
(XEN) Phys-Mach map: c07fd000→c086e894 == 454 MB (as can be verified by: xm list)
(XEN) Start info: c086f000→c0870000
(XEN) Page tables: c0870000→c0874000 == 16 MB
(XEN) Boot stack: c0874000→c0875000
(XEN) TOTAL: c0000000→c0c00000 
(XEN) ENTRY ADDRESS: c0100000
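
The 454 MB annotation on the Phys-Mach map line can be checked from the two addresses in the log. The sketch below assumes that each table entry is a 4-byte frame number (an assumption for a 32-bit guest, not something the log states) and that every entry stands for one 4 kB page.

```python
PAGE_SIZE = 4096   # 4 kB page frames, as described in the notes
ENTRY_SIZE = 4     # assumption: one 32-bit frame number per table entry

start, end = 0xC07FD000, 0xC086E894    # Phys-Mach map range from the log
table_bytes = end - start              # size of the table itself
entries = table_bytes // ENTRY_SIZE    # one entry per pseudo-physical frame
covered_mb = entries * PAGE_SIZE / 2**20

print(f"{entries} entries cover about {covered_mb:.1f} MB")
# prints "116261 entries cover about 454.1 MB"
```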


x86-32 Xen supports only guests with 2-level page tables. PGD = l2, PTE = l1
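
With 2-level page tables and 4 kB pages, a 32-bit virtual address splits into a 10-bit l2 (PGD) index, a 10-bit l1 (PTE) index and a 12-bit page offset. A small sketch of that split, where `split_va` is a hypothetical helper name:

```python
def split_va(va):
    """Split a 32-bit virtual address for 2-level paging with 4 kB pages."""
    l2_index = (va >> 22) & 0x3FF   # top 10 bits select the l2 (PGD) entry
    l1_index = (va >> 12) & 0x3FF   # next 10 bits select the l1 (PTE) entry
    offset = va & 0xFFF             # low 12 bits are the offset in the page
    return l2_index, l1_index, offset


kernel_entry = split_va(0xC0100000)   # the Dom0 entry address from the log
xen_base = split_va(0xFC000000)       # start of Xen's top 64 MB
```

Each l2 entry covers 4 MB, so the 64 MB reserved for Xen starting at 0xFC000000 corresponds to the last 16 l2 slots (indices 1008 to 1023).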


How to intercept interrupts from guest domains
http://lists.xensource.com/archives/html/xen-devel/2006-09/msg00597.html
http://lists.xensource.com/archives/html/xen-devel/2006-09/msg00604.html

Page fault handling for Xen guests
http://lists.xensource.com/archives/html/xen-devel/2006-02/msg00263.html

show pagetable walk if guest cannot handle page
http://lists.xensource.com/archives/html/xen-devel/2006-09/msg00612.html

Memory management, mapping, paging questions...
http://lists.xensource.com/archives/html/xen-devel/2006-10/msg01151.html

Information related to shadowing
http://lists.xensource.com/archives/html/xen-devel/2006-11/msg00319.html
http://lists.xensource.com/archives/html/xen-devel/2006-11/msg00793.html
http://lists.xensource.com/archives/html/xen-devel/2006-11/msg00802.html

How to intercept memory operation in Xen
http://lists.xensource.com/archives/html/xen-devel/2006-11/msg00659.html
http://lists.xensource.com/archives/html/xen-devel/2006-11/msg00664.html
http://lists.xensource.com/archives/html/xen-devel/2006-11/msg00717.html

alert message from dom0 to domU
http://lists.xensource.com/archives/html/xen-devel/2006-12/msg00967.html

Share Memory Between DomainU and Domain0
http://lists.xensource.com/archives/html/xen-devel/2006-12/msg01008.html

Call hypercall straightly from user space
http://lists.xensource.com/archives/html/xen-devel/2006-12/msg01061.html


xen/arch/x86/traps.c#do_page_fault –> fixup_page_fault –> mm.c#ptwr_do_page_fault


xen-3.0.2-2/xen/arch/x86/setup.c#__start_xen()
                |                                 \
                v                                  \
xen-3.0.2-2/xen/common/domain.c#domain_create()     \
                |                                    \
                v                                     \
xen-3.0.2-2/xen/arch/x86/domain.c#arch_domain_create() \
                                                        \
                                                         v
                xen-3.0.2-2/xen/arch/x86/domain_build.c#construct_dom0()

Xen-ELF image vmlinux-syms-2.6.16-xen has a special '__xen_guest' section


Xen hypercall table:
/xen-3.0.2-2/xen/arch/x86/x86_32/entry.S


#I think this is called when DOM0 attempts to create a DOMU
xen-3.0.2-2/xen/common/dom0_ops.c#do_dom0_op()




trousers-0.2.7/src/tspi/spi_tpm.c#Tspi_TPM_Quote()
                |
                v
trousers-0.2.7/src/tcsd_api/calltcsapi.c#TCSP_Quote()
                |
                v
trousers-0.2.7/src/tcsd_api/tcstp.c#TCSP_Quote_TP()
                |
                v
trousers-0.2.7/src/tcsd_api/tcstp.c#sendTCSDPacket()

Original: https://wiki.cs.dartmouth.edu/nihal/doku.php/xen:memory

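I. How Xen is embedded in Dom0's linear address space on x86_64
IA32 does this with the segment protection mechanism: the top 64 MB is Xen's Ring-0 space;
the rest of the top 1 GB (1 GB minus 64 MB) is the kernel's Ring-1 space;
the remaining 3 GB goes to applications.

x86_64 has no segment protection mechanism, so page protection must be used instead: 2^64-2^47 --> 2^64 == kernel space
0 --> 2^47 == user space
The middle part is unused. That gap is non-canonical and cannot itself be addressed, so Xen reserves a range next to it, at the bottom of the upper half.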
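
The x86_64 split described above can be made concrete with a small classifier. It is a sketch that assumes 48-bit virtual addresses (which is what the 2^47 boundaries imply); the function name and the labels "user", "kernel" and "gap" are illustrative only.

```python
VA_BITS = 48   # assumption: 48-bit virtual addresses, giving 2^47 per half


def classify(addr):
    """Say which part of the 64-bit address space an address falls in."""
    if not 0 <= addr < 1 << 64:
        raise ValueError("not a 64-bit address")
    half = 1 << (VA_BITS - 1)            # 2^47
    if addr < half:
        return "user"                    # 0 --> 2^47
    if addr >= (1 << 64) - half:
        return "kernel"                  # 2^64 - 2^47 --> 2^64
    return "gap"                         # the unused region in between


top_of_user = classify(0x00007FFFFFFFFFFF)
start_of_gap = classify(0x0000800000000000)
start_of_upper = classify(0xFFFF800000000000)
```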

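II. Xen uses direct mode == the guest OS uses its own page tables to access host physical addresses (HPA) directly
Method: page table entries contain HPAs; page-table pages are read-only to the guest OS; ordinary pages can be read and written directly by the guest OS.
Any attempt to write a page table raises a page fault. To update or manipulate page tables, the guest calls the corresponding hypercall.
The VMM also guarantees that a guest OS can access only its own memory.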

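How a guest OS operates on memory:
1. The guest OS accesses a new guest virtual address (GVA) and takes a page fault ==> the guest OS page tables must be updated
2. The guest OS first finds the guest physical address (GPA) of the page table, and the VMM finds the HPA that corresponds to this GPA (via the P2M table)
==> this amounts to a page table update, done by calling the page table update hypercall (GPA, HPA)
3. If the lower-level page table does not exist, it has to be hooked in
==> this amounts to a page table attach operation, done by calling the page table operation hypercall (linear address, HPA)
4. Access that PT and repeat steps 2-3, finally obtaining a GVA ==> HPA mapping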
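
The flow above can be sketched as a toy model in which the guest never writes page tables itself: it translates its own frame numbers through the P2M table and asks the hypervisor to install the entries. Everything here (`ToyHypervisor`, `give`, `register_pt`, `update`, `map_page`, the frame numbers) is hypothetical and only illustrates the validate-then-apply idea; it is not the signature of Xen's `mmu_update()`.

```python
class ToyHypervisor:
    def __init__(self):
        self.owner = {}    # machine frame number -> owning domain id
        self.tables = {}   # mfn of a registered PT page -> {index: mfn}

    def give(self, domid, mfns):
        for mfn in mfns:
            self.owner[mfn] = domid

    def _check(self, domid, mfn):
        if self.owner.get(mfn) != domid:
            raise PermissionError(f"mfn {mfn} is not owned by domain {domid}")

    def register_pt(self, domid, mfn):
        # The guest registers one of its own frames as a page table.
        self._check(domid, mfn)
        self.tables[mfn] = {}

    def update(self, domid, requests):
        # Validate the whole batch first, then apply it.
        for pt_mfn, index, target_mfn in requests:
            self._check(domid, pt_mfn)
            self._check(domid, target_mfn)
            if pt_mfn not in self.tables:
                raise ValueError(f"mfn {pt_mfn} is not a registered page table")
        for pt_mfn, index, target_mfn in requests:
            self.tables[pt_mfn][index] = target_mfn
        return len(requests)


def map_page(hv, domid, p2m, l2_pfn, l1_pfn, data_pfn, va):
    """Guest side: map `va` to `data_pfn`, hooking in the l1 table if needed."""
    l2_index, l1_index = (va >> 22) & 0x3FF, (va >> 12) & 0x3FF
    l2_mfn, l1_mfn, data_mfn = p2m[l2_pfn], p2m[l1_pfn], p2m[data_pfn]  # step 2
    requests = []
    if hv.tables[l2_mfn].get(l2_index) is None:          # step 3
        hv.register_pt(domid, l1_mfn)
        requests.append((l2_mfn, l2_index, l1_mfn))
    requests.append((l1_mfn, l1_index, data_mfn))        # the actual mapping
    return hv.update(domid, requests)                    # one batched call


hv = ToyHypervisor()
hv.give(1, [30, 31, 32])     # frames owned by domain 1
hv.give(2, [40])             # a frame owned by another domain
p2m = [30, 31, 32]           # domain 1: pseudo-physical 0, 1, 2 -> machine
hv.register_pt(1, p2m[0])    # top-level table
applied = map_page(hv, 1, p2m, 0, 1, 2, 0xC0100000)
```

Because every entry holds a machine frame number that the hypervisor has checked against ownership, a request that points at another domain's frame (for example `hv.update(1, [(31, 0, 40)])`) is rejected as a whole.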

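III. Writable page tables
Operating on page tables is expensive (every change needs a hypercall), so in some cases it can be improved.
The method: first take the page table out of use (in practice it only has to be unhooked from the top-level PD table) so that nothing else accesses it, and treat it as an ordinary read-write page of the guest OS.
The guest OS modifies it freely and, after many changes, finally submits it through a hypercall so that the VMM completes the whole update in one go.
Prerequisite: PAE mode, because the PDE has only one PD page.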
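
The detach, modify, resubmit cycle described above can be modelled in a few lines. This is a sketch of the idea as the notes state it, with hypothetical names (`WritablePT`, `detach`, `write`, `reattach`); it does not reproduce how Xen's `ptwr` code is actually structured.

```python
class WritablePT:
    def __init__(self, owned_mfns):
        self.owned = set(owned_mfns)   # frames the guest is allowed to map
        self.entries = {}              # index -> machine frame number
        self.attached = True           # hooked into the top-level table?

    def detach(self):
        # Unhook the page from the top-level table; it is now ordinary memory.
        self.attached = False

    def write(self, index, mfn):
        if self.attached:
            raise PermissionError("page table is read-only while attached")
        self.entries[index] = mfn      # plain write, no hypercall per entry

    def reattach(self):
        # One validation pass over all entries before the page is used again.
        bad = [i for i, mfn in self.entries.items() if mfn not in self.owned]
        if bad:
            raise ValueError(f"entries {bad} point at frames not owned")
        self.attached = True
        return len(self.entries)


pt = WritablePT(owned_mfns={5, 6, 7})
pt.detach()
for index, mfn in enumerate([5, 6, 7]):
    pt.write(index, mfn)               # three updates, no trap for any of them
validated = pt.reattach()              # a single validation step for the batch
```

The saving comes from paying for validation once per page rather than once per entry.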

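IV. Balloon driver (present in both Dom0 and DomU)
It requests and releases memory for Dom0 and DomU.
It can report the memory status of its own domain and of the whole machine.
The balloon driver automatically adjusts its domain's memory size according to the target value set in XenStore.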
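
The adjustment toward a target can be sketched as a single decision step. The function name, the action labels and the page counts below are illustrative assumptions; the only facts taken from the notes are that the driver follows a target value and that a domain has a maximum allocation it cannot exceed.

```python
def balloon_step(current_pages, target_pages, max_pages):
    """Decide what a balloon driver would do to move toward its target."""
    target = min(target_pages, max_pages)   # never grow past the maximum
    delta = target - current_pages
    if delta > 0:
        return ("claim", delta)      # deflate: take pages back from the VMM
    if delta < 0:
        return ("release", -delta)   # inflate: hand pages to the VMM
    return ("idle", 0)


shrink = balloon_step(current_pages=1000, target_pages=800, max_pages=2000)
grow = balloon_step(current_pages=1000, target_pages=3000, max_pages=2000)
steady = balloon_step(current_pages=1000, target_pages=1000, max_pages=2000)
```

In the second call the requested target exceeds the maximum allocation, so the model grows only up to the maximum.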

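V. How shared pages are implemented
The Start Info Page (including its contents) is assembled by the VMM when the domain is initialized. Its contents include the Shared Info Page and the XenStore connection. One of the first things a domain does once it is entered is to use a page table update to map its Shared Info Page onto the real page that the VMM has already allocated and recorded in the Start Info Page.
The PV drivers of an HVM guest (mainly) of course also use the Shared Info Page; in that case the Shared Info Page is set up by the guest itself.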

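4. Wouldn't it also be good for Dom0 to use VT-x? Is it used?
No. Paravirtualization does not need VT-x; the aim is to improve system performance.
5. What is PAE mode, and what effect does it have?
Physical Address Extension (PAE) allows up to 64 GB of physical memory to be used as regular 4 KB pages, and widens the physical addresses the kernel can use from 32 bits to 36 bits.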
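
The 64 GB figure follows directly from the address width. A quick check, where `addressable` is a hypothetical helper name:

```python
def addressable(bits, page_size=4096):
    """Return (bytes, 4 KB frames) reachable with `bits` physical address bits."""
    total_bytes = 1 << bits
    return total_bytes, total_bytes // page_size


bytes_32, frames_32 = addressable(32)   # without PAE
bytes_36, frames_36 = addressable(36)   # with PAE
gib_32 = bytes_32 // 2**30              # 4
gib_36 = bytes_36 // 2**30              # 64
```

Going from 32 to 36 bits multiplies the reachable physical memory by 16, from 2^20 to 2^24 page frames of 4 KB each.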

Dom0 uses shadow page tables only during migration; at all other times it accesses physical memory directly.

Notes:
gpfn/gfn: guest page frame number (the guest OS uses gpfn/gfn to address the guest physical address space)
mfn: machine page frame number
smfn: machine page frame number for shadow pages (the machine frame that holds a shadow page)
l1e: level 1 page table entry
gl1e: level 1 guest page table entry
sl1e: level 1 shadow page table entry
PV: para-virtualization
HVM: Hardware-assisted Virtual Machine

 
This article is reposted from feisky's cnblogs blog. Original link: http://www.cnblogs.com/feisky/archive/2012/04/09/2439067.html. To repost it, please contact the original author.
