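I. Overview
This article uses Android 5.1.0_r1, the goldfish kernel 3.4, and an x86 Android image. Taking the battery as the example, it walks through how a virtual device is implemented and used, end to end.
Why does the Android emulator need virtual devices? Put simply, the Android system expects hardware that the host does not have, such as GPS, Bluetooth, battery, and GSM. Virtual devices also give the emulator and the guest OS a way to talk to each other: the emulator control panel can set the battery level and whether it is charging (see Figure 1), or set the current GPS coordinates, and so on. More importantly, the guest OS's drawing operations are handed to the host for execution, which is what lets the emulator run the guest OS reasonably smoothly.
Figure 1
The overall framework of the virtual devices is shown in Figure 2. The upper left is the guest OS; the lower left covers both the virtual device drivers in the kernel and the device emulation in the emulator; the lower right is the Android emulator.
1. The guest OS uses the virtual device drivers provided by the kernel, either through the HAL or directly. (A virtual device driver generally exposes some character device files and attribute files; reading and writing those files is all that is needed.)
2-1. From the kernel's point of view, it does not matter whether a device is real or virtual. The kernel only cares about the resources the device provides, such as its IO resources and interrupt number, and how to read and write its registers; in this respect the code looks like an ordinary driver. Note that all virtual devices hang off the platform bus, which makes it easy to assign IO memory ranges and interrupt numbers dynamically. The IO memory and interrupt number of the platform bus itself are hard-coded and match the values hard-coded in the emulator.
2-2. From the emulator's point of view, the first thing to emulate is the platform bus, which uses a hard-coded IO memory range and interrupt number matching those in the kernel. The other virtual devices then register their IO memory and interrupt numbers on this platform bus dynamically. When the kernel reads or writes IO memory, the emulator knows which guest physical address is being accessed and from it finds the corresponding page. The page descriptor yields an io_index, which identifies the virtual device that owns the page along with that device's own read and write functions. Those functions read and write the device's registers (each virtual device keeps its registers in a struct) and receive or return data according to the agreed meaning of each register. There is a lot to absorb here, much of it hardware knowledge that can be daunting for pure software developers; it is explained in detail later.
3. The emulator provides an abstract virtual device called pipe, exposed as the device file /dev/qemu_pipe, which gives the guest OS and the emulator a generic way to send and receive data. On top of this generic transport the emulator registers many qemud services, and the guest OS talks to them by reading and writing /dev/qemu_pipe.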
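PS:
1. The guest OS has a qemud process that uses the virtual device ttyS1 as a channel between the guest OS and the emulator. This is the older mechanism; it is slow and has largely been replaced by pipe.
2. For the platform model, see: http://www.wowotech.net/device_model/platform_device.html
When a new device is registered, it is passed as the argument to the probe function of every matching driver, to see which driver can handle it.
When a new driver is registered, its probe function is called for every matching device that has not been claimed yet, to see which of them the new driver can handle.
Matching is done by comparing the driver name with the device name.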
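The name-based matching described above can be sketched in plain user-space C. This is only a toy model under stated assumptions: `toy_device`, `toy_driver`, and `toy_driver_register` are hypothetical names invented for illustration, not the kernel's platform bus API, and the real kernel does considerably more (id tables, deferred probing, locking).

```c
#include <stddef.h>
#include <string.h>

struct toy_driver;                       /* forward declaration */

/* Hypothetical stand-in for struct platform_device. */
struct toy_device {
    const char *name;
    const struct toy_driver *driver;     /* NULL until a driver claims the device */
};

/* Hypothetical stand-in for struct platform_driver. */
struct toy_driver {
    const char *name;
    int (*probe)(struct toy_device *dev); /* returns 0 when the device is accepted */
};

/* Registering a driver: offer every unclaimed device whose name matches
 * the driver name to probe(). Returns the number of devices bound. */
static int toy_driver_register(const struct toy_driver *drv,
                               struct toy_device *devs, size_t ndevs)
{
    int bound = 0;

    for (size_t i = 0; i < ndevs; i++) {
        if (devs[i].driver != NULL)
            continue;                    /* already handled by some driver */
        if (strcmp(devs[i].name, drv->name) != 0)
            continue;                    /* names differ: no match */
        if (drv->probe(&devs[i]) == 0) {
            devs[i].driver = drv;
            bound++;
        }
    }
    return bound;
}

/* A probe function that accepts every device it is offered. */
static int toy_probe_ok(struct toy_device *dev)
{
    (void)dev;
    return 0;
}
```

Registering a device works the same way in the other direction: the new device is offered to each registered driver with a matching name.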
II. Virtual device drivers in the kernel
2.1 The battery driver
Start with the documentation for the battery virtual device:
VII. Goldfish battery:
======================

Relevant files:
  $QEMU/hw/android/goldfish/battery.c
  $QEMU/hw/power_supply.h
  $KERNEL/drivers/power/goldfish_battery.c

Device properties:
  Name: goldfish_battery
  Id: -1
  IrqCount: 1
  I/O Registers:
    0x00 INT_STATUS  R: Read battery and A/C status change bits.
    0x04 INT_ENABLE  W: Enable or disable IRQ on status change.
    0x08 AC_ONLINE   R: Read 0 if AC power disconnected, 1 otherwise.
    0x0c STATUS      R: Read battery status (charging/full/... see below).
    0x10 HEALTH      R: Read battery health (good/overheat/... see below).
    0x14 PRESENT     R: Read 1 if battery is present, 0 otherwise.
    0x18 CAPACITY    R: Read battery charge percentage in [0..100] range.

A simple device used to report the state of the virtual device's battery, and whether the device is powered through a USB or A/C adapter.

The device uses a single IRQ to notify the kernel that the battery or A/C status changed. When this happens, the kernel should perform an IO_READ(INT_STATUS) which returns a 2-bit value containing flags:

  bit 0: Set to 1 to indicate a change in battery status.
  bit 1: Set to 1 to indicate a change in A/C status.

Note that reading this register also lowers the IRQ level.

The A/C status can be read with IO_READ(AC_ONLINE), which returns 1 if the device is powered, or 0 otherwise.

The battery status is spread over multiple I/O registers:

  IO_READ(PRESENT) returns 1 if the battery is present in the virtual device, or 0 otherwise.

  IO_READ(CAPACITY) returns the battery's charge percentage, as an integer between 0 and 100, inclusive. NOTE: This register is probably misnamed since it does not represent the battery's capacity, but its current charge level.

  IO_READ(STATUS) returns one of the following values:
    0x00 UNKNOWN       Battery state is unknown.
    0x01 CHARGING      Battery is charging.
    0x02 DISCHARGING   Battery is discharging.
    0x03 NOT_CHARGING  Battery is not charging (e.g. full or dead).

  IO_READ(HEALTH) returns one of the following values:
    0x00 UNKNOWN         Battery health unknown.
    0x01 GOOD            Battery is in good condition.
    0x02 OVERHEATING     Battery is over-heating.
    0x03 DEAD            Battery is dead.
    0x04 OVERVOLTAGE     Battery generates too much voltage.
    0x05 UNSPEC_FAILURE  Battery has unspecified failure.

The kernel can use IO_WRITE(INT_ENABLE, <flags>) to select which condition changes should trigger an IRQ. <flags> is a 2-bit value using the same format as INT_STATUS.

If you have worked with hardware, skimming the description above is enough to see what this chip does, and you can skip the next paragraph. If you are a pure software person, read on.
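Think of a device as a function: its registers are its inputs, its return values, and some state. An interrupt is a bit like a signal in Linux programming. When the device has data to read, can accept data, or changes state, it may (or may not) raise an interrupt, which interrupts the kernel (this is the CPU being interrupted in hardware, not operating system scheduling) and jumps to the interrupt handler. This is similar to a signal handler: a signal maps to a signal handler, and an interrupt number maps to an interrupt handler. The jump works as follows. When a driver requests an interrupt with the kernel function request_irq, it supplies the interrupt number and the handler. A table (an array) in memory, the interrupt vector table, uses the interrupt number as the key and the handler address as the value. When an interrupt fires, the CPU learns the interrupt number, looks up the handler in the interrupt vector table, and jumps to it. The raw CPU-level interrupt entry takes no arguments and returns nothing; the handlers the kernel exposes are wrapped, which is why they take the two parameters int irq and void *dev_id.
The register addresses of a virtual device are all small numbers; treat them as offsets. How is the base obtained? First the driver gets the guest physical address of the IO memory from the platform bus, then it uses ioremap to map that physical address into the kernel virtual address space, after which the kernel can use it.
Note that this memory must not be treated as ordinary memory. Use the dedicated accessors readb, writeb, readw, writew, readl, and writel, because a hardware register may return different data on every read. To send an array through a register, write to the same register in a loop; the register address is not incremented. The order of reads and writes and the access width (8, 16, or 32 bits) are also strictly specified, not arbitrary. If the region is accessed like ordinary memory, the compiler may cache values, the CPU may reorder instructions, and the access width may be wrong, any of which makes the hardware misbehave. So never use it as a plain memory pointer.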
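To make the "interrupt number as key, handler address as value" idea concrete, here is a minimal user-space sketch. All names (`toy_request_irq`, `toy_raise_irq`, `toy_irq_table`) are hypothetical and only imitate the shape of request_irq; the real kernel also handles shared IRQs, flags, and per-CPU state, none of which is modeled here.

```c
#include <stddef.h>

#define TOY_NR_IRQS 16

/* Same shape as a kernel handler: the IRQ number plus a private cookie. */
typedef int (*toy_irq_handler_t)(int irq, void *dev_id);

struct toy_irq_slot {
    toy_irq_handler_t handler;
    void *dev_id;
};

/* The "vector table": indexed by interrupt number. */
static struct toy_irq_slot toy_irq_table[TOY_NR_IRQS];

/* Like request_irq(): remember which handler and cookie belong to an IRQ.
 * Returns 0 on success, -1 on a bad argument or an occupied slot. */
static int toy_request_irq(int irq, toy_irq_handler_t handler, void *dev_id)
{
    if (irq < 0 || irq >= TOY_NR_IRQS || handler == NULL)
        return -1;
    if (toy_irq_table[irq].handler != NULL)
        return -1;                       /* this sketch does not model shared IRQs */
    toy_irq_table[irq].handler = handler;
    toy_irq_table[irq].dev_id = dev_id;
    return 0;
}

/* What happens when the interrupt fires: look up by number, then jump. */
static int toy_raise_irq(int irq)
{
    if (irq < 0 || irq >= TOY_NR_IRQS || toy_irq_table[irq].handler == NULL)
        return -1;
    return toy_irq_table[irq].handler(irq, toy_irq_table[irq].dev_id);
}

/* Example handler: counts its invocations through the dev_id cookie. */
static int toy_count_handler(int irq, void *dev_id)
{
    (void)irq;
    *(int *)dev_id += 1;
    return 1;
}
```

The dev_id cookie is how a single handler function can serve several device instances: each registration carries its own private pointer.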
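The battery driver lives in drivers/power/goldfish_battery.c in the goldfish kernel tree (the path given in the documentation above). To keep things simple, the unregister, shutdown, and cleanup code is not covered in detail.
The driver's initialization code is:

static struct platform_driver goldfish_battery_device = {
    .probe = goldfish_battery_probe,
    .remove = goldfish_battery_remove,
    .driver = {
        .name = "goldfish-battery"
    }
};

static int __init goldfish_battery_init(void)
{
    return platform_driver_register(&goldfish_battery_device);
}

This registers a platform driver named goldfish-battery. Its probe function, goldfish_battery_probe, is called when the battery driver is installed or when a new device appears on the bus, in order to match driver and device (by comparing the driver name with the device name).
goldfish_battery_probe first initializes the goldfish_battery_data struct, then calls platform_get_resource to get the device's IO memory resource, runs ioremap on the IORESOURCE_MEM resource, and stores the resulting base in data->reg_base. It then calls platform_get_irq to get the interrupt number, stores it in data->irq, and registers the interrupt handler goldfish_battery_interrupt with request_irq.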
data->battery and data->ac are both of type struct power_supply. For battery, for example:

data->battery.properties = goldfish_battery_props;
data->battery.num_properties = ARRAY_SIZE(goldfish_battery_props);
data->battery.get_property = goldfish_battery_get_property;
data->battery.name = "battery";
data->battery.type = POWER_SUPPLY_TYPE_BATTERY;

The struct carries the property list, the property count, the function that reads a property, and so on. After power_supply_register, the guest OS directory /sys/class/power_supply/battery contains files whose names correspond to the properties, such as capacity, health, and status. Reads go through goldfish_battery_get_property from above; there is no write function. User-space programs in the guest OS read these attribute files directly, and their contents all come from register reads, for example:

static int goldfish_battery_get_property(struct power_supply *psy,
                                         enum power_supply_property psp,
                                         union power_supply_propval *val)
{
    struct goldfish_battery_data *data = container_of(psy, struct goldfish_battery_data, battery);
    int ret = 0;

    switch (psp) {
    case POWER_SUPPLY_PROP_STATUS:
        val->intval = GOLDFISH_BATTERY_READ(data, BATTERY_STATUS);
        break;
    case POWER_SUPPLY_PROP_HEALTH:
        val->intval = GOLDFISH_BATTERY_READ(data, BATTERY_HEALTH);
        break;
    case POWER_SUPPLY_PROP_PRESENT:
        val->intval = GOLDFISH_BATTERY_READ(data, BATTERY_PRESENT);
        break;
    case POWER_SUPPLY_PROP_TECHNOLOGY:
        val->intval = POWER_SUPPLY_TECHNOLOGY_LION;
        break;
    case POWER_SUPPLY_PROP_CAPACITY:
        val->intval = GOLDFISH_BATTERY_READ(data, BATTERY_CAPACITY);
        break;
    default:
        ret = -EINVAL;
        break;
    }
    return ret;
}

This is how the information of the battery virtual device is obtained.
Finally, GOLDFISH_BATTERY_WRITE(data, BATTERY_INT_ENABLE, BATTERY_INT_MASK) writes BATTERY_INT_MASK to the BATTERY_INT_ENABLE register, which enables interrupts. When the state of the battery or of ac changes, the virtual device raises an interrupt (that code is in the emulator), and our interrupt handler goldfish_battery_interrupt is called.
The complete goldfish_battery_probe is:

static int goldfish_battery_probe(struct platform_device *pdev)
{
    int ret;
    struct resource *r;
    struct goldfish_battery_data *data;

    data = kzalloc(sizeof(*data), GFP_KERNEL);
    if (data == NULL) {
        ret = -ENOMEM;
        goto err_data_alloc_failed;
    }
    spin_lock_init(&data->lock);

    data->battery.properties = goldfish_battery_props;
    data->battery.num_properties = ARRAY_SIZE(goldfish_battery_props);
    data->battery.get_property = goldfish_battery_get_property;
    data->battery.name = "battery";
    data->battery.type = POWER_SUPPLY_TYPE_BATTERY;

    data->ac.properties = goldfish_ac_props;
    data->ac.num_properties = ARRAY_SIZE(goldfish_ac_props);
    data->ac.get_property = goldfish_ac_get_property;
    data->ac.name = "ac";
    data->ac.type = POWER_SUPPLY_TYPE_MAINS;

    r = platform_get_resource(pdev, IORESOURCE_MEM, 0);
    if (r == NULL) {
        printk(KERN_ERR "%s: platform_get_resource failed\n", pdev->name);
        ret = -ENODEV;
        goto err_no_io_base;
    }
#if defined(CONFIG_ARM)
    data->reg_base = (void __iomem *)IO_ADDRESS(r->start - IO_START);
#elif defined(CONFIG_X86) || defined(CONFIG_MIPS)
    data->reg_base = ioremap(r->start, r->end - r->start + 1);
#else
#error NOT SUPPORTED
#endif

    data->irq = platform_get_irq(pdev, 0);
    if (data->irq < 0) {
        printk(KERN_ERR "%s: platform_get_irq failed\n", pdev->name);
        ret = -ENODEV;
        goto err_no_irq;
    }

    ret = request_irq(data->irq, goldfish_battery_interrupt, IRQF_SHARED, pdev->name, data);
    if (ret)
        goto err_request_irq_failed;

    ret = power_supply_register(&pdev->dev, &data->ac);
    if (ret)
        goto err_ac_failed;

    ret = power_supply_register(&pdev->dev, &data->battery);
    if (ret)
        goto err_battery_failed;

    platform_set_drvdata(pdev, data);
    battery_data = data;

    GOLDFISH_BATTERY_WRITE(data, BATTERY_INT_ENABLE, BATTERY_INT_MASK);
    return 0;

err_battery_failed:
    power_supply_unregister(&data->ac);
err_ac_failed:
    free_irq(data->irq, data);
err_request_irq_failed:
err_no_irq:
#if defined(CONFIG_ARM)
#elif defined(CONFIG_X86) || defined(CONFIG_MIPS)
    iounmap(data->reg_base);
#else
#error NOT SUPPORTED
#endif
err_no_io_base:
    kfree(data);
err_data_alloc_failed:
    return ret;
}
The interrupt handler goldfish_battery_interrupt first reads the INT_STATUS register to tell whether the event came from battery or from ac:

/* read status flags, which will clear the interrupt */
status = GOLDFISH_BATTERY_READ(data, BATTERY_INT_STATUS);
status &= BATTERY_INT_MASK;

It then calls power_supply_changed to notify the kernel.
The complete goldfish_battery_interrupt is:

static irqreturn_t goldfish_battery_interrupt(int irq, void *dev_id)
{
    unsigned long irq_flags;
    struct goldfish_battery_data *data = dev_id;
    uint32_t status;

    spin_lock_irqsave(&data->lock, irq_flags);

    /* read status flags, which will clear the interrupt */
    status = GOLDFISH_BATTERY_READ(data, BATTERY_INT_STATUS);
    status &= BATTERY_INT_MASK;

    if (status & BATTERY_STATUS_CHANGED)
        power_supply_changed(&data->battery);
    if (status & AC_STATUS_CHANGED)
        power_supply_changed(&data->ac);

    spin_unlock_irqrestore(&data->lock, irq_flags);
    return status ? IRQ_HANDLED : IRQ_NONE;
}

Pay attention to how struct goldfish_battery_data reaches the interrupt handler and the platform_device: it is passed as the dev_id argument of request_irq, so the handler receives it back as dev_id, and it is attached to the platform_device with platform_set_drvdata.
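goldfish_battery_get_property goes the other way: it is handed only a pointer to the embedded struct power_supply and recovers the enclosing goldfish_battery_data with container_of. A small user-space sketch of that trick follows; `toy_container_of` and the `toy_*` structs are hypothetical simplifications (the kernel macro additionally type-checks the pointer).

```c
#include <stddef.h>

/* User-space version of the kernel's container_of(): subtract the member's
 * offset from the member's address to get the address of the enclosing struct. */
#define toy_container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical stand-in for struct power_supply. */
struct toy_supply {
    const char *name;
};

/* Hypothetical stand-in for struct goldfish_battery_data. */
struct toy_battery_data {
    int irq;
    struct toy_supply battery;
    struct toy_supply ac;
};

/* Given only the embedded 'battery' member, recover the whole structure. */
static struct toy_battery_data *toy_from_battery(struct toy_supply *psy)
{
    return toy_container_of(psy, struct toy_battery_data, battery);
}
```

This only works when the pointer really points at the named member; passing the address of the ac member to toy_from_battery would compute the wrong address.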
2.2 The platform bus driver
Before looking at the virtual devices, it is worth reading the platform bus driver as well: arch/x86/mach-goldfish/pdev_bus.c
The platform bus documentation:
I. Goldfish platform bus:
=========================

The 'platform bus', in Linux kernel speak, is a special device that is capable of enumerating other platform devices found on the system to the kernel. This flexibility allows to customize which virtual devices are available when running a given emulated system configuration.

Relevant files:
  $QEMU/hw/android/goldfish/device.c
  $KERNEL/arch/arm/mach-goldfish/pdev_bus.c
  $KERNEL/arch/x86/mach-goldfish/pdev_bus.c
  $KERNEL/arch/mips/goldfish/pdev_bus.c

Device properties:
  Name: goldfish_device_bus
  Id: -1
  IrqCount: 1

  32-bit I/O registers (offset, name, abstract)

    0x00 BUS_OP      R: Iterate to next device in enumeration.
                     W: Start device enumeration.
    0x04 GET_NAME    W: Copy device name to kernel memory.
    0x08 NAME_LEN    R: Read length of current device's name.
    0x0c ID          R: Read id of current device.
    0x10 IO_BASE     R: Read I/O base address of current device.
    0x14 IO_SIZE     R: Read I/O base size of current device.
    0x18 IRQ_BASE    R: Read base IRQ of current device.
    0x1c IRQ_COUNT   R: Read IRQ count of current device.

    # For 64-bit guest architectures only:
    0x20 NAME_ADDR_HIGH  W: Write high 32-bit of kernel address of name buffer used by GET_NAME. Must be written to before the GET_NAME write.

The kernel iterates over the list of current devices with something like:

  IO_WRITE(BUS_OP, 0);    // Start iteration, any value other than 0 is invalid.
  for (;;) {
    int ret = IO_READ(BUS_OP);
    if (ret == 0 /* OP_DONE */) {
      // no more devices.
      break;
    }
    else if (ret == 8 /* OP_ADD_DEV */) {
      // Read device properties.
      Device dev;
      dev.name_len  = IO_READ(NAME_LEN);
      dev.id        = IO_READ(ID);
      dev.io_base   = IO_READ(IO_BASE);
      dev.io_size   = IO_READ(IO_SIZE);
      dev.irq_base  = IO_READ(IRQ_BASE);
      dev.irq_count = IO_READ(IRQ_COUNT);

      dev.name = kalloc(dev.name_len + 1);  // allocate room for device name.
  #if 64BIT_GUEST_CPU
      IO_WRITE(NAME_ADDR_HIGH, (uint32_t)(dev.name >> 32));
  #endif
      IO_WRITE(GET_NAME, (uint32_t)dev.name);  // copy to kernel memory.
      dev.name[dev.name_len] = 0;

      .. add device to kernel's list.
    }
    else {
      // Not returned by current goldfish implementation.
    }
  }

The device also uses a single IRQ, which it will raise to indicate to the kernel that new devices are available, or that some of them have been removed. The kernel will then start a new enumeration. The IRQ is lowered by the device only when a IO_READ(BUS_OP) returns 0 (OP_DONE).

NOTE: The kernel hard-codes a platform_device definition with the name "goldfish_pdev_bus" for the platform bus (e.g. see $KERNEL/arch/arm/mach-goldfish/board-goldfish.c), however, the bus itself will appear during enumeration as a device named "goldfish_device_bus"

The kernel driver for the platform bus only matches the "goldfish_pdev_bus" name, and will ignore any device named "goldfish_device_bus".

Reading NAME_LEN returns the length of the name of the current device on the bus, reading IO_BASE returns the start address of its IO memory, and reading IO_SIZE returns the size of its IO memory; these are all easy to understand. Writing a pointer to the GET_NAME register makes the virtual bus copy the device name to the memory that pointer refers to, which is also straightforward. The register to watch is BUS_OP. Writing BUS_OP starts the device enumeration. Each read of BUS_OP that returns PDEV_BUS_OP_ADD_DEV means there is another device and switches to it; after the switch, reading NAME_LEN, IO_BASE, and IO_SIZE returns the information of that next device. A read that returns PDEV_BUS_OP_DONE means the enumeration is complete and there are no more devices.
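The BUS_OP protocol is easier to follow with both sides in one place. The sketch below simulates the device side and the kernel-side loop in plain C. The op values 0 and 8 come from the documentation above; everything else (`toy_bus`, `toy_enumerate`, array-based storage instead of a linked list) is a hypothetical simplification, not the emulator's code.

```c
#include <stddef.h>

enum { TOY_OP_DONE = 0, TOY_OP_ADD_DEV = 8 };

struct toy_bus_dev {
    const char *name;
    unsigned io_base;
    int reported;            /* already handed to the kernel in this enumeration */
};

struct toy_bus {
    struct toy_bus_dev *devs;
    size_t count;
    size_t current;          /* index of the "current" device; count means none */
};

/* IO_WRITE(BUS_OP, 0): restart the enumeration. */
static void toy_bus_start(struct toy_bus *bus)
{
    for (size_t i = 0; i < bus->count; i++)
        bus->devs[i].reported = 0;
    bus->current = bus->count;
}

/* IO_READ(BUS_OP): mark the current device as reported, then advance to the
 * next unreported one. Returns ADD_DEV if there is one, DONE otherwise. */
static int toy_bus_read_op(struct toy_bus *bus)
{
    size_t i = 0;

    if (bus->current < bus->count) {
        bus->devs[bus->current].reported = 1;
        i = bus->current + 1;
    }
    while (i < bus->count && bus->devs[i].reported)
        i++;
    bus->current = i;
    return i < bus->count ? TOY_OP_ADD_DEV : TOY_OP_DONE;
}

/* IO_READ(IO_BASE): property of whichever device is current. */
static unsigned toy_bus_read_io_base(const struct toy_bus *bus)
{
    return bus->current < bus->count ? bus->devs[bus->current].io_base : 0;
}

/* Kernel side: enumerate all devices, storing up to max I/O bases in out.
 * Returns the number of devices seen. */
static size_t toy_enumerate(struct toy_bus *bus, unsigned *out, size_t max)
{
    size_t n = 0;

    toy_bus_start(bus);
    while (toy_bus_read_op(bus) == TOY_OP_ADD_DEV) {
        if (n < max)
            out[n] = toy_bus_read_io_base(bus);
        n++;
    }
    return n;
}
```

The important property is that the property registers have no device argument: they always describe whichever device the last BUS_OP read selected.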
First, the IO memory and interrupt number agreed on with the emulator are handed to the kernel:

static struct resource goldfish_pdev_bus_resources[] = {
    {
        .start = GOLDFISH_PDEV_BUS_BASE,
        .end = GOLDFISH_PDEV_BUS_BASE + GOLDFISH_PDEV_BUS_END - 1,
        .flags = IORESOURCE_IO,
    },
    {
        .start = IRQ_PDEV_BUS,
        .end = IRQ_PDEV_BUS,
        .flags = IORESOURCE_IRQ,
    }
};

struct platform_device goldfish_pdev_bus_device = {
    .name = "goldfish_pdev_bus",
    .id = -1,
    .num_resources = ARRAY_SIZE(goldfish_pdev_bus_resources),
    .resource = goldfish_pdev_bus_resources
};

static int __init goldfish_init(void)
{
    return platform_device_register(&goldfish_pdev_bus_device);
}
device_initcall(goldfish_init);

static struct platform_driver goldfish_pdev_bus_driver = {
    .probe = goldfish_pdev_bus_probe,
    .remove = __devexit_p(goldfish_pdev_bus_remove),
    .driver = {
        .name = "goldfish_pdev_bus"
    }
};

static int __init goldfish_pdev_bus_init(void)
{
    return platform_driver_register(&goldfish_pdev_bus_driver);
}

static void __exit goldfish_pdev_bus_exit(void)
{
    platform_driver_unregister(&goldfish_pdev_bus_driver);
}

module_init(goldfish_pdev_bus_init);
module_exit(goldfish_pdev_bus_exit);

goldfish_pdev_bus_probe is even simpler than the battery probe, so it is not explained in detail. Note the write to PDEV_BUS_OP at the end, which starts the device enumeration (writing PDEV_BUS_OP makes the virtual device raise an interrupt, and the enumeration is then carried out in the interrupt handler).
static int __devinit goldfish_pdev_bus_probe(struct platform_device *pdev)
{
    int ret;
    struct resource *r;

    r = platform_get_resource(pdev, IORESOURCE_IO, 0);
    if(r == NULL)
        return -EINVAL;
    pdev_bus_base = ioremap(GOLDFISH_IO_START + r->start, GOLDFISH_IO_SIZE);

    r = platform_get_resource(pdev, IORESOURCE_IRQ, 0);
    if(r == NULL)
        return -EINVAL;
    pdev_bus_irq = r->start;

    /* The last argument was missing in the listing; a shared IRQ needs a
       non-NULL dev_id, and pdev is the natural cookie here. */
    ret = request_irq(pdev_bus_irq, goldfish_pdev_bus_interrupt, IRQF_SHARED, "goldfish_pdev_bus", pdev);
    if(ret)
        goto err_request_irq_failed;

    writel(PDEV_BUS_OP_INIT, pdev_bus_base + PDEV_BUS_OP);

err_request_irq_failed:
    return ret;
}

The interrupt handler goldfish_pdev_bus_interrupt simply keeps reading PDEV_BUS_OP. If the read returns PDEV_BUS_OP_ADD_DEV, it calls goldfish_new_pdev to add a device; if it returns PDEV_BUS_OP_DONE, it stops.

static irqreturn_t goldfish_pdev_bus_interrupt(int irq, void *dev_id)
{
    irqreturn_t ret = IRQ_NONE;
    while(1) {
        uint32_t op = readl(pdev_bus_base + PDEV_BUS_OP);
        switch(op) {
        case PDEV_BUS_OP_DONE:
            return IRQ_NONE;

        case PDEV_BUS_OP_REMOVE_DEV:
            goldfish_pdev_remove();
            break;

        case PDEV_BUS_OP_ADD_DEV:
            goldfish_new_pdev();
            break;
        }
        ret = IRQ_HANDLED;
    }
}

goldfish_new_pdev reads the registers to get the new device's name, IO memory, interrupt number, and so on, then adds the device struct to the pdev_bus_new_devices list. Once the worker has registered the device, the battery driver can obtain the IO memory and interrupt number from the battery device struct (via platform_get_resource).
At the end it calls schedule_work(&pdev_bus_worker). goldfish_pdev_worker is a work item, similar in purpose to a tasklet: once scheduled it runs at some later point and does not consume time in interrupt context. Its main job is to maintain three lists: newly added devices, removed devices, and registered devices.
static int goldfish_new_pdev(void)
{
    struct pdev_bus_dev *dev;
    uint32_t name_len;
    uint32_t irq = -1, irq_count;
    int resource_count = 2;
    uint32_t base;
    char *name;

    base = readl(pdev_bus_base + PDEV_BUS_IO_BASE);

    irq_count = readl(pdev_bus_base + PDEV_BUS_IRQ_COUNT);
    name_len = readl(pdev_bus_base + PDEV_BUS_NAME_LEN);
    if(irq_count)
        resource_count++;

    dev = kzalloc(sizeof(*dev) + sizeof(struct resource) * resource_count + name_len + 1, GFP_ATOMIC);
    if(dev == NULL)
        return -ENOMEM;

    dev->pdev.num_resources = resource_count;
    dev->pdev.resource = (struct resource *)(dev + 1);
    dev->pdev.name = name = (char *)(dev->pdev.resource + resource_count);
    dev->pdev.dev.coherent_dma_mask = ~0;

    writel((unsigned long)name, pdev_bus_base + PDEV_BUS_GET_NAME);
    name[name_len] = '\0';
    dev->pdev.id = readl(pdev_bus_base + PDEV_BUS_ID);
    dev->pdev.resource[0].start = base;
    dev->pdev.resource[0].end = base + readl(pdev_bus_base + PDEV_BUS_IO_SIZE) - 1;
    dev->pdev.resource[0].flags = IORESOURCE_MEM;
    if(irq_count) {
        irq = readl(pdev_bus_base + PDEV_BUS_IRQ);
        dev->pdev.resource[1].start = irq;
        dev->pdev.resource[1].end = irq + irq_count - 1;
        dev->pdev.resource[1].flags = IORESOURCE_IRQ;
    }

    printk("goldfish_new_pdev %s at %x irq %d\n", name, base, irq);
    list_add_tail(&dev->list, &pdev_bus_new_devices);
    schedule_work(&pdev_bus_worker);

    return 0;
}

static void goldfish_pdev_worker(struct work_struct *work)
{
    int ret;
    struct pdev_bus_dev *pos, *n;

    list_for_each_entry_safe(pos, n, &pdev_bus_removed_devices, list) {
        list_del(&pos->list);
        platform_device_unregister(&pos->pdev);
        kfree(pos);
    }
    list_for_each_entry_safe(pos, n, &pdev_bus_new_devices, list) {
        list_del(&pos->list);
        ret = platform_device_register(&pos->pdev);
        if(ret) {
            printk("goldfish_pdev_worker failed to register device, %s\n", pos->pdev.name);
        }
        else {
            printk("goldfish_pdev_worker registered %s\n", pos->pdev.name);
        }
        list_add_tail(&pos->list, &pdev_bus_registered_devices);
    }
}

That is the end of the kernel side. Everything that follows is about the virtual devices inside the emulator, which is harder to understand and has very little documentation.
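III. Virtual devices in the emulator
3.1 The battery virtual device
The code of the battery virtual device is at: http://androidxref.com/5.1.0_r1/xref/external/qemu/hw/android/goldfish/battery.c
First come the registers of the virtual device. The code defines the register addresses and then uses the struct goldfish_battery_state to hold the register contents. Reading or writing this struct is reading or writing the registers; that is how the registers are emulated.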
enum {
    /* status register */
    BATTERY_INT_STATUS = 0x00,
    /* set this to enable IRQ */
    BATTERY_INT_ENABLE = 0x04,

    BATTERY_AC_ONLINE = 0x08,
    BATTERY_STATUS = 0x0C,
    BATTERY_HEALTH = 0x10,
    BATTERY_PRESENT = 0x14,
    BATTERY_CAPACITY = 0x18,

    BATTERY_STATUS_CHANGED = 1U << 0,
    AC_STATUS_CHANGED = 1U << 1,
    BATTERY_INT_MASK = BATTERY_STATUS_CHANGED | AC_STATUS_CHANGED,
};

struct goldfish_battery_state {
    struct goldfish_device dev;
    // IRQs
    uint32_t int_status;
    // irq enable mask for int_status
    uint32_t int_enable;

    int ac_online;
    int status;
    int health;
    int present;
    int capacity;

    // the fields below are part of the device configuration
    // and don't need to be saved to / restored from snapshots.
    int hw_has_battery;
};

/* update this each time you update the battery_state struct */
#define BATTERY_STATE_SAVE_VERSION 1

#define QFIELD_STRUCT struct goldfish_battery_state
QFIELD_BEGIN(goldfish_battery_fields)
    QFIELD_INT32(int_status),
    QFIELD_INT32(int_enable),
    QFIELD_INT32(ac_online),
    QFIELD_INT32(status),
    QFIELD_INT32(health),
    QFIELD_INT32(present),
    QFIELD_INT32(capacity),
QFIELD_END

static void goldfish_battery_save(QEMUFile* f, void* opaque)
{
    struct goldfish_battery_state* s = opaque;

    qemu_put_struct(f, goldfish_battery_fields, s);
}

static int goldfish_battery_load(QEMUFile* f, void* opaque, int version_id)
{
    struct goldfish_battery_state* s = opaque;

    if (version_id != BATTERY_STATE_SAVE_VERSION)
        return -1;

    return qemu_get_struct(f, goldfish_battery_fields, s);
}

The virtual device's initialization function is goldfish_battery_init. It fills the register struct with default values such as the name and the charge level, and finally calls goldfish_device_add to add the device to the bus. That function is the key piece: it dynamically assigns each device's IO memory range and interrupt number, and records the read/write function arrays and the register struct for that IO memory. It is explained in detail later.

void goldfish_battery_init(int has_battery)
{
    struct goldfish_battery_state *s;

    s = (struct goldfish_battery_state *)g_malloc0(sizeof(*s));
    s->dev.name = "goldfish-battery";
    s->dev.base = 0;    // will be allocated dynamically
    s->dev.size = 0x1000;
    s->dev.irq_count = 1;

    // default values for the battery
    s->ac_online = 1;
    s->hw_has_battery = has_battery;
    if (has_battery) {
        s->status = POWER_SUPPLY_STATUS_CHARGING;
        s->health = POWER_SUPPLY_HEALTH_GOOD;
        s->present = 1;     // battery is present
        s->capacity = 50;   // 50% charged
    } else {
        s->status = POWER_SUPPLY_STATUS_NOT_CHARGING;
        s->health = POWER_SUPPLY_HEALTH_DEAD;
        s->present = 0;
        s->capacity = 0;
    }

    battery_state = s;

    goldfish_device_add(&s->dev, goldfish_battery_readfn, goldfish_battery_writefn, s);

    register_savevm(NULL, "battery_state", 0, BATTERY_STATE_SAVE_VERSION,
                    goldfish_battery_save, goldfish_battery_load, s);
}

goldfish_battery_read and goldfish_battery_write are the read and write functions for the virtual device's registers. Given the register struct and a register (that is, an offset), they emulate reading and writing that register.
Note that after BATTERY_INT_STATUS is read, any pending interrupt flags are cleared: the program has already seen the new interrupt event, so there is no need to raise the interrupt again.
static uint32_t goldfish_battery_read(void *opaque, hwaddr offset)
{
    uint32_t ret;
    struct goldfish_battery_state *s = opaque;

    switch(offset) {
        case BATTERY_INT_STATUS:
            // return current buffer status flags
            ret = s->int_status & s->int_enable;
            if (ret) {
                goldfish_device_set_irq(&s->dev, 0, 0);
                s->int_status = 0;
            }
            return ret;

        case BATTERY_INT_ENABLE:
            return s->int_enable;
        case BATTERY_AC_ONLINE:
            return s->ac_online;
        case BATTERY_STATUS:
            return s->status;
        case BATTERY_HEALTH:
            return s->health;
        case BATTERY_PRESENT:
            return s->present;
        case BATTERY_CAPACITY:
            return s->capacity;

        default:
            cpu_abort (cpu_single_env, "goldfish_battery_read: Bad offset %x\n", offset);
            return 0;
    }
}

static void goldfish_battery_write(void *opaque, hwaddr offset, uint32_t val)
{
    struct goldfish_battery_state *s = opaque;

    switch(offset) {
        case BATTERY_INT_ENABLE:
            /* enable interrupts */
            s->int_enable = val;
//            s->int_status = (AUDIO_INT_WRITE_BUFFER_1_EMPTY | AUDIO_INT_WRITE_BUFFER_2_EMPTY);
//            goldfish_device_set_irq(&s->dev, 0, (s->int_status & s->int_enable));
            break;

        default:
            cpu_abort (cpu_single_env, "goldfish_audio_write: Bad offset %x\n", offset);
    }
}

Each of the read and write function arrays has three entries, for 8-bit, 16-bit, and 32-bit accesses respectively. The two arrays are passed to goldfish_device_add.

static CPUReadMemoryFunc *goldfish_battery_readfn[] = {
    goldfish_battery_read,
    goldfish_battery_read,
    goldfish_battery_read
};

static CPUWriteMemoryFunc *goldfish_battery_writefn[] = {
    goldfish_battery_write,
    goldfish_battery_write,
    goldfish_battery_write
};
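The read-to-clear behaviour of INT_STATUS and the role of INT_ENABLE as a mask can be tried out in isolation with a small register model. This is a sketch, not the emulator's code: the `toy_*` names are hypothetical, the interrupt line is reduced to a single `irq_level` field, and `toy_set_capacity` is an assumed stand-in for whatever emulator-side code updates the battery state and raises the IRQ (that code is not shown in this article).

```c
#include <stdint.h>

enum {
    TOY_INT_STATUS = 0x00,
    TOY_INT_ENABLE = 0x04,
    TOY_CAPACITY   = 0x18,

    TOY_BATTERY_CHANGED = 1u << 0,
    TOY_AC_CHANGED      = 1u << 1,
};

struct toy_battery {
    uint32_t int_status;     /* pending change flags */
    uint32_t int_enable;     /* which flags may raise the IRQ */
    int capacity;
    int irq_level;           /* stands in for the interrupt line */
};

/* Device-side event (assumed behaviour): record the change and raise the
 * line only if the corresponding flag is enabled. */
static void toy_set_capacity(struct toy_battery *s, int capacity)
{
    s->capacity = capacity;
    s->int_status |= TOY_BATTERY_CHANGED;
    s->irq_level = (s->int_status & s->int_enable) != 0;
}

/* Register read: INT_STATUS returns the enabled pending flags and, if any
 * were set, lowers the line and clears them. */
static uint32_t toy_battery_read(struct toy_battery *s, uint32_t offset)
{
    uint32_t ret;

    switch (offset) {
    case TOY_INT_STATUS:
        ret = s->int_status & s->int_enable;
        if (ret) {
            s->irq_level = 0;
            s->int_status = 0;
        }
        return ret;
    case TOY_INT_ENABLE:
        return s->int_enable;
    case TOY_CAPACITY:
        return (uint32_t)s->capacity;
    default:
        return 0;            /* the real device aborts on a bad offset */
    }
}

/* Register write: only INT_ENABLE is writable in this model. */
static void toy_battery_write(struct toy_battery *s, uint32_t offset, uint32_t val)
{
    if (offset == TOY_INT_ENABLE)
        s->int_enable = val;
}
```

A second read of INT_STATUS right after the first returns 0, which is exactly why the kernel handler must act on the value it got the first time.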
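3.2 The platform bus virtual device
The code of the platform bus virtual device is at: http://androidxref.com/5.1.0_r1/xref/external/qemu/hw/android/goldfish/device.c
Note that the platform bus is itself a device and is also in the device list.
In the initialization code, the base, size, irq, and irq_count passed to goldfish_device_init and goldfish_device_bus_init are hard-coded and match the values in the kernel.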
static struct bus_state bus_state = {
    .dev = {
        .name = "goldfish_device_bus",
        .id = -1,
        .base = 0x10001000,
        .size = 0x1000,
        .irq = 1,
        .irq_count = 1,
    }
};

void goldfish_device_init(qemu_irq *pic, uint32_t base, uint32_t size, uint32_t irq, uint32_t irq_count)
{
    goldfish_pic = pic;
    goldfish_free_base = base;
    goldfish_free_irq = irq;
}

int goldfish_device_bus_init(uint32_t base, uint32_t irq)
{
    bus_state.dev.base = base;
    bus_state.dev.irq = irq;

    return goldfish_device_add(&bus_state.dev, goldfish_bus_readfn, goldfish_bus_writefn, &bus_state);
}

The register write function is goldfish_bus_write. A write of PDEV_BUS_OP_INIT calls goldfish_bus_op_init; if the device list is not empty, this raises an interrupt, the interrupt handler in the kernel runs, and it performs the device enumeration described in the platform bus driver section. The rest is unremarkable.

static void goldfish_bus_write(void *opaque, hwaddr offset, uint32_t value)
{
    struct bus_state *s = (struct bus_state *)opaque;

    switch(offset) {
        case PDEV_BUS_OP:
            switch(value) {
                case PDEV_BUS_OP_INIT:
                    goldfish_bus_op_init(s);
                    break;
                default:
                    cpu_abort (cpu_single_env, "goldfish_bus_write: Bad PDEV_BUS_OP value %x\n", value);
            };
            break;
        case PDEV_BUS_GET_NAME:
            if(s->current) {
                target_ulong name = (target_ulong)(s->name_addr_high | value);
                safe_memory_rw_debug(current_cpu, name, (void*)s->current->name, strlen(s->current->name), 1);
            }
            break;
        case PDEV_BUS_NAME_ADDR_HIGH:
            s->name_addr_high = ((uint64_t)value << 32);
            goldfish_64bit_guest = 1;
            break;
        default:
            cpu_abort (cpu_single_env, "goldfish_bus_write: Bad offset %x\n", offset);
    }
}

static void goldfish_bus_op_init(struct bus_state *s)
{
    struct goldfish_device *dev = first_device;
    while(dev) {
        dev->reported_state = 0;
        dev = dev->next;
    }
    s->current = NULL;
    goldfish_device_set_irq(&s->dev, 0, first_device != NULL);
}

The register read function is goldfish_bus_read. Every read of PDEV_BUS_OP advances to the next device, and the return value says whether there is one. The rest is unremarkable.
static uint32_t goldfish_bus_read(void *opaque, hwaddr offset)
{
    struct bus_state *s = (struct bus_state *)opaque;

    switch (offset) {
        case PDEV_BUS_OP:
            if(s->current) {
                s->current->reported_state = 1;
                s->current = s->current->next;
            }
            else {
                s->current = first_device;
            }
            while(s->current && s->current->reported_state == 1)
                s->current = s->current->next;
            if(s->current)
                return PDEV_BUS_OP_ADD_DEV;
            else {
                goldfish_device_set_irq(&s->dev, 0, 0);
                return PDEV_BUS_OP_DONE;
            }

        case PDEV_BUS_NAME_LEN:
            return s->current ? strlen(s->current->name) : 0;
        case PDEV_BUS_ID:
            return s->current ? s->current->id : 0;
        case PDEV_BUS_IO_BASE:
            return s->current ? s->current->base : 0;
        case PDEV_BUS_IO_SIZE:
            return s->current ? s->current->size : 0;
        case PDEV_BUS_IRQ:
            return s->current ? s->current->irq : 0;
        case PDEV_BUS_IRQ_COUNT:
            return s->current ? s->current->irq_count : 0;
        default:
            cpu_abort (cpu_single_env, "goldfish_bus_read: Bad offset %x\n", offset);
            return 0;
    }
}

The function that raises interrupts, void goldfish_device_set_irq(struct goldfish_device *dev, int irq, int level), deserves a detailed explanation.
The classic interrupt controller on x86 is the 8259A. The emulator uses a virtual 8259A rather than the 8259A in the host machine, because the emulator has no way to trigger an interrupt request on the hardware 8259A. The interrupt initialization code is at: http://androidxref.com/5.1.0_r1/xref/external/qemu/hw/i386/pc.c#1031
There are at most 15 virtual interrupts: two 8259A chips are cascaded, with the slave attached to IRQ2 of the master (each chip has IRQ 0 to 7).
dev is the specific virtual device. irq is the index of the interrupt within that virtual device: if the device has only one interrupt, irq is 0; if it has two, irq can be 0 or 1. It is not the system-wide interrupt number. A level of 1 raises the interrupt and a level of 0 cancels it (this only withdraws the interrupt request; it does not disable the interrupt). goldfish_device_set_irq calls qemu_set_irq, which ends up setting the bit in the virtual 8259A's IRR (interrupt request register) that corresponds to the device's interrupt number (http://androidxref.com/5.1.0_r1/xref/external/qemu/hw/intc/i8259.c#83). That triggers the interrupt, and the interrupt handler in the kernel then runs (once the interrupt is raised, the CPU gets the interrupt number, looks it up in the interrupt vector table, and jumps to the handler).
void goldfish_device_set_irq(struct goldfish_device *dev, int irq, int level)
{
    if(irq >= dev->irq_count)
        cpu_abort (cpu_single_env, "goldfish_device_set_irq: Bad irq %d >= %d\n", irq, dev->irq_count);
    else
        qemu_set_irq(goldfish_pic[dev->irq + irq], level);
}
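The translation from a device-relative interrupt index to a global line, and from there to a request bit, can be sketched as follows. This is a deliberately crude model: one 16-bit word stands in for the IRR bits of both cascaded 8259A chips, and edge versus level triggering, masking, and priorities are ignored. The `toy_*` names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical stand-in for struct goldfish_device: first global IRQ line
 * owned by the device, and how many lines it owns. */
struct toy_irq_dev {
    int irq;
    int irq_count;
};

/* Bit n set means global IRQ line n is being requested. */
static uint16_t toy_irr;

/* Returns 0 on success, -1 if the device-relative index or the resulting
 * global line is out of range. */
static int toy_device_set_irq(const struct toy_irq_dev *dev, int irq, int level)
{
    int line;

    if (irq < 0 || irq >= dev->irq_count)
        return -1;

    line = dev->irq + irq;               /* device-relative -> global number */
    if (line < 0 || line > 15)
        return -1;

    if (level)
        toy_irr |= (uint16_t)(1u << line);    /* raise the request */
    else
        toy_irr &= (uint16_t)~(1u << line);   /* withdraw the request */
    return 0;
}
```

With a battery device that owns global line 5, index 0 sets or clears bit 5 and index 1 is rejected, mirroring the irq >= dev->irq_count check in the function above.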
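3.3 goldfish_device_add, the heart of the virtual devices
goldfish_device_add comes last because it is the most important function of all. It answers the question of how, when the kernel reads or writes a virtual device register, the emulator knows which virtual device and which virtual register is being accessed, and how that access should be emulated. A function this important is, of course, only a few lines long and delegates to others. In brief: goldfish_add_device_no_io assigns IO memory and an interrupt number to the new device from the currently free IO address and interrupt number (if base or irq is non-zero, it was assigned statically). cpu_register_io_memory maintains three arrays: the read functions (three per entry), the write functions (three per entry), and the register structs of the virtual devices. The array index is io_index, which is allocated dynamically; note that a few io_index values are reserved. cpu_register_physical_memory allocates the guest physical memory pages and stores io_index << 3 | subwidth in the phys_offset field of the page descriptor struct PhysPageDesc.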
int goldfish_device_add(struct goldfish_device *dev,
                        CPUReadMemoryFunc **mem_read,
                        CPUWriteMemoryFunc **mem_write,
                        void *opaque)
{
    int iomemtype;

    goldfish_add_device_no_io(dev);
    iomemtype = cpu_register_io_memory(mem_read, mem_write, opaque);
    cpu_register_physical_memory(dev->base, dev->size, iomemtype);
    return 0;
}

The function that dynamically assigns a virtual device's IO memory and interrupt number is goldfish_add_device_no_io. Note that a few interrupt numbers are reserved on x86.

int goldfish_add_device_no_io(struct goldfish_device *dev)
{
    if(dev->base == 0) {
        dev->base = goldfish_free_base;
        goldfish_free_base += dev->size;
    }
    if(dev->irq == 0 && dev->irq_count > 0) {
        dev->irq = goldfish_free_irq;
        goldfish_free_irq += dev->irq_count;
#ifdef TARGET_I386
        /* Make sure that we pass by the reserved IRQs. */
        while (goldfish_free_irq == GFD_KBD_IRQ ||
               goldfish_free_irq == GFD_RTC_IRQ ||
               goldfish_free_irq == GFD_MOUSE_IRQ ||
               goldfish_free_irq == GFD_ERR_IRQ) {
            goldfish_free_irq++;
        }
#endif
        if (goldfish_free_irq >= GFD_MAX_IRQ) {
            derror("Goldfish device has exceeded available IRQ number.");
            exit(1);
        }
    }
    //printf("goldfish_add_device: %s, base %x %x, irq %d %d\n",
    //       dev->name, dev->base, dev->size, dev->irq, dev->irq_count);
    dev->next = NULL;
    if(last_device) {
        last_device->next = dev;
    }
    else {
        first_device = dev;
    }
    last_device = dev;
    return 0;
}
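The allocation policy above is a simple bump allocator for both the IO base and the IRQ number, with a skip over reserved IRQs. Here is a self-contained sketch of it. The reserved numbers below are assumptions: they are the conventional PC values for keyboard, RTC, mouse, and FPU error, and the limit of 16 matches the two cascaded 8259A chips, but the emulator's actual GFD_* constants are not shown in this article. All `toy_*` names are hypothetical.

```c
#include <stdint.h>

/* Assumed values, see the note above. */
enum {
    TOY_KBD_IRQ   = 1,
    TOY_RTC_IRQ   = 8,
    TOY_MOUSE_IRQ = 12,
    TOY_ERR_IRQ   = 13,
    TOY_MAX_IRQ   = 16,
};

struct toy_allocator {
    uint32_t free_base;      /* next unused I/O address */
    uint32_t free_irq;       /* next unused IRQ number */
};

struct toy_alloc_dev {
    uint32_t base, size;
    uint32_t irq, irq_count;
};

static int toy_irq_is_reserved(uint32_t irq)
{
    return irq == TOY_KBD_IRQ || irq == TOY_RTC_IRQ ||
           irq == TOY_MOUSE_IRQ || irq == TOY_ERR_IRQ;
}

/* Assign base and irq when they are 0 (meaning "not statically assigned").
 * Returns 0 on success, -1 when the IRQ space is exhausted. As in the
 * original, the check is made on the next free IRQ after the assignment. */
static int toy_assign(struct toy_allocator *a, struct toy_alloc_dev *d)
{
    if (d->base == 0) {
        d->base = a->free_base;
        a->free_base += d->size;
    }
    if (d->irq == 0 && d->irq_count > 0) {
        d->irq = a->free_irq;
        a->free_irq += d->irq_count;
        while (toy_irq_is_reserved(a->free_irq))
            a->free_irq++;               /* step over reserved numbers */
        if (a->free_irq >= TOY_MAX_IRQ)
            return -1;
    }
    return 0;
}
```

One consequence of using 0 as the "unassigned" marker is that IRQ 0 and base address 0 can never be requested statically.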
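The function that manages the three arrays is cpu_register_io_memory. Note that io_index is allocated dynamically and each virtual device gets its own io_index; through it, the device's three read functions, three write functions, and register struct can be found. Note also that io_index must be smaller than IO_MEM_NB_ENTRIES:

/* MMIO pages are identified by a combination of an IO device index and
   3 flags. The ROMD code stores the page ram offset in iotlb entry,
   so only a limited number of ids are avaiable. */
#define IO_MEM_NB_ENTRIES (1 << (TARGET_PAGE_BITS - IO_MEM_SHIFT))

The return value of the function is io_index << 3 | subwidth, where subwidth records whether any of the three read or write functions is NULL.
Once io_index and the register (the offset) are known, the virtual device's own read and write functions can be called on its register struct to emulate the device. How the emulator learns the io_index when the kernel writes a register is analyzed below.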
/* mem_read and mem_write are arrays of functions containing the
   function to access byte (index 0), word (index 1) and dword (index
   2). Functions can be omitted with a NULL function pointer.
   If io_index is non zero, the corresponding io zone is
   modified. If it is zero, a new io zone is allocated. The return
   value can be used with cpu_register_physical_memory(). (-1) is
   returned if error. */
static int cpu_register_io_memory_fixed(int io_index,
                                        CPUReadMemoryFunc * const *mem_read,
                                        CPUWriteMemoryFunc * const *mem_write,
                                        void *opaque)
{
    int i, subwidth = 0;

    if (io_index <= 0) {
        io_index = get_free_io_mem_idx();
        if (io_index == -1)
            return io_index;
    } else {
        io_index >>= IO_MEM_SHIFT;
        if (io_index >= IO_MEM_NB_ENTRIES)
            return -1;
    }

    for(i = 0;i < 3; i++) {
        if (!mem_read[i] || !mem_write[i])
            subwidth = IO_MEM_SUBWIDTH;
        _io_mem_read[io_index][i] = mem_read[i];
        _io_mem_write[io_index][i] = mem_write[i];
    }
    io_mem_opaque[io_index] = opaque;
    return (io_index << IO_MEM_SHIFT) | subwidth;
}

int cpu_register_io_memory(CPUReadMemoryFunc * const *mem_read,
                           CPUWriteMemoryFunc * const *mem_write,
                           void *opaque)
{
    return cpu_register_io_memory_fixed(0, mem_read, mem_write, opaque);
}
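The packing of io_index into the returned value, and the reverse lookup at access time, can be sketched with a read-only version of the table. Assumptions are labeled in the comments: 4 KiB pages, and four reserved low indices (the exact number reserved by get_free_io_mem_idx is not shown in this article). The `toy_*` names are hypothetical and write handlers are omitted for brevity.

```c
#include <stddef.h>
#include <stdint.h>

#define TOY_IO_MEM_SHIFT    3
#define TOY_PAGE_BITS       12       /* assumption: 4 KiB pages */
#define TOY_IO_MEM_SUBWIDTH 4
#define TOY_NB_ENTRIES      (1 << (TOY_PAGE_BITS - TOY_IO_MEM_SHIFT))
#define TOY_FIRST_FREE      4        /* assumption: indices 0..3 are reserved */

typedef uint32_t (*toy_read_fn)(void *opaque, uint32_t offset);

static toy_read_fn toy_io_read[TOY_NB_ENTRIES][3];   /* byte, word, dword */
static void *toy_io_opaque[TOY_NB_ENTRIES];
static int toy_io_used[TOY_NB_ENTRIES];

/* Take the first free slot, store the handlers and the register struct,
 * and return (io_index << 3) | subwidth, or -1 when the table is full. */
static int toy_register_io(toy_read_fn const reads[3], void *opaque)
{
    for (int idx = TOY_FIRST_FREE; idx < TOY_NB_ENTRIES; idx++) {
        int subwidth = 0;

        if (toy_io_used[idx])
            continue;
        toy_io_used[idx] = 1;
        for (int i = 0; i < 3; i++) {
            if (!reads[i])
                subwidth = TOY_IO_MEM_SUBWIDTH;   /* some width is unsupported */
            toy_io_read[idx][i] = reads[i];
        }
        toy_io_opaque[idx] = opaque;
        return (idx << TOY_IO_MEM_SHIFT) | subwidth;
    }
    return -1;
}

/* Access path: recover io_index from the stored value and call the handler
 * for the requested width (size is 1, 2 or 4). Returns 0 if none is set. */
static uint32_t toy_io_mem_read(int phys_offset, uint32_t offset, int size)
{
    int idx = (phys_offset >> TOY_IO_MEM_SHIFT) & (TOY_NB_ENTRIES - 1);
    int slot = size == 4 ? 2 : size == 2 ? 1 : 0;
    toy_read_fn fn = toy_io_read[idx][slot];

    return fn ? fn(toy_io_opaque[idx], offset) : 0;
}

/* Example handler: the "register struct" is a single uint32_t. */
static uint32_t toy_capacity_read(void *opaque, uint32_t offset)
{
    (void)offset;
    return *(uint32_t *)opaque;
}
```

Because the low three bits are flags, the value fits in the in-page part of phys_offset, which is why the number of io_index values is limited by the page size.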
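The third function, cpu_register_physical_memory, allocates guest physical memory and stores io_index << 3 | subwidth in the phys_offset field of the PhysPageDesc struct.
The physical memory management code is complex. It is enough to understand the following. Ordinary RAM is allocated by page, and the low bits of its phys_offset (phys_offset & ~TARGET_PAGE_MASK) equal IO_MEM_RAM, that is 0, which marks the page as ordinary RAM. IO memory is also allocated by page, and its phys_offset is the io_index << 3 | subwidth value from above. If an IO region spans several pages, every page has the same phys_offset (only region_offset differs), so the same io_index can be found from any of them.
Below are the definitions of a few macros. Note that IO_MEM_ROM, IO_MEM_UNASSIGNED, and IO_MEM_NOTDIRTY are io_index values that get_free_io_mem_idx keeps reserved.

#define TARGET_PAGE_SIZE (1 << TARGET_PAGE_BITS)
#define TARGET_PAGE_MASK ~(TARGET_PAGE_SIZE - 1)

#define IO_MEM_SHIFT       3

#define IO_MEM_RAM         (0 << IO_MEM_SHIFT) /* hardcoded offset */
#define IO_MEM_ROM         (1 << IO_MEM_SHIFT) /* hardcoded offset */
#define IO_MEM_UNASSIGNED  (2 << IO_MEM_SHIFT)
#define IO_MEM_NOTDIRTY    (3 << IO_MEM_SHIFT)

/* Acts like a ROM when read and like a device when written. */
#define IO_MEM_ROMD        (1)
#define IO_MEM_SUBPAGE     (2)
#define IO_MEM_SUBWIDTH    (4)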
static inline void cpu_register_physical_memory(hwaddr start_addr,
                                                ram_addr_t size,
                                                ram_addr_t phys_offset)
{
    cpu_register_physical_memory_offset(start_addr, size, phys_offset, 0);
}

static inline void cpu_register_physical_memory_offset(hwaddr start_addr,
                                                       ram_addr_t size,
                                                       ram_addr_t phys_offset,
                                                       ram_addr_t region_offset)
{
    cpu_register_physical_memory_log(start_addr, size, phys_offset,
                                     region_offset, false);
}

void cpu_register_physical_memory_log(hwaddr start_addr,
                                      ram_addr_t size,
                                      ram_addr_t phys_offset,
                                      ram_addr_t region_offset,
                                      bool log_dirty)
{
    hwaddr addr, end_addr;
    PhysPageDesc *p;
    CPUState *cpu;
    ram_addr_t orig_size = size;
    subpage_t *subpage;

    if (kvm_enabled())
        kvm_set_phys_mem(start_addr, size, phys_offset);
#ifdef CONFIG_HAX
    if (hax_enabled())
        hax_set_phys_mem(start_addr, size, phys_offset);
#endif

    if (phys_offset == IO_MEM_UNASSIGNED) {
        region_offset = start_addr;
    }
    region_offset &= TARGET_PAGE_MASK;
    size = (size + TARGET_PAGE_SIZE - 1) & TARGET_PAGE_MASK;
    end_addr = start_addr + (hwaddr)size;

    addr = start_addr;
    do {
        p = phys_page_find(addr >> TARGET_PAGE_BITS);
        if (p && p->phys_offset != IO_MEM_UNASSIGNED) {
            ram_addr_t orig_memory = p->phys_offset;
            hwaddr start_addr2, end_addr2;
            int need_subpage = 0;

            CHECK_SUBPAGE(addr, start_addr, start_addr2, end_addr, end_addr2,
                          need_subpage);
            if (need_subpage) {
                if (!(orig_memory & IO_MEM_SUBPAGE)) {
                    subpage = subpage_init((addr & TARGET_PAGE_MASK),
                                           &p->phys_offset, orig_memory,
                                           p->region_offset);
                } else {
                    subpage = io_mem_opaque[(orig_memory & ~TARGET_PAGE_MASK)
                                            >> IO_MEM_SHIFT];
                }
                subpage_register(subpage, start_addr2, end_addr2, phys_offset,
                                 region_offset);
                p->region_offset = 0;
            } else {
                p->phys_offset = phys_offset;
                if ((phys_offset & ~TARGET_PAGE_MASK) <= IO_MEM_ROM ||
                    (phys_offset & IO_MEM_ROMD))
                    phys_offset += TARGET_PAGE_SIZE;
            }
        } else {
            p = phys_page_find_alloc(addr >> TARGET_PAGE_BITS, 1);
            p->phys_offset = phys_offset;
            p->region_offset = region_offset;
            if ((phys_offset & ~TARGET_PAGE_MASK) <= IO_MEM_ROM ||
                (phys_offset & IO_MEM_ROMD)) {
                phys_offset += TARGET_PAGE_SIZE;
            } else {
                hwaddr start_addr2, end_addr2;
                int need_subpage = 0;

                CHECK_SUBPAGE(addr, start_addr, start_addr2, end_addr,
                              end_addr2, need_subpage);

                if (need_subpage) {
                    subpage = subpage_init((addr & TARGET_PAGE_MASK),
                                           &p->phys_offset, IO_MEM_UNASSIGNED,
                                           addr & TARGET_PAGE_MASK);
                    subpage_register(subpage, start_addr2, end_addr2,
                                     phys_offset, region_offset);
                    p->region_offset = 0;
                }
            }
        }
        region_offset += TARGET_PAGE_SIZE;
        addr += TARGET_PAGE_SIZE;
    } while (addr != end_addr);

    /* since each CPU stores ram addresses in its TLB cache, we must
       reset the modified entries */
    /* XXX: slow ! */
    CPU_FOREACH(cpu) {
        tlb_flush(cpu->env_ptr, 1);
    }
}
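With KVM acceleration, a read or write of MMIO causes an exit from the guest:

case KVM_EXIT_MMIO:
    dprintf("handle_mmio\n");
    cpu_physical_memory_rw(run->mmio.phys_addr,
                           run->mmio.data,
                           run->mmio.len,
                           run->mmio.is_write);
    ret = 1;
    break;

cpu_physical_memory_rw is then called. It first checks whether the address is MMIO; if so, it gets the io_index and, depending on the access width (8, 16, or 32 bits), calls io_mem_write(io_index, addr1, val, xxx) or io_mem_read(io_index, addr1, xxx). These two functions are wrappers around the three arrays maintained by cpu_register_io_memory. In this way the register access is emulated using the owning virtual device's read and write functions, its register struct, and the offset.
HAXM and TCG work on a similar principle.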
void cpu_physical_memory_rw(hwaddr addr, void *buf, int len, int is_write)
{
    int l, io_index;
    uint8_t *ptr;
    uint32_t val;
    hwaddr page;
    ram_addr_t pd;
    uint8_t* buf8 = (uint8_t*)buf;
    PhysPageDesc *p;

    while (len > 0) {
        page = addr & TARGET_PAGE_MASK;
        l = (page + TARGET_PAGE_SIZE) - addr;
        if (l > len)
            l = len;
        p = phys_page_find(page >> TARGET_PAGE_BITS);
        if (!p) {
            pd = IO_MEM_UNASSIGNED;
        } else {
            pd = p->phys_offset;
        }

        if (is_write) {
            if ((pd & ~TARGET_PAGE_MASK) != IO_MEM_RAM) {
                hwaddr addr1 = addr;
                io_index = (pd >> IO_MEM_SHIFT) & (IO_MEM_NB_ENTRIES - 1);
                if (p)
                    addr1 = (addr & ~TARGET_PAGE_MASK) + p->region_offset;
                /* XXX: could force cpu_single_env to NULL to avoid
                   potential bugs */
                if (l >= 4 && ((addr1 & 3) == 0)) {
                    /* 32 bit write access */
                    val = ldl_p(buf8);
                    io_mem_write(io_index, addr1, val, 4);
                    l = 4;
                } else if (l >= 2 && ((addr1 & 1) == 0)) {
                    /* 16 bit write access */
                    val = lduw_p(buf8);
                    io_mem_write(io_index, addr1, val, 2);
                    l = 2;
                } else {
                    /* 8 bit write access */
                    val = ldub_p(buf8);
                    io_mem_write(io_index, addr1, val, 1);
                    l = 1;
                }
            } else {
                ram_addr_t addr1;
                addr1 = (pd & TARGET_PAGE_MASK) + (addr & ~TARGET_PAGE_MASK);
                /* RAM case */
                ptr = qemu_get_ram_ptr(addr1);
                memcpy(ptr, buf8, l);
                invalidate_and_set_dirty(addr1, l);
            }
        } else {
            if ((pd & ~TARGET_PAGE_MASK) > IO_MEM_ROM &&
                !(pd & IO_MEM_ROMD)) {
                hwaddr addr1 = addr;
                /* I/O case */
                io_index = (pd >> IO_MEM_SHIFT) & (IO_MEM_NB_ENTRIES - 1);
                if (p)
                    addr1 = (addr & ~TARGET_PAGE_MASK) + p->region_offset;
                if (l >= 4 && ((addr1 & 3) == 0)) {
                    /* 32 bit read access */
                    val = io_mem_read(io_index, addr1, 4);
                    stl_p(buf8, val);
                    l = 4;
                } else if (l >= 2 && ((addr1 & 1) == 0)) {
                    /* 16 bit read access */
                    val = io_mem_read(io_index, addr1, 2);
                    stw_p(buf8, val);
                    l = 2;
                } else {
                    /* 8 bit read access */
                    val = io_mem_read(io_index, addr1, 1);
                    stb_p(buf8, val);
                    l = 1;
                }
            } else {
                /* RAM case */
                ptr = qemu_get_ram_ptr(pd & TARGET_PAGE_MASK) +
                      (addr & ~TARGET_PAGE_MASK);
                memcpy(buf8, ptr, l);
            }
        }
        len -= l;
        buf8 += l;
        addr += l;
    }
}
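The way cpu_physical_memory_rw breaks an MMIO request into 32-bit, 16-bit, and 8-bit accesses depends only on the alignment of the address and the number of bytes left. The rule can be isolated in a few lines. The function names are hypothetical, and the per-page clamping of the original is left out because an access aligned to its own width never crosses a page boundary.

```c
#include <stddef.h>
#include <stdint.h>

/* Width in bytes of the next MMIO access for len remaining bytes at addr:
 * the widest of 4, 2, 1 that is both aligned and not longer than len. */
static int toy_next_width(uint64_t addr, size_t len)
{
    if (len >= 4 && (addr & 3) == 0)
        return 4;
    if (len >= 2 && (addr & 1) == 0)
        return 2;
    return 1;
}

/* Split one request into individual accesses. Stores up to max widths in
 * out and returns the total number of accesses needed. */
static size_t toy_split_access(uint64_t addr, size_t len, int *out, size_t max)
{
    size_t n = 0;

    while (len > 0) {
        int w = toy_next_width(addr, len);

        if (n < max)
            out[n] = w;
        n++;
        addr += (uint64_t)w;
        len -= (size_t)w;
    }
    return n;
}
```

A 7-byte request starting at 0x1001, for instance, becomes one byte access, then a 16-bit access at 0x1002, then a 32-bit access at 0x1004. The battery registers are 4-byte aligned, so a 32-bit read by the kernel arrives as a single call to the device's 32-bit read function.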
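References:
For writing drivers, see Linux Device Drivers (3rd edition).
For hardware background, watch Guo Tianxiang's video course on the 8051 microcontroller.
Don't bother with Tan Haoqiang.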