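This post looks at the android emulator on Ubuntu 12.04 with KVM acceleration enabled. Its i8259 interrupt controller emulation is fairly old and corresponds to QEMU 0.14 or earlier. QOM (QEMU Object Model) did not exist yet, so the virtual device code is relatively simple.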
Before playing with virtual devices, you first need to understand how the real hardware works: http://www.360doc.com/content/09/1017/08/128139_7395798.shtml and http://blog.csdn.net/duguteng/article/details/7552774
IRQ0-IRQ7 on the master 8259 map to INT 08h-0Fh, and IRQ8-IRQ15 on the slave map to INT 70h-77h (the mapping a PC BIOS programs).
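The mapping above is simply "ICW2 base + pin number", with the chip chosen by whether the IRQ is below 8. The sketch below assumes the BIOS-style bases 0x08 and 0x70. The macro and function names are made up for illustration:

```c
#include <assert.h>
#include <stdint.h>

/* Assumed vector bases (BIOS real-mode defaults), as would be written via ICW2. */
#define MASTER_VECTOR_BASE 0x08
#define SLAVE_VECTOR_BASE  0x70

/* Hypothetical helper: map a global IRQ line (0..15) to its interrupt vector.
 * IRQ 0-7 belong to the master and IRQ 8-15 to the slave; the low 3 bits
 * select the pin and are added to that chip's base. */
static uint8_t irq_to_vector(int irq)
{
    assert(irq >= 0 && irq < 16);
    uint8_t base = (irq >> 3) ? SLAVE_VECTOR_BASE : MASTER_VECTOR_BASE;
    return (uint8_t)(base + (irq & 7));
}
```

A protected-mode guest normally reprograms ICW2, because vectors 0x08-0x0F collide with CPU exceptions. Only the bases change in that case; the arithmetic stays the same.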
I suggest reading "android qemu-kvm i8254 PIT virtual device" first.
Initialization
The 8259 is initialized in pc_init1, where cpu_irq becomes the 8259's parent_irq:
cpu_irq = qemu_allocate_irqs(pic_irq_request, NULL, 1);
i8259 = i8259_init(cpu_irq[0]);
qemu_allocate_irqs allocates and initializes qemu_irq structures:
qemu_irq *qemu_allocate_irqs(qemu_irq_handler handler, void *opaque, int n)
{
    qemu_irq *s;
    struct IRQState *p;
    int i;

    s = (qemu_irq *)g_malloc0(sizeof(qemu_irq) * n);
    p = (struct IRQState *)g_malloc0(sizeof(struct IRQState) * n);
    for (i = 0; i < n; i++) {
        p->handler = handler;
        p->opaque = opaque;
        p->n = i;
        s[i] = p;
        p++;
    }
    return s;
}
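The pointer-plus-index layout is easier to see in a stripped-down, self-contained re-creation. This sketch makes a few assumptions: it uses calloc instead of g_malloc0, and the type definitions are simplified stand-ins, not QEMU's headers. The names alloc_irqs, set_irq, irq_log and log_handler are hypothetical:

```c
#include <assert.h>
#include <stdlib.h>

/* Simplified stand-ins for QEMU's types; not the real headers. */
typedef void (*qemu_irq_handler)(void *opaque, int n, int level);

struct IRQState {
    qemu_irq_handler handler;
    void *opaque;
    int n;
};
typedef struct IRQState *qemu_irq;

/* Same layout as qemu_allocate_irqs: one contiguous array of IRQState
 * plus an array of pointers into it. Each line stores its own index. */
static qemu_irq *alloc_irqs(qemu_irq_handler handler, void *opaque, int n)
{
    assert(n > 0);
    qemu_irq *s = calloc((size_t)n, sizeof(qemu_irq));
    struct IRQState *p = calloc((size_t)n, sizeof(struct IRQState));
    assert(s != NULL && p != NULL);
    for (int i = 0; i < n; i++) {
        p[i].handler = handler;
        p[i].opaque = opaque;
        p[i].n = i;
        s[i] = &p[i];
    }
    return s;
}

/* Same dispatch as qemu_set_irq: a NULL line is ignored. */
static void set_irq(qemu_irq irq, int level)
{
    if (!irq)
        return;
    irq->handler(irq->opaque, irq->n, level);
}

/* Example handler: records the last call in a caller-owned struct. */
struct irq_log { int calls, last_n, last_level; };

static void log_handler(void *opaque, int n, int level)
{
    struct irq_log *rec = opaque;
    rec->calls++;
    rec->last_n = n;
    rec->last_level = level;
}
```

Each line carries its own index n. That is why one handler, such as i8259_set_irq for all 16 lines, can tell which line fired.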
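The ELCRs live at I/O ports 0x4d0 and 0x4d1, one for each 8259. Each bit corresponds to one IRQ and selects edge- or level-triggered mode; a set bit means level-triggered. elcr_mask exists because some IRQs do not support level triggering, so their bits must be masked off.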
Edge and level triggered modes (https://en.wikipedia.org/wiki/Intel_8259):
Since the ISA bus does not support level triggered interrupts, level triggered mode may not be used for interrupts connected to ISA devices. This means that on PC/XT, PC/AT, and compatible systems the 8259 must be programmed for edge triggered mode. On MCA systems, devices use level triggered interrupts and the interrupt controller is hardwired to always work in level triggered mode. On newer EISA, PCI, and later systems the Edge/Level Control Registers (ELCRs) control the mode per IRQ line, effectively making the mode of the 8259 irrelevant for such systems with ISA buses. The ELCR is programmed by the BIOS at system startup for correct operation. The ELCRs are located 0x4d0 and 0x4d1 in the x86 I/O address space. They are 8-bits wide, each bit corresponding to an IRQ from the 8259s. When a bit is set, the IRQ is in level triggered mode; otherwise, the IRQ is in edge triggered mode.
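i8259_init installs masks of 0xf8 for the master and 0xde for the slave. From those, the writable ELCR bits can be worked out directly. The filtering below mirrors elcr_ioport_write; the macro and helper names are hypothetical:

```c
#include <assert.h>
#include <stdint.h>

/* Masks installed by i8259_init: cleared bits can never become
 * level-triggered, whatever the guest writes to 0x4d0/0x4d1. */
#define MASTER_ELCR_MASK 0xf8  /* IRQ0-2 forced edge */
#define SLAVE_ELCR_MASK  0xde  /* IRQ8 and IRQ13 forced edge */

/* Same filtering as elcr_ioport_write: keep only the writable bits. */
static uint8_t elcr_filter(uint8_t val, uint8_t mask)
{
    return val & mask;
}

/* 1 if global IRQ `irq` (0..15) is level-triggered given both ELCR bytes. */
static int irq_is_level(uint8_t elcr0, uint8_t elcr1, int irq)
{
    assert(irq >= 0 && irq < 16);
    uint8_t elcr = (irq < 8) ? elcr0 : elcr1;
    return (elcr >> (irq & 7)) & 1;
}
```

Even if the guest writes 0xff to both registers, IRQ0-2, IRQ8 and IRQ13 stay edge-triggered. These are presumably the motherboard lines (timer, keyboard, cascade, RTC, FPU) that are always edge-triggered on PC hardware.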
Finally, i8259_init allocates GFD_MAX_IRQ (that is, 16) qemu_irq structures:
qemu_irq *i8259_init(qemu_irq parent_irq)
{
    PicState2 *s;

    s = g_malloc0(sizeof(PicState2));
    pic_init1(0x20, 0x4d0, &s->pics[0]);
    pic_init1(0xa0, 0x4d1, &s->pics[1]);
    s->pics[0].elcr_mask = 0xf8;
    s->pics[1].elcr_mask = 0xde;
    s->parent_irq = parent_irq;
    s->pics[0].pics_state = s;
    s->pics[1].pics_state = s;
    isa_pic = s;
    return qemu_allocate_irqs(i8259_set_irq, s, GFD_MAX_IRQ);
}
struct PicState2 {
    /* 0 is master pic, 1 is slave pic */
    /* XXX: better separation between the two pics */
    PicState pics[2];
    qemu_irq parent_irq;
    void *irq_request_opaque;
    /* IOAPIC callback support */
    SetIRQFunc *alt_irq_func;
    void *alt_irq_opaque;
};
typedef struct PicState {
    uint8_t last_irr;      /* edge detection */
    uint8_t irr;           /* interrupt request register */
    uint8_t imr;           /* interrupt mask register */
    uint8_t isr;           /* interrupt service register */
    uint8_t priority_add;  /* highest irq priority */
    uint8_t irq_base;
    uint8_t read_reg_select;
    uint8_t poll;
    uint8_t special_mask;
    uint8_t init_state;
    uint8_t auto_eoi;
    uint8_t rotate_on_auto_eoi;
    uint8_t special_fully_nested_mode;
    uint8_t init4;         /* true if 4 byte init */
    uint8_t single_mode;   /* true if slave pic is not initialized */
    uint8_t elcr;          /* PIIX edge/trigger selection*/
    uint8_t elcr_mask;
    PicState2 *pics_state;
} PicState;
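pic_init1 does the real initialization of each 8259. It binds the chip's I/O ports to their read/write functions, and qemu_register_reset adds the chip's reset function to the reset list: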
static void pic_init1(int io_addr, int elcr_addr, PicState *s)
{
    register_ioport_write(io_addr, 2, 1, pic_ioport_write, s);
    register_ioport_read(io_addr, 2, 1, pic_ioport_read, s);
    if (elcr_addr >= 0) {
        register_ioport_write(elcr_addr, 1, 1, elcr_ioport_write, s);
        register_ioport_read(elcr_addr, 1, 1, elcr_ioport_read, s);
    }
    register_savevm(NULL, "i8259", io_addr, 1, pic_save, pic_load, s);
    qemu_register_reset(pic_reset, 0, s);
}
Reading and writing the ELCR
The ELCR read/write functions are very simple; just note how the mask is applied on write:
static void elcr_ioport_write(void *opaque, uint32_t addr, uint32_t val)
{
    PicState *s = opaque;
    s->elcr = val & s->elcr_mask;
}

static uint32_t elcr_ioport_read(void *opaque, uint32_t addr1)
{
    PicState *s = opaque;
    return s->elcr;
}
Reading and writing the 8259 registers
pic_ioport_write
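The 8259 write handler is pic_ioport_write. Each chip has only two I/O ports (0x20/0x21 for the master, 0xa0/0xa1 for the slave), so the offset is either 0 or 1, hence addr &= 1.
ICW1 is written at offset 0 and ICW2-ICW4 at offset 1. OCW2 and OCW3 are at offset 0, and OCW1 is at offset 1.
Pay attention to how the shared addresses are handled. The ICWs are initialization commands. ICW1 is written to offset 0 first, then the remaining ICWs go to offset 1, and the init_state state machine decides which ICW a given write represents.
OCWs can only be written after initialization has completed.
ICW1, OCW2 and OCW3 share offset 0 and are told apart by specific bits in val: bit 4 set means ICW1; otherwise bit 3 set means OCW3 and bit 3 clear means OCW2. After initialization, offset 1 corresponds only to OCW1.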
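The decoding rules above fit in a few lines. This sketch classifies a write the same way pic_ioport_write does and reproduces its init_state transitions. classify_pic_write, next_init_state and the enum are hypothetical names, not QEMU functions:

```c
#include <assert.h>
#include <stdint.h>

enum pic_cmd { ICW1, ICW2, ICW3, ICW4, OCW1, OCW2, OCW3 };

/* Which command a write is, following pic_ioport_write's tests:
 * offset 0: bit 4 set -> ICW1, else bit 3 set -> OCW3, else OCW2;
 * offset 1: decided by init_state (0 means OCW1). */
static enum pic_cmd classify_pic_write(uint32_t addr, uint8_t val, int init_state)
{
    if ((addr & 1) == 0) {
        if (val & 0x10)
            return ICW1;
        if (val & 0x08)
            return OCW3;
        return OCW2;
    }
    switch (init_state) {
    case 0: return OCW1;
    case 1: return ICW2;
    case 2: return ICW3;
    case 3: return ICW4;
    }
    assert(0 && "init_state must be 0..3");
    return OCW1;
}

/* init_state after a write to offset 1 (ICW1 itself sets it to 1). */
static int next_init_state(int state, int single_mode, int init4)
{
    switch (state) {
    case 1:  return single_mode ? (init4 ? 3 : 0) : 2; /* after ICW2 */
    case 2:  return init4 ? 3 : 0;                     /* after ICW3 */
    case 3:  return 0;                                 /* after ICW4 */
    default: return 0;                                 /* OCW1: stays 0 */
    }
}
```

Take the usual cascaded initialization sequence: 0x11 is written to 0x20, then 0x08, 0x04 and 0x01 to 0x21. This walks init_state through 1, 2, 3 and back to 0.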
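OCW2 deserves a closer look:
1. Interrupt priority: each 8259 has 8 interrupt inputs, IRQ0-IRQ7. By default IRQ0 has the highest priority and IRQ7 the lowest. When several requests arrive at the same time, the higher-priority one is handled first. In nested mode, a higher-priority interrupt can also preempt the service routine of a lower-priority one.
2. Rotating priority: after an IRQ is serviced it becomes the lowest priority, and the next IRQ becomes the highest. If IRQ0 is highest this time, IRQ1 is highest next time, then IRQ2, and so on up to IRQ7, then back to IRQ0.
3. SL is used to set an offset (priority_add). Priorities are compared after adding this offset modulo 8.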
static void pic_ioport_write(void *opaque, uint32_t addr, uint32_t val)
{
    PicState *s = opaque;
    int priority, cmd, irq;

#ifdef DEBUG_PIC
    printf("pic_write: addr=0x%02x val=0x%02x\n", addr, val);
#endif
    addr &= 1;
    if (addr == 0) {
        if (val & 0x10) {
            /* ICW1: init */
            pic_reset(s);
            /* deassert a pending interrupt */
            qemu_irq_lower(s->pics_state->parent_irq);
            s->init_state = 1;
            s->init4 = val & 1;        /* IC4: will an ICW4 follow? */
            s->single_mode = val & 2;  /* SNGL: single chip or cascaded */
            if (val & 0x08)            /* LTIM: only edge triggering is supported */
                hw_error("level sensitive irq not supported");
        } else if (val & 0x08) {
            /* OCW3 */
            if (val & 0x04)
                s->poll = 1;                       /* poll: next read returns the pending IRQ */
            if (val & 0x02)
                s->read_reg_select = val & 1;      /* RR/RIS: read ISR (1) or IRR (0) */
            if (val & 0x40)
                s->special_mask = (val >> 5) & 1;  /* ESMM/SMM: special mask mode */
        } else {
            /* OCW2: EOI and priority rotation commands, selected by bits 7..5 */
            cmd = val >> 5;
            switch (cmd) {
            case 0:
            case 4:
                /* rotate in automatic EOI mode: clear (0) or set (4) */
                s->rotate_on_auto_eoi = cmd >> 2;
                break;
            case 1: /* end of interrupt, issued by the guest's interrupt handler */
            case 5: /* rotate on non-specific EOI */
                priority = get_priority(s, s->isr);
                if (priority != 8) {                         /* some IRQ is in service */
                    irq = (priority + s->priority_add) & 7;  /* priority -> IRQ number */
                    s->isr &= ~(1 << irq);                   /* clear its ISR bit */
                    if (cmd == 5)                            /* rotate: this IRQ becomes the lowest priority */
                        s->priority_add = (irq + 1) & 7;
                    pic_update_irq(s->pics_state);           /* re-evaluate pending IRQs */
                }
                break;
            case 3:
                /* specific EOI: clear the ISR bit of the IRQ in bits 2..0 */
                irq = val & 7;
                s->isr &= ~(1 << irq);
                pic_update_irq(s->pics_state);
                break;
            case 6:
                /* set priority: IRQ (val & 7) becomes the lowest, (val + 1) & 7 the highest */
                s->priority_add = (val + 1) & 7;
                pic_update_irq(s->pics_state);
                break;
            case 7:
                /* rotate on specific EOI */
                irq = val & 7;
                s->isr &= ~(1 << irq);
                s->priority_add = (irq + 1) & 7;
                pic_update_irq(s->pics_state);
                break;
            default:
                /* no operation */
                break;
            }
        }
    } else {
        switch (s->init_state) {
        case 0:
            /* OCW1: interrupt mask register */
            /* normal mode */
            s->imr = val;
            pic_update_irq(s->pics_state);
            break;
        case 1:
            /* ICW2: vector base, i.e. the IRQ -> interrupt vector mapping */
            s->irq_base = val & 0xf8;
            /* state machine: ICW3 next if cascaded, else ICW4 if requested, else done */
            s->init_state = s->single_mode ? (s->init4 ? 3 : 0) : 2;
            break;
        case 2:
            /* ICW3: cascade wiring, value ignored */
            if (s->init4) {
                s->init_state = 3;
            } else {
                s->init_state = 0;
            }
            break;
        case 3:
            /* ICW4 */
            s->special_fully_nested_mode = (val >> 4) & 1;
            s->auto_eoi = (val >> 1) & 1;
            s->init_state = 0;  /* initialization finished */
            break;
        }
    }
}
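The OCW2 switch is easier to read once the eight commands have names. This decoding sketch uses illustrative enum and function names; the command meanings follow the comments in the code above:

```c
#include <assert.h>
#include <stdint.h>

/* The eight OCW2 commands, numbered like `cmd = val >> 5` in pic_ioport_write. */
enum ocw2_cmd {
    OCW2_ROTATE_AEOI_CLEAR   = 0,
    OCW2_NONSPECIFIC_EOI     = 1,
    OCW2_NOP                 = 2,
    OCW2_SPECIFIC_EOI        = 3,
    OCW2_ROTATE_AEOI_SET     = 4,
    OCW2_ROTATE_NONSPEC_EOI  = 5,
    OCW2_SET_PRIORITY        = 6,
    OCW2_ROTATE_SPECIFIC_EOI = 7,
};

struct ocw2 {
    enum ocw2_cmd cmd;
    int level;  /* bits 2..0; only meaningful when SL is set (cmd 3, 6, 7) */
};

/* Split an OCW2 byte (bits 4 and 3 must be 0) into command and IRQ level. */
static struct ocw2 decode_ocw2(uint8_t val)
{
    assert((val & 0x18) == 0);
    struct ocw2 d = { (enum ocw2_cmd)(val >> 5), val & 7 };
    return d;
}

/* priority_add after "set priority" (cmd 6): `level` becomes the lowest
 * priority, so level + 1 becomes the highest. */
static int priority_add_after_set_priority(int level)
{
    return (level + 1) & 7;
}
```

0x20 is the familiar non-specific EOI byte that a guest writes to port 0x20 at the end of an interrupt handler. 0xc7 ("set priority, IRQ7 lowest") restores the default order with IRQ0 highest.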
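get_priority returns 8 when there is no interrupt.
priority_add combines the effects of automatic rotation and the SL-specified priority, and shifts the priority order accordingly.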
/* return the highest priority found in mask (highest = smallest
   number). Return 8 if no irq */
static inline int get_priority(PicState *s, int mask)
{
    int priority;
    if (mask == 0)
        return 8;
    priority = 0;
    while ((mask & (1 << ((priority + s->priority_add) & 7))) == 0)
        priority++;
    return priority;
}
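The sketch below shows how priority_add shifts the order. It is get_priority lifted out of PicState, with priority_add passed as a parameter. highest_irq is a hypothetical helper that does the same priority-to-IRQ conversion as pic_get_irq:

```c
#include <assert.h>
#include <stdint.h>

/* get_priority with priority_add passed explicitly: the priority number
 * (0 = highest) of the best IRQ in mask, or 8 if mask is empty.
 * IRQ n has priority (n - priority_add) & 7. */
static int get_priority(int priority_add, uint8_t mask)
{
    int priority = 0;

    assert(priority_add >= 0 && priority_add < 8);
    if (mask == 0)
        return 8;
    while ((mask & (1 << ((priority + priority_add) & 7))) == 0)
        priority++;
    return priority;
}

/* Convert back to an IRQ number, as pic_get_irq does; -1 if none. */
static int highest_irq(int priority_add, uint8_t mask)
{
    int priority = get_priority(priority_add, mask);
    return priority == 8 ? -1 : (priority + priority_add) & 7;
}
```

After a rotating EOI for IRQ2, priority_add is 3. IRQ3 becomes the highest priority and IRQ2 the lowest. So if IRQ0 and IRQ7 are both pending, IRQ7 wins (priority 4 versus 5).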
void pic_update_irq(PicState2 *s)
{
    int irq2, irq;

    /* first look at slave pic */
    irq2 = pic_get_irq(&s->pics[1]);
    if (irq2 >= 0) {
        /* if irq request by slave pic, signal master PIC */
        /* the slave is cascaded on master IRQ2: raise and lower it to emulate an edge */
        pic_set_irq1(&s->pics[0], 2, 1);
        pic_set_irq1(&s->pics[0], 2, 0);
    }
    /* look at requested irq */
    irq = pic_get_irq(&s->pics[0]);
    if (irq >= 0) {
        qemu_irq_raise(s->parent_irq);
    }

    /* all targets should do this rather than acking the IRQ in the cpu */
#if defined(TARGET_MIPS) || defined(TARGET_PPC) || defined(TARGET_ALPHA)
    else {
        qemu_irq_lower(s->parent_irq);
    }
#endif
}
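IRR is the Interrupt Request Register and ISR the In-Service Register. An interrupt must pass through IRR before it reaches ISR: IRR records interrupts that have been requested, while ISR records interrupts currently being serviced.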
static int pic_get_irq(PicState *s)
{
    int mask, cur_priority, priority;

    mask = s->irr & ~s->imr;
    priority = get_priority(s, mask);  /* highest-priority pending request in IRR */
    if (priority == 8)
        return -1;
    /* compute current priority. If special fully nested mode on the
       master, the IRQ coming from the slave is not taken into account
       for the priority computation. */
    mask = s->isr;
    if (s->special_mask)  /* set via OCW3 */
        mask &= ~s->imr;
    if (s->special_fully_nested_mode && s == &s->pics_state->pics[0])
        mask &= ~(1 << 2);
    cur_priority = get_priority(s, mask);
    if (priority < cur_priority) {
        /* the best request in IRR has a smaller number, i.e. a higher
           priority, than the best interrupt in service */
        /* higher priority found: an irq should be generated */
        return (priority + s->priority_add) & 7;
    } else {
        return -1;
    }
}
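The IRR/ISR split and the nesting check in pic_get_irq can be exercised on a toy single-chip model. The sketch rests on simplifying assumptions: fixed priority (priority_add = 0), no special mask, and no special fully nested mode. mini_ack is a hypothetical stand-in for the CPU's interrupt acknowledge, which the code above does not show. All mini_* names are made up:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical single-chip model: edge-triggered, IRQ0 highest. */
struct mini_pic { uint8_t irr, isr, imr; };

/* Like get_priority with priority_add = 0: lowest set bit, 8 if none. */
static int prio(uint8_t mask)
{
    int p = 0;
    if (mask == 0)
        return 8;
    while (!(mask & (1u << p)))
        p++;
    return p;
}

/* Core of pic_get_irq: forward a request only if it beats everything
 * currently in service. Returns the IRQ number or -1. */
static int mini_get_irq(const struct mini_pic *s)
{
    int req = prio(s->irr & ~s->imr);
    if (req == 8)
        return -1;
    return req < prio(s->isr) ? req : -1;
}

/* A device raises an input: the request is latched in IRR. */
static void mini_raise(struct mini_pic *s, int irq)
{
    assert(irq >= 0 && irq < 8);
    s->irr |= (uint8_t)(1u << irq);
}

/* Hypothetical acknowledge: move the winning IRQ from IRR to ISR. */
static int mini_ack(struct mini_pic *s)
{
    int irq = mini_get_irq(s);
    if (irq >= 0) {
        s->irr &= (uint8_t)~(1u << irq);
        s->isr |= (uint8_t)(1u << irq);
    }
    return irq;
}

/* Non-specific EOI (OCW2 cmd 1): clear the highest in-service bit. */
static void mini_eoi(struct mini_pic *s)
{
    int p = prio(s->isr);
    if (p != 8)
        s->isr &= (uint8_t)~(1u << p);
}
```

Suppose IRQ3 is in service, then IRQ5 and IRQ1 arrive. This shows both rules. IRQ5 has to wait, while IRQ1 preempts IRQ3 and receives the first EOI.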
static inline void pic_set_irq1(PicState *s, int irq, int level)
{
    int mask;
    mask = 1 << irq;
    if (s->elcr & mask) {
        /* level triggered */
        if (level) {
            s->irr |= mask;
            s->last_irr |= mask;
        } else {
            s->irr &= ~mask;
            s->last_irr &= ~mask;
        }
    } else {
        /* edge triggered */
        if (level) {
            if ((s->last_irr & mask) == 0)
                s->irr |= mask;
            s->last_irr |= mask;
        } else {
            s->last_irr &= ~mask;
        }
    }
}
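The difference between the two trigger modes is easiest to see by driving pic_set_irq1 directly. The function below copies its logic onto a trimmed-down struct. struct pic_lines is a hypothetical stand-in for PicState:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the three PicState fields pic_set_irq1 uses. */
struct pic_lines { uint8_t irr, last_irr, elcr; };

/* Same logic as pic_set_irq1, on the trimmed-down struct. */
static void set_irq1(struct pic_lines *s, int irq, int level)
{
    assert(irq >= 0 && irq < 8);
    uint8_t mask = (uint8_t)(1u << irq);

    if (s->elcr & mask) {
        /* level triggered: IRR follows the line */
        if (level) {
            s->irr |= mask;
            s->last_irr |= mask;
        } else {
            s->irr &= (uint8_t)~mask;
            s->last_irr &= (uint8_t)~mask;
        }
    } else {
        /* edge triggered: latch only on a 0 -> 1 transition */
        if (level) {
            if ((s->last_irr & mask) == 0)
                s->irr |= mask;
            s->last_irr |= mask;
        } else {
            s->last_irr &= (uint8_t)~mask;
        }
    }
}
```

In edge mode, once the request has been consumed, holding the line high does not produce a second request; the line must drop and rise again. In level mode, IRR simply tracks the line.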
pic_ioport_read
The 8259 read handler is pic_ioport_read:
static uint32_t pic_ioport_read(void *opaque, uint32_t addr1)
{
    PicState *s = opaque;
    unsigned int addr;
    int ret;

    addr = addr1;
    addr &= 1;
    if (s->poll) {
        ret = pic_poll_read(s, addr1);
        s->poll = 0;
    } else {
        if (addr == 0) {
            if (s->read_reg_select)
                ret = s->isr;
            else
                ret = s->irr;
        } else {
            ret = s->imr;
        }
    }
#ifdef DEBUG_PIC
    printf("pic_read: addr=0x%02x val=0x%02x\n", addr1, ret);
#endif
    return ret;
}
static uint32_t pic_poll_read(PicState *s, uint32_t addr1)
{
    int ret;

    ret = pic_get_irq(s);
    if (ret >= 0) {
        if (addr1 >> 7) {  /* slave: its base port 0xa0 has bit 7 set */
            s->pics_state->pics[0].isr &= ~(1 << 2);
            s->pics_state->pics[0].irr &= ~(1 << 2);
        }
        s->irr &= ~(1 << ret);
        s->isr &= ~(1 << ret);
        if (addr1 >> 7 || ret != 2)
            pic_update_irq(s->pics_state);
    } else {
        ret = 0x07;
        pic_update_irq(s->pics_state);
    }

    return ret;
}
How is the CPU notified that an interrupt has arrived?
qemu_set_irq raises or lowers an interrupt line. It does so by calling the handler that was set when the qemu_irq was allocated. For cpu_irq the handler is pic_irq_request; for the 8259 lines it is i8259_set_irq.
void qemu_set_irq(qemu_irq irq, int level)
{
    if (!irq)
        return;

    irq->handler(irq->opaque, irq->n, level);
}
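The two-level chain can be modeled in a few lines: device line, then i8259_set_irq, then parent_irq, then pic_irq_request. Everything here is a simplified stand-in. The types, fake_cpu, fake_pic and the numeric value of CPU_INTERRUPT_HARD are assumptions for illustration. The PIC handler also skips edge detection, ISR and priority handling:

```c
#include <assert.h>
#include <stdint.h>

/* Minimal stand-ins for qemu_irq and qemu_set_irq (not QEMU's headers). */
typedef void (*irq_handler)(void *opaque, int n, int level);
typedef struct IRQState { irq_handler handler; void *opaque; int n; } IRQState;
typedef IRQState *qemu_irq;

static void set_irq(qemu_irq irq, int level)
{
    if (irq)
        irq->handler(irq->opaque, irq->n, level);
}

#define CPU_INTERRUPT_HARD 0x0002u  /* illustrative value */

/* Stand-in for CPUState: only the request bitmap matters here. */
struct fake_cpu { uint32_t interrupt_request; };

/* Like pic_irq_request with no APIC: set or clear the HARD request bit. */
static void cpu_line_handler(void *opaque, int n, int level)
{
    struct fake_cpu *cpu = opaque;
    (void)n;
    if (level)
        cpu->interrupt_request |= CPU_INTERRUPT_HARD;
    else
        cpu->interrupt_request &= ~CPU_INTERRUPT_HARD;
}

/* Stand-in for PicState2: 16 request/mask bits plus the parent line. */
struct fake_pic { uint16_t irr, imr; qemu_irq parent_irq; };

/* Like i8259_set_irq -> pic_update_irq, heavily simplified: latch the
 * level and, as in the x86 build, only ever raise the parent line. */
static void pic_line_handler(void *opaque, int n, int level)
{
    struct fake_pic *pic = opaque;
    assert(n >= 0 && n < 16);
    if (level)
        pic->irr |= (uint16_t)(1u << n);
    else
        pic->irr &= (uint16_t)~(1u << n);
    if (pic->irr & ~pic->imr)
        set_irq(pic->parent_irq, 1);
}
```

On x86 the parent line is never lowered by the PIC. The HARD bit stays set until kvm_arch_pre_run consumes it.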
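When there is no APIC and level is high, pic_irq_request calls cpu_interrupt, which sets cpu->interrupt_request |= CPU_INTERRUPT_HARD.
i8259_set_irq also ends up in pic_irq_request, through pic_update_irq raising parent_irq.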
static void pic_irq_request(void *opaque, int irq, int level)
{
    CPUState *cpu = first_cpu;
    CPUArchState *env = cpu->env_ptr;

    if (env->apic_state) {
        while (cpu) {
            if (apic_accept_pic_intr(env))
                apic_deliver_pic_intr(env, level);
            cpu = QTAILQ_NEXT(cpu, node);
            env = cpu ? cpu->env_ptr : NULL;
        }
    } else {
        if (level)
            cpu_interrupt(cpu, CPU_INTERRUPT_HARD);
        else
            cpu_reset_interrupt(cpu, CPU_INTERRUPT_HARD);
    }
}
void cpu_interrupt(CPUState *cpu, int mask)
{
    CPUArchState *env = cpu->env_ptr;
    int old_mask;

    old_mask = cpu->interrupt_request;
    cpu->interrupt_request |= mask;

    /*
     * If called from iothread context, wake the target cpu in
     * case its halted.
     */
    if (!qemu_cpu_is_self(cpu)) {
        qemu_cpu_kick(cpu);
        return;
    }

    if (use_icount) {
        env->icount_decr.u16.high = 0xffff;
        if (!can_do_io(env)
            && (mask & ~old_mask) != 0) {
            cpu_abort(env, "Raised interrupt while not in I/O function");
        }
    } else {
        cpu->tcg_exit_req = 1;
    }
}
An interrupt posted by cpu_interrupt is handled in kvm_arch_pre_run. If cpu->interrupt_request has CPU_INTERRUPT_HARD set and the guest can accept an interrupt, it calls kvm_vcpu_ioctl(cpu, KVM_INTERRUPT, &intr):
int kvm_arch_pre_run(CPUState *cpu, struct kvm_run *run)
{
    CPUX86State *env = cpu->env_ptr;

    /* Try to inject an interrupt if the guest can accept it */
    if (run->ready_for_interrupt_injection &&
        (cpu->interrupt_request & CPU_INTERRUPT_HARD) &&
        (env->eflags & IF_MASK)) {
        int irq;

        cpu->interrupt_request &= ~CPU_INTERRUPT_HARD;
        irq = cpu_get_pic_interrupt(env);
        if (irq >= 0) {
            struct kvm_interrupt intr;
            intr.irq = irq;
            /* FIXME: errors */
            dprintf("injected interrupt %d\n", irq);
            kvm_vcpu_ioctl(cpu, KVM_INTERRUPT, &intr);
        }
    }

    /* If we have an interrupt but the guest is not ready to receive an
     * interrupt, request an interrupt window exit.  This will
     * cause a return to userspace as soon as the guest is ready to
     * receive interrupts. */
    if ((cpu->interrupt_request & CPU_INTERRUPT_HARD))
        run->request_interrupt_window = 1;
    else
        run->request_interrupt_window = 0;

    dprintf("setting tpr\n");
    run->cr8 = cpu_get_apic_tpr(env);

#ifdef CONFIG_KVM_GS_RESTORE
    gs_base_pre_run();
#endif

    return 0;
}
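The injection condition in kvm_arch_pre_run reduces to a small decision table. This sketch writes it as a pure function. pre_run_decision and its result struct are hypothetical. CPU_INTERRUPT_HARD's value is illustrative, and IF_MASK is EFLAGS bit 9. pic_vector stands in for the value cpu_get_pic_interrupt would return:

```c
#include <assert.h>
#include <stdint.h>

#define CPU_INTERRUPT_HARD 0x0002u  /* illustrative value */
#define IF_MASK            0x0200u  /* EFLAGS.IF is bit 9 */

struct pre_run_result {
    int inject;                  /* would KVM_INTERRUPT be issued? */
    int request_window;          /* value for run->request_interrupt_window */
    uint32_t interrupt_request;  /* cpu->interrupt_request afterwards */
};

/* The injection decision of kvm_arch_pre_run as a pure function.
 * pic_vector stands in for cpu_get_pic_interrupt()'s result (-1 = none). */
static struct pre_run_result pre_run_decision(int ready_for_injection,
                                              uint32_t interrupt_request,
                                              uint32_t eflags, int pic_vector)
{
    struct pre_run_result r = { 0, 0, interrupt_request };

    if (ready_for_injection && (r.interrupt_request & CPU_INTERRUPT_HARD) &&
        (eflags & IF_MASK)) {
        r.interrupt_request &= ~CPU_INTERRUPT_HARD;
        r.inject = pic_vector >= 0;
    }
    /* still pending: ask KVM to exit as soon as the guest can take it */
    r.request_window = (r.interrupt_request & CPU_INTERRUPT_HARD) != 0;
    return r;
}
```

If the guest has interrupts disabled (IF clear), the request stays pending and request_interrupt_window is set. KVM then returns to userspace as soon as the guest can accept an interrupt, and injection is retried on the next pass through kvm_cpu_exec.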
kvm_arch_pre_run is called at the top of every iteration of the vCPU loop in kvm_cpu_exec (excerpt):
int kvm_cpu_exec(CPUState *cpu)
{
    CPUArchState *env = cpu->env_ptr;
    struct kvm_run *run = cpu->kvm_run;
    int ret;

    dprintf("kvm_cpu_exec()\n");

    do {
        if (cpu->exit_request) {
            dprintf("interrupt exit requested\n");
            ret = 0;
            break;
        }

        kvm_arch_pre_run(cpu, run);
        ret = kvm_arch_vcpu_run(cpu);
        kvm_arch_post_run(cpu, run);

        if (ret == -EINTR || ret == -EAGAIN) {
            dprintf("io window exit\n");
            ret = 0;
            break;
        }

        if (ret < 0) {
            dprintf("kvm run failed %s\n", strerror(-ret));
            abort();
        }

        kvm_run_coalesced_mmio(cpu, run);

        ret = 0; /* exit loop */
        switch (run->exit_reason) {
        case KVM_EXIT_IO:
            dprintf("handle_io\n");
            ret = kvm_handle_io(cpu, run->io.port,
                                (uint8_t *)run + run->io.data_offset,
                                run->io.direction,
                                run->io.size,
                                run->io.count);
            break;
        ...
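What are APIC, IOAPIC and LAPIC?
The APIC (Advanced Programmable Interrupt Controller) has replaced the 8259 as today's standard interrupt controller. It consists of two parts. Devices connect to the IOAPIC, and every CPU has its own LAPIC (local APIC). The IOAPIC forwards interrupt requests to the LAPICs.
In APIC mode more interrupt lines are available, so interrupt sharing is needed far less often.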
PIC and APIC (IOAPIC, LAPIC): http://blog.csdn.net/hgf1011/article/details/5925661
Why should I enable IO APIC in VirtualBox?:http://serverfault.com/questions/74672/why-should-i-enable-io-apic-in-virtualbox
Shared interrupts: http://blog.chinaunix.net/uid-20801802-id-1839061.html
Wrapping up
The interrupt controller of the android goldfish platform bus is not used when the guest is x86.
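Current QEMU has moved its 8259 to the QOM model, which is insanely complicated. QEMU also ships a KVM version of the 8259 in hw/i386/kvm/i8259.c. It uses KVM's in-kernel 8259 emulation, so interrupt handling and I/O port accesses are both done in the kernel without exiting to userspace, which makes it faster. KVM also provides in-kernel APIC emulation. Likewise, the 8254 and similar devices have in-kernel KVM implementations, so the android emulator's performance still has room for improvement.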