密码学-白盒AES
白盒AES
❝本文主要参考了文献^[1], 代码参考了^[2], 这里感谢文献作者和代码作者,如果有能力的大佬,可以自行查看原文献,个人水平有限,有哪里写的不对的地方,也欢迎读者指正。
❞
AES概述
在这里,先来简单回顾一下AES具体的加密过程,在AES当中,主要有4个基本的变换
- 「轮密钥加(AddRoundKey)」: 将状态和16-byte轮密钥进行异或运算
- **字节替代(SubBytes)**A: 采用一个置换表S-box, 将状态当中的每个字节映射到置换表对应的元素。
- 「行移位(ShiftRows)」: 对字节进行重排列, 具体的矩阵就不贴出来了。
- 「列混淆(MixColumns)」: 一次更新状态当中的4个字节,实际上是左乘一个在GF(256)上面的一个可逆矩阵。
对于密钥扩展算法在本文当中不是重点,因此不具体描述了。
整个AES的加密过程如下:
这里只是对AES进行一个简单的回顾,如果不熟悉的读者可以自行查阅相关资料,或者看一下我之前写的对于AES的文章。
基于查表法实现AES
为了实现白盒AES的目标,需要对原始AES的流程进行"亿点点"变换,下面来看一下另外的一种AES实现的方法。
调整轮函数结构
观察AES加密流程,可以发现,如果把 「轮密钥加(AddRoundKey)」 放到循环内部,实际上整个算法结构是不变的,如下所示:
交换字节替代和行移位
来形象的看一下,字节替代和行移位的过程:
交换字节替代和行移位
通过上图可以发现,最终结果交换上面两部分是没关系的,修改之后的流程如下:
不信的读者,可以自行编写一下代码试试,这一步比较好理解,我就不单独写代码验证一下了。
交换轮密钥加和行移位
然后考虑到行移位是线性的,因此可以对轮密钥做同样的置换运算,然后交换行移位和轮密钥加进行一个交换,这里的原理和上面的类似,主要是因为轮密钥加并不改变位置,行移位不改变值,稍微注意的一点是,这里轮密钥加的密钥要同样做一次行移位,和原始的对应起来,具体流程如下:
做到这一步处理,就可以来看如何构造查表了,下面介绍一下需要用到的表。
T-boxes
首先观察一下 「轮密钥加」 和 「字节替代」,可以构造一个函数,对应输入的任意一个状态,得到特定的输出。
注意到,对于x的取值实际上是0..255, 因此可以枚举这个函数的值域,由此得到 「T-boxes」。
fn calculate_t_boxes(round_key: &[u32; 44]) -> [[[u8; 256]; 16]; 10] { let mut t_boxes = [[[0u8; 256]; 16]; 10]; for r in 0..10 { for x in 0..256 { let mut state = [x as u8; 16]; add_round_key_after_shift(&mut state, &round_key[r * 4..(r * 4 + 4)].try_into().expect("")); sub_bytes(&mut state); if r == 9 { add_round_key(&mut state, round_key[40..44].try_into().expect("")) } for i in 0..16 { t_boxes[r][i][x] = state[i]; } } } return t_boxes; }
对于 「T-boxes」 总共有160个,每个表是256个字节。
Ty-tables
下面来看一下列混淆的过程,注意到GF(256)上面的加法运算实际上是异或运算,不理解的可以参考我之前写过有关有限域介绍的文章,这里不展开了。
从上面,同理可以找到一个函数:
这里,其中是对应矩阵前面的系数,这块和文章里面不太一致,这块我看原始的文章也有点迷,通过代码结合文章,我感觉是这样的,如果这块我理解的有错误的地方,还请读者和我讨论。
fn calculate_ty() -> [[[u8; 4]; 256]; 4] { let mut ty = [[[0u8; 4]; 256]; 4]; for x in 0..=255 { ty[0][x][0] = gmul(x as u8, 2); ty[0][x][1] = gmul(x as u8, 3); ty[0][x][2] = x as u8; ty[0][x][3] = x as u8; ty[1][x][0] = x as u8; ty[1][x][1] = gmul(x as u8, 2); ty[1][x][2] = gmul(x as u8, 3); ty[1][x][3] = x as u8; ty[2][x][0] = x as u8; ty[2][x][1] = x as u8; ty[2][x][2] = gmul(x as u8, 2); ty[2][x][3] = gmul(x as u8, 3); ty[3][x][0] = gmul(x as u8, 3); ty[3][x][1] = x as u8; ty[3][x][2] = x as u8; ty[3][x][3] = gmul(x as u8, 2); } ty }
这个可以结合查表的最主要的原因还是因为x的范围比较小,只有0..255, 因此可以枚举,原始代码计算乘法用的查表,我直接复制了一个我之前写好的有限域乘法的函数,如果看原始代码注意一下这一点。
XOR tables
这个表比较简单,实际上是对于每轮当中的两个半字节进行一个查表的异或运算,同样可以定义一个函数:
fn calculate_xor_table() -> [[[[u8; 16]; 16]; 96]; 9] { let mut xor_table = [[[[0u8; 16]; 16]; 96]; 9]; for r in 0..9 { for n in 0..96 { for i in 0..16 { for j in 0..16 { xor_table[r][n][i][j] = (i ^ j) as u8; } } } } xor_table }
不过这个表的作用我没发现,我感觉如果去掉这个表问题好像也不大,也有可能我哪里理解错了,这里直接用异或操作不就可以了,有点迷,后文实现还是沿用有xor table的方案。
表的合并
可以发现,「T-Boxes」 和 「Tyi-tables」是可以合并到一起的,如下所示:
代码如下所示:
fn calculate_ty_boxes(round_key: &[u32; 44]) -> ([[[u32; 256]; 16]; 9], [[u8; 256]; 16]) { let t_boxes = calculate_t_boxes(&round_key); let ty: [[[u8; 4]; 256]; 4] = calculate_ty(); let mut ty_boxes = [[[0u32; 256]; 16]; 9]; let mut last_t_boxes = [[0u8; 256]; 16]; for r in 0..9 { for x in 0..256 { for j in 0..4 { for i in 0..4 { let v0 = ty[0][t_boxes[r][j * 4 + i][x] as usize][i] as u32; let v1 = ty[1][t_boxes[r][j * 4 + i][x] as usize][i] as u32; let v2 = ty[2][t_boxes[r][j * 4 + i][x] as usize][i] as u32; let v3 = ty[3][t_boxes[r][j * 4 + i][x] as usize][i] as u32; ty_boxes[r][j * 4 + i][x] = (v0.wrapping_shl(24) | (v1.wrapping_shl(16)) | v2.wrapping_shl(8) | v3) as u32; } } } } for x in 0..256 { for i in 0..16 { last_t_boxes[i][x] = t_boxes[9][i][x]; } } (ty_boxes, last_t_boxes) }
这里实际上是以32位存储的y值。
总结
通过上面的描述,我们可以发现,对于整个AES的计算过程可以通过三个表来查表实现
- 「TboxesTyiTables」
- 「XORTables」
- 「TBoxes」
详细过程如下:
AES-查表法
查表法实现AES完整代码如下:
use std::convert::TryInto; // region 标准aes实现 static SBOX: [[u8; 16]; 16] = [ [0x63, 0x7c, 0x77, 0x7b, 0xf2, 0x6b, 0x6f, 0xc5, 0x30, 0x01, 0x67, 0x2b, 0xfe, 0xd7, 0xab, 0x76], [0xca, 0x82, 0xc9, 0x7d, 0xfa, 0x59, 0x47, 0xf0, 0xad, 0xd4, 0xa2, 0xaf, 0x9c, 0xa4, 0x72, 0xc0], [0xb7, 0xfd, 0x93, 0x26, 0x36, 0x3f, 0xf7, 0xcc, 0x34, 0xa5, 0xe5, 0xf1, 0x71, 0xd8, 0x31, 0x15], [0x04, 0xc7, 0x23, 0xc3, 0x18, 0x96, 0x05, 0x9a, 0x07, 0x12, 0x80, 0xe2, 0xeb, 0x27, 0xb2, 0x75], [0x09, 0x83, 0x2c, 0x1a, 0x1b, 0x6e, 0x5a, 0xa0, 0x52, 0x3b, 0xd6, 0xb3, 0x29, 0xe3, 0x2f, 0x84], [0x53, 0xd1, 0x00, 0xed, 0x20, 0xfc, 0xb1, 0x5b, 0x6a, 0xcb, 0xbe, 0x39, 0x4a, 0x4c, 0x58, 0xcf], [0xd0, 0xef, 0xaa, 0xfb, 0x43, 0x4d, 0x33, 0x85, 0x45, 0xf9, 0x02, 0x7f, 0x50, 0x3c, 0x9f, 0xa8], [0x51, 0xa3, 0x40, 0x8f, 0x92, 0x9d, 0x38, 0xf5, 0xbc, 0xb6, 0xda, 0x21, 0x10, 0xff, 0xf3, 0xd2], [0xcd, 0x0c, 0x13, 0xec, 0x5f, 0x97, 0x44, 0x17, 0xc4, 0xa7, 0x7e, 0x3d, 0x64, 0x5d, 0x19, 0x73], [0x60, 0x81, 0x4f, 0xdc, 0x22, 0x2a, 0x90, 0x88, 0x46, 0xee, 0xb8, 0x14, 0xde, 0x5e, 0x0b, 0xdb], [0xe0, 0x32, 0x3a, 0x0a, 0x49, 0x06, 0x24, 0x5c, 0xc2, 0xd3, 0xac, 0x62, 0x91, 0x95, 0xe4, 0x79], [0xe7, 0xc8, 0x37, 0x6d, 0x8d, 0xd5, 0x4e, 0xa9, 0x6c, 0x56, 0xf4, 0xea, 0x65, 0x7a, 0xae, 0x08], [0xba, 0x78, 0x25, 0x2e, 0x1c, 0xa6, 0xb4, 0xc6, 0xe8, 0xdd, 0x74, 0x1f, 0x4b, 0xbd, 0x8b, 0x8a], [0x70, 0x3e, 0xb5, 0x66, 0x48, 0x03, 0xf6, 0x0e, 0x61, 0x35, 0x57, 0xb9, 0x86, 0xc1, 0x1d, 0x9e], [0xe1, 0xf8, 0x98, 0x11, 0x69, 0xd9, 0x8e, 0x94, 0x9b, 0x1e, 0x87, 0xe9, 0xce, 0x55, 0x28, 0xdf], [0x8c, 0xa1, 0x89, 0x0d, 0xbf, 0xe6, 0x42, 0x68, 0x41, 0x99, 0x2d, 0x0f, 0xb0, 0x54, 0xbb, 0x16], ]; fn gmul(ap: u8, bp: u8) -> u8 { let mut p = 0u8; let mut a = ap; let mut b = bp; while a != 0 && b != 0 { if b & 1 != 0 { p ^= a; } if a & 0x80 != 0 { // XOR with the primitive polynomial x^8 + x^4 + x^3 + x + 1 (0b1_0001_1011) – you can change it but it must be irreducible a = (((a << 1) as u16) ^ 0x11b) as u8; } else { a = a << 1; } b >>= 1; } return p & 0xFF; } fn add_round_key(state: &mut [u8; 16], key: &[u32; 4]) { for i in 0..4 { for j in 0..4 { state[i * 4 + j] ^= (key[i] >> ((3 - j) * 8)) as u8; } } } fn add_round_key_after_shift(state: &mut [u8; 16], key: &[u32; 4]) { for i in 0..4 { for j in 0..4 { state[i * 4 + j] ^= (key[(j + i) % 4] >> ((3 - j) * 8)) as u8; } } } fn sub_bytes(state: &mut [u8; 16]) { for i in 0..4 { for j in 0..4 { state[i * 4 + j] = SBOX[(state[i * 4 + j] >> 4) as usize][(state[i * 4 + j] & 0x0F) as usize]; } } } fn shift_rows(state: &mut [u8; 16]) { let shifts = [ 0, 5, 10, 15, 4, 9, 14, 3, 8, 13, 2, 7, 12, 1, 6, 11, ]; let in_state = [ state[0], state[1], state[2], state[3], state[4], state[5], state[6], state[7], state[8], state[9], state[10], state[11], state[12], state[13], state[14], state[15], ]; for i in 0..16 { state[i] = in_state[shifts[i]]; } } fn mix_columns(state: &mut [u8; 16]) { for i in 0..4 { let (a, b, c, d) = (state[4 * i + 0], state[4 * i + 1], state[4 * i + 2], state[4 * i + 3]); state[0 + 4 * i] = gmul(a, 2) ^ gmul(d, 1) ^ gmul(c, 1) ^ gmul(b, 3); state[1 + 4 * i] = gmul(b, 2) ^ gmul(a, 1) ^ gmul(d, 1) ^ gmul(c, 3); state[2 + 4 * i] = gmul(c, 2) ^ gmul(b, 1) ^ gmul(a, 1) ^ gmul(d, 3); state[3 + 4 * i] = gmul(d, 2) ^ gmul(c, 1) ^ gmul(b, 1) ^ gmul(a, 3); } } fn sub_word(word: u32) -> u32 { let mut result = 0u32; for i in 0..4 { let upper_nibble = ((word >> (4 + 8 * i)) & 0xF) as usize; let lower_nibble = ((word >> (8 * i)) & 0xF) as usize; result += (SBOX[upper_nibble][lower_nibble] as u32) << (8 * i); } return result; } fn rot_word(x: u32) -> u32 { (x << 8) | (x >> 24) } fn expend_keys(key: &[u8; 16], nk: u32, nr: u32) -> [u32; 44] { let nb = 4; let r_con: [u32; 15] = [ 0x01000000, 0x02000000, 0x04000000, 0x08000000, 0x10000000, 0x20000000, 0x40000000, 0x80000000, 0x1b000000, 0x36000000, 0x6c000000, 0xd8000000, 0xab000000, 0x4d000000, 0x9a000000, ]; let mut w = [0u32; 44]; for i in 0..4 { w[i] = ((((key[4 * i + 0]) as u32) << 24) | ((key[4 * i + 1] as u32) << 16) | ((key[4 * i + 2] as u32) << 8) | ((key[4 * i + 3] as u32) << 0)) as u32; } for i in nk..(nb * (nr + 1)) { let mut temp = w[(i - 1) as usize]; if i % nk == 0 { temp = sub_word(rot_word(temp)) ^ r_con[((i - 1) / nk) as usize]; } else if nk > 6 && (i % nk) == 4 { temp = sub_word(temp); } w[i as usize] = w[(i - nk) as usize] ^ temp; } w } fn encrypt_block(bytes: &[u8; 16], round_key: &[u32; 44]) -> [u8; 16] { let mut result = [0u8; 16]; let mut state = [ bytes[0], bytes[1], bytes[2], bytes[3], bytes[4], bytes[5], bytes[6], bytes[7], bytes[8], bytes[9], bytes[10], bytes[11], bytes[12], bytes[13], bytes[14], bytes[15], ]; add_round_key(&mut state, round_key[0..4].try_into().expect("")); for i in 1..10 { sub_bytes(&mut state); shift_rows(&mut state); mix_columns(&mut state); add_round_key(&mut state, round_key[i * 4..(i + 1) * 4].try_into().expect("")); } sub_bytes(&mut state); shift_rows(&mut state); add_round_key(&mut state, round_key[40..44].try_into().expect("")); for i in 0..16 { result[i] = state[i]; } result } // endregion fn calculate_t_boxes(round_key: &[u32; 44]) -> [[[u8; 256]; 16]; 10] { let mut t_boxes = [[[0u8; 256]; 16]; 10]; for r in 0..10 { for x in 0..256 { let mut state = [x as u8; 16]; add_round_key_after_shift(&mut state, &round_key[r * 4..(r * 4 + 4)].try_into().expect("")); sub_bytes(&mut state); if r == 9 { add_round_key(&mut state, round_key[40..44].try_into().expect("")) } for i in 0..16 { t_boxes[r][i][x] = state[i]; } } } return t_boxes; } fn calculate_ty() -> [[[u8; 4]; 256]; 4] { let mut ty = [[[0u8; 4]; 256]; 4]; for x in 0..=255 { ty[0][x][0] = gmul(x as u8, 2); ty[0][x][1] = gmul(x as u8, 3); ty[0][x][2] = x as u8; ty[0][x][3] = x as u8; ty[1][x][0] = x as u8; ty[1][x][1] = gmul(x as u8, 2); ty[1][x][2] = gmul(x as u8, 3); ty[1][x][3] = x as u8; ty[2][x][0] = x as u8; ty[2][x][1] = x as u8; ty[2][x][2] = gmul(x as u8, 2); ty[2][x][3] = gmul(x as u8, 3); ty[3][x][0] = gmul(x as u8, 3); ty[3][x][1] = x as u8; ty[3][x][2] = x as u8; ty[3][x][3] = gmul(x as u8, 2); } ty } fn calculate_ty_boxes(round_key: &[u32; 44]) -> ([[[u32; 256]; 16]; 10], [[u8; 256]; 16]) { let t_boxes = calculate_t_boxes(&round_key); let ty: [[[u8; 4]; 256]; 4] = calculate_ty(); let mut ty_boxes = [[[0u32; 256]; 16]; 10]; let mut last_t_boxes = [[0u8; 256]; 16]; for r in 0..9 { for x in 0..256 { for j in 0..4 { for i in 0..4 { let v0 = ty[0][t_boxes[r][j * 4 + i][x] as usize][i] as u32; let v1 = ty[1][t_boxes[r][j * 4 + i][x] as usize][i] as u32; let v2 = ty[2][t_boxes[r][j * 4 + i][x] as usize][i] as u32; let v3 = ty[3][t_boxes[r][j * 4 + i][x] as usize][i] as u32; ty_boxes[r][j * 4 + i][x] = (v0.wrapping_shl(24) | (v1.wrapping_shl(16)) | v2.wrapping_shl(8) | v3) as u32; } } } } for x in 0..256 { for i in 0..16 { last_t_boxes[i][x] = t_boxes[9][i][x]; } } // println!("ty_boxes{:x?}", ty_boxes); (ty_boxes, last_t_boxes) } fn calculate_xor_table() -> [[[[u8; 16]; 16]; 96]; 9] { let mut xor_table = [[[[0u8; 16]; 16]; 96]; 9]; for r in 0..9 { for n in 0..96 { for i in 0..16 { for j in 0..16 { xor_table[r][n][i][j] = (i ^ j) as u8; } } } } xor_table } fn encrypt_block_by_table(bytes: &[u8; 16], round_key: &[u32; 44]) -> [u8; 16] { let mut result = [0u8; 16]; let (ty_boxes, last_t_boxes) = calculate_ty_boxes(round_key); let xor_table = calculate_xor_table(); let mut input = bytes.clone(); for r in 0..9 { shift_rows(&mut input); for j in 0..4 { let aa = ty_boxes[r][j * 4 + 0][input[j * 4 + 0] as usize] as usize; let bb = ty_boxes[r][j * 4 + 1][input[j * 4 + 1] as usize] as usize; let cc = ty_boxes[r][j * 4 + 2][input[j * 4 + 2] as usize] as usize; let dd = ty_boxes[r][j * 4 + 3][input[j * 4 + 3] as usize] as usize; let n0 = xor_table[r][j * 24 + 0][(aa >> 28) & 0xf][(bb >> 28) & 0xf] as usize; let n1 = xor_table[r][j * 24 + 1][(cc >> 28) & 0xf][(dd >> 28) & 0xf] as usize; let n2 = xor_table[r][j * 24 + 2][(aa >> 24) & 0xf][(bb >> 24) & 0xf] as usize; let n3 = xor_table[r][j * 24 + 3][(cc >> 24) & 0xf][(dd >> 24) & 0xf] as usize; input[j * 4 + 0] = (xor_table[r][j * 24 + 4][n0][n1] << 4) | xor_table[r][j * 24 + 5][n2][n3]; let n0 = xor_table[r][j * 24 + 6][(aa >> 20) & 0xf][(bb >> 20) & 0xf] as usize; let n1 = xor_table[r][j * 24 + 7][(cc >> 20) & 0xf][(dd >> 20) & 0xf] as usize; let n2 = xor_table[r][j * 24 + 8][(aa >> 16) & 0xf][(bb >> 16) & 0xf] as usize; let n3 = xor_table[r][j * 24 + 9][(cc >> 16) & 0xf][(dd >> 16) & 0xf] as usize; input[j * 4 + 1] = (xor_table[r][j * 24 + 10][n0][n1] << 4) | xor_table[r][j * 24 + 11][n2][n3]; let n0 = xor_table[r][j * 24 + 12][(aa >> 12) & 0xf][(bb >> 12) & 0xf] as usize; let n1 = xor_table[r][j * 24 + 13][(cc >> 12) & 0xf][(dd >> 12) & 0xf] as usize; let n2 = xor_table[r][j * 24 + 14][(aa >> 8) & 0xf][(bb >> 8) & 0xf] as usize; let n3 = xor_table[r][j * 24 + 15][(cc >> 8) & 0xf][(dd >> 8) & 0xf] as usize; input[j * 4 + 2] = (xor_table[r][j * 24 + 16][n0][n1] << 4) | xor_table[r][j * 24 + 17][n2][n3]; let n0 = xor_table[r][j * 24 + 18][(aa >> 4) & 0xf][(bb >> 4) & 0xf] as usize; let n1 = xor_table[r][j * 24 + 19][(cc >> 4) & 0xf][(dd >> 4) & 0xf] as usize; let n2 = xor_table[r][j * 24 + 20][(aa >> 0) & 0xf][(bb >> 0) & 0xf] as usize; let n3 = xor_table[r][j * 24 + 21][(cc >> 0) & 0xf][(dd >> 0) & 0xf] as usize; input[j * 4 + 3] = (xor_table[r][j * 24 + 22][n0][n1] << 4) | xor_table[r][j * 24 + 23][n2][n3]; } } shift_rows(&mut input); for i in 0..16 { result[i] = last_t_boxes[i][input[i] as usize]; } result } #[cfg(test)] mod test { use crate::wbaes::{expend_keys, encrypt_block, encrypt_block_by_table}; #[test] fn test_expand_key() { let key = [43, 126, 21, 22, 40, 174, 210, 166, 171, 247, 21, 136, 9, 207, 79, 60]; println!("{:x?}", key); let input = [50, 67, 246, 168, 136, 90, 48, 141, 49, 49, 152, 162, 224, 55, 7, 52]; // let input = [65, 65, 65, 65, 65, 65, 65, 65, 65, 65, 65, 65, 65, 65, 65, 65]; let w = expend_keys(&key, 4, 10); println!("{:?}", w); let block = encrypt_block(&input, &w); println!("{:x?}", block); let block_table = encrypt_block_by_table(&input, &w); println!("{:x?}", block_table); } }
这段代码只是实现了加密,因为解密的话,这里需要修改表的生成,本文就不在这里介绍了。