XArray/SparseArray
目录
1 概述
XArray(eXtensible ARRAY)/SparseArray
是一种多维数组。 XArray
实现和
Trie
类似,都是数字查找检索。 XArray
适用于稀疏数组情况(不存在的元素比较多)。典型的使用场景是内核的页缓存,对于块设备或文件,其一是尺寸远小于64位地址空间所能表示的大小,另外实际存在的部分,通常大部分不会在页缓存中。
数字检索技术在TAOCP卷3的6.3节有详细描述:
From <The Art of Computer Programming> V3 6.3:
Instead of basing a search method on comparisons between keys, we can make use of their representation as a sequence of digits or alphabetic characters. Consider, for example, the thumb index on a large dictionary; from the first letter of a given word, we can immediately locate the pages that contain all words beginning with that letter.
If we pursue the thumb-index idea to one of its logical conclusions, we come up with a searching scheme based on repeating "subscripting".
…
XArray
内部实现为多层(Multi Layer)检索数据结构,每层有特定数量的bits。最外一层称为Leaf,存放实际数组对象。其他各层存放下一层级的索引地址。
XArray
在Linux kernel有实现,已经替换掉原来的Radix Tree。参见这里
https://elixir.bootlin.com/linux/latest/source/lib/xarray.c
假定有一个 {2,3,3}
的 XArray
,访问 123
和 234
的示意图如下:
maxIndex=(1<<(2+3+3)) - 1 = 255 1. Xarray[123], (1<<6) + (7<<3) + 3 layer1_idx = 1 layer2_idx = 7 layer3_idx = 3 LayerIdx Layer1 Layer2 Layer3 0 | | | | | | 1 |1|------| | | | | 2 | | | | | | | 3 | | | | | |----->|3|<value> 4 | | | | | | | | 5 | | | | | | | | 6 | | | | | | | | 7 | | |----->|7|-----| | | 2. Xarray[234], (3<<6) + (5<<3) + 2 layer1_idx = 3 layer2_idx = 5 layer3_idx = 2 LayerIdx Layer1 Layer2 Layer3 0 | | | | | | 1 | | | | | | 2 | | | | |----->|2|<value> 3 |3|------| | | | |3| 4 | | | | | | | | 5 | | |----->|5|-----| | | 6 | | | | | | 7 | | |7| | |
2 XArray vs. RadixTree in Linux Kernel
- 两者实质都是
Trie
,每一层4/6 bits,因此一个大的index需要查找16/11层; - XArray有更良好的API设计;
3 Adaptive Radix Tree
Paper: https://db.in.tum.de/~leis/papers/ART.pdf
XArray也可参考此论文,实现大Index的压缩。比如16Byte UUID的XArray。 UUID规范:https://tools.ietf.org/html/rfc4122
UUID = time-low "-" time-mid "-" time-high-and-version "-" clock-seq-and-reserved clock-seq-low "-" node time-low = 4hexOctet time-mid = 2hexOctet time-high-and-version = 2hexOctet clock-seq-and-reserved = hexOctet clock-seq-low = hexOctet node = 6hexOctet hexOctet = hexDigit hexDigit hexDigit = "0" / "1" / "2" / "3" / "4" / "5" / "6" / "7" / "8" / "9" / "a" / "b" / "c" / "d" / "e" / "f" / "A" / "B" / "C" / "D" / "E" / "F"
4 Implementation
暂不公开。
5 Learn from youtube
5.1 Xarray: One Data Structure To Rule Them All
5.1.1 What is the Xarray
- Automatically resizing array of pointers
- Indexed by unsigned long
- All pointers initially NULL
- Embeds a spinlock
- Loads are store-free (using rcu)
5.1.2 Fundamentals
- Some users only need load, store and (maybe) iterate:
void *xa_load(struct xarray*, unsigned long index); void *xa_store(struct xarray*, unsigned long index, void *entry, gfp_t); void *xa_erase(struct xarray*, unsigned long index); xa_for_each(struct xarray*, unsigned long index, void *entry);
5.1.3 Three auxiliary bits per non-NULL entry
void xa_set_mark(struct xarray*, unsigned long index, xa_mark_t); void xa_clear_mark(struct xarray*, unsigned long index, xa_mark_t); bool xa_get_mark(struct xarray*, unsigned long index, xa_mark_t);
5.1.4 Some users need something a little more complex
5.1.5 The Xarray can trace free entries for you
6 Reference
- LWN Introducing the eXtensible array: https://lwn.net/Articles/715948/
- LWN The XArray data structure: https://lwn.net/Articles/745073/
- Xarray: One Data Structure To Rule Them All: https://2019.linux.conf.au/schedule/presentation/213/
- Linux kernel code xarry.h and xarray.c
- LWN Radix Tree: https://lwn.net/Articles/175432/