【源码系列】Java中的数据结构——数组与ArrayList2-阿里云开发者社区

2.3.4 get方法

我们知道ArrayList实际上就是数组的封装，所以对于具有随机访问的数组而言，根据下标访问元素实在太简单不过了。

public E get(int index) {
  //下标安全检查
       rangeCheck(index);
       return elementData(index);
}

2.3.5 set方法

修改index下标的元素为element

public E set(int index, E element) {
  //下标检查
        rangeCheck(index);
  //旧的值
        E oldValue = elementData(index);
        //更新值
        elementData[index] = element;
        //返回旧的值
        return oldValue;
    }

2.3.6 size方法

public int size() {
        return size;
    }

其实就是把size返回，但这里要注意的是size的意义，它不是指当前ArrayList的容量有多少，而是指当前ArrayList中有多少元素。

2.3.7 indexOf方法

public int indexOf(Object o) {
        if (o == null) {
            for (int i = 0; i < size; i++)
                if (elementData[i]==null)
                    return i;
        } else {
            for (int i = 0; i < size; i++)
                if (o.equals(elementData[i]))
                    return i;
        }
        return -1;
    }

该方法就是遍历查找是否有对象o，如果找到则返回下标。

2.3.8 toArray方法

public Object[] toArray() {
        return Arrays.copyOf(elementData, size);
    }

需要注意的是该方法是将其elementData对象数组从内存中复制了一份，而且长度就是元素的数量。所以不必担心修改toArray方法的得到的数组会影响原ArrayList对象。

2.3.9 removeAll和retainAll方法

①removeAll(Collection c)

删除ArrayList中所有和集合中相同的元素

public boolean removeAll(Collection<?> c) {
  //检查该集合是否为空
        Objects.requireNonNull(c);
        return batchRemove(c, false);
    }

该方法的核心在于boolean batchRemove(Collection c, boolean complement)方法，我们点进去看一下，

private boolean batchRemove(Collection<?> c, boolean complement) {
        final Object[] elementData = this.elementData;
        //r为快指针，w为慢指针
        int r = 0, w = 0;
        boolean modified = false;
        try {
            for (; r < size; r++)
                if (c.contains(elementData[r]) == complement)
                  //只有符合条件时，w才自增
                    elementData[w++] = elementData[r];
        } finally {
            // 这里正常情况都是r=size，只有在程序遇到异常时，才可能会遇到r！=size的情况。
            if (r != size) {
                System.arraycopy(elementData, r,
                                 elementData, w,
                                 size - r);
                w += size - r;
            }
            //当w！=size时，即有元素被抛弃了，这时需要把后面无用的元素值置为null
            if (w != size) {
                // 将后面的元素值置为null
                for (int i = w; i < size; i++)
                    elementData[i] = null;
                //modcount加上修改过的元素
                modCount += size - w;
                //更新size
                size = w;
                //此时已经抛弃了一些元素，所以置为true
                modified = true;
            }
        }
        return modified;
    }

可以看到这里用了快慢指针的技巧，当遇到要保留的元素时w再自增，最终保留元素的下标便是0到w-1。

在遍历的过程中调用contains方法看看这个元素是否在集合c中。

这里巧妙的地方在于它把contains的返回值complement参数比较。

这样，Boolean类型的complement参数的作用就类似于一个开关，当为true时，这个函数就是保留在集合c中出现过的元素，否则保留未出现过的元素。

②retainAll(Collection c)

而retainAll也类似，只不过它把开关置为false，这样就会保留在集合c中出现过的元素。

public boolean retainAll(Collection<?> c) {
        Objects.requireNonNull(c);
        return batchRemove(c, true);
    }

2.4 迭代器

2.4.1 普通迭代器Itr

private class Itr implements Iterator<E> {
  //下一个元素下标
        int cursor;       // index of next element to return
        //上一个元素下标
        int lastRet = -1; // index of last element returned; -1 if no such
        //期望修改次数（后面会讲，现在不用管）
        int expectedModCount = modCount;
        Itr() {}
  //判断是否有下一个
        public boolean hasNext() {
            return cursor != size;
        }
  //返回当前停留的元素
        @SuppressWarnings("unchecked")
        public E next() {
          //检查是否有修改，这个之后会讲，现在先不管它
            checkForComodification();
            int i = cursor;
            //如果此时下一次超出数组范围，则抛出没有NoSuchElementException异常
            if (i >= size)
                throw new NoSuchElementException();
            Object[] elementData = ArrayList.this.elementData;
            //如果此时i还能大于等于数组的长度，唯一的可能就是在遍历过程中有程序对其做了修改，所以此时应抛出并发异常ConcurrentModificationException
            if (i >= elementData.length)
                throw new ConcurrentModificationException();
            //下一个元素的下标为i+1
            cursor = i + 1;
            //返回下标为i的元素
            return (E) elementData[lastRet = i];
        }
        public void remove() {
          //如果当前的元素下标小于0，抛出IllegalStateException异常
            if (lastRet < 0)
                throw new IllegalStateException();
            //检查是否有修改
            checkForComodification();
            try {
              //删除上一个元素
                ArrayList.this.remove(lastRet);
                //指针回退
                cursor = lastRet;
                //上一个指针回归-1，这个就说明了不能连续调用remove方法
                lastRet = -1;
                //期望的修改值变为当前的修改值（这些之后会讲）
                expectedModCount = modCount;
            } catch (IndexOutOfBoundsException ex) {
                throw new ConcurrentModificationException();
            }
        }
}

这里有个值得注意的地方——它在定义的迭代器的时候加了个private，那我们如何去创建迭代器呢？

其实它有个Iterator（）方法，

public Iterator<E> iterator() {
        return new Itr();
    }

它就返回了一个迭代器的对象。

这样的好处就在于面对不同的集合类只需调用通用的接口方法iterator（）便可以返回相应的迭代器。

看完上面的源码我们明白了以下几点：

1. 这个迭代器对象只能使用一次，因为它没有初始化重置的方法

2. 这个迭代器只会往前，因为它只有next方法

3. 不能连续调用remove方法，因为调用了一次后，指向上一个元素下标的lastRet变量便被赋值为-1，连续调用会抛出异常

看完上面这个迭代器的源码，你会发现——初看源码会有一种不知所云的感觉，但是去掉这些让我看起来不知所云的操作和边界条件的检查后，是不是感觉简单了起来？

没错，如果不管这些边界条件，不管并发可能带来的错误，那么我们也能写出ArrayList的迭代器。

这里，为了更好理解代码，我这里放上该迭代器使用简单例子

Iterator<Integer> iterator=list.iterator();
while (iterator.hasNext()){
    Integer i=iterator.next();
    iterator.remove();
}

看了源码后是不是更有感觉了呢？

2.4.2 迭代器的增强版ListItr

private class ListItr extends Itr implements ListIterator<E> {
        ListItr(int index) {
            super();
            cursor = index;
        }
        public boolean hasPrevious() {
            return cursor != 0;
        }
        public int nextIndex() {
            return cursor;
        }
        public int previousIndex() {
            return cursor - 1;
        }
        @SuppressWarnings("unchecked")
        public E previous() {
          //检查是否有修改
            checkForComodification();
            //cursor前移
            int i = cursor - 1;
            //检查i是否小于0
            if (i < 0)
                throw new NoSuchElementException();
            Object[] elementData = ArrayList.this.elementData;
            //如果此时i还能大于等于数组的长度，唯一的可能就是在遍历过程中有程序对其做了修改，所以此时应抛出并发异常ConcurrentModificationException
            if (i >= elementData.length)
                throw new ConcurrentModificationException();
            cursor = i;
            return (E) elementData[lastRet = i];
        }
        public void set(E e) {
            if (lastRet < 0)
                throw new IllegalStateException();
            checkForComodification();
    //修改当前下标的元素值
            try {
                ArrayList.this.set(lastRet, e);
            } catch (IndexOutOfBoundsException ex) {
                throw new ConcurrentModificationException();
            }
        }
        public void add(E e) {
            checkForComodification();
            try {
                int i = cursor;
                ArrayList.this.add(i, e);
                cursor = i + 1;
                lastRet = -1;
                expectedModCount = modCount;
            } catch (IndexOutOfBoundsException ex) {
                throw new ConcurrentModificationException();
            }
        }
    }

可以看到ListItr是Itr的子类，在有Itr的功能的基础上，拓展了前移指针的previous（）方法、增加元素的add（E e）方法和修改当前元素的方法set(E e)。源码操作也不难理解，这里就不多介绍了。

2.4.3 干扰检测

①为什么要有干扰检测

在看迭代器的源码，你是不是很疑惑为啥会有执行checkForComodification（）检查？

你是不是很疑惑为什么很多地方会有modCount++，expectedModCount = modCount这种操作？

你是不是很疑惑modCount，expectedModCount 是来用来干什么的呢？

带着这些疑问，我们来探究一下这样做的原因和原理。

首先，不知道你在写程序的时候遇到ConcurrentModificationException（）这个异常，在我们用迭代器遍历过程中，有没有删除增加过集合。比如下面这样：

for (Integer i:list){
      list.add(1);
}

注意：ArrayList的foreach增强遍历本质就是调用了迭代器。

我们在遍历途中不断向集合中添加整数1，这看似好像没有什么问题，可仔细一想，如果不断添加，那么这个迭代器调用的next真的是原来所期望的next吗？看似好像不会影响结果，但实际上这往往会发生一些逻辑上的错误，如果不对这些情况做出一些提示，程序员会非常苦恼，因为他们很难发现这里面的问题。

同时在并发编程中，如果你对共享变量误用了非线程安全的ArrayList，那么有个线程正在遍历ArrayList集合，可以另一个线程对其做了修改，这会不会发生一些错误呢？尽管在这种情况下ArrayList并未做出有效解决方案，但它必须有责任告知你这个情况，所以它会抛出这个ConcurrentModificationException异常。

②干扰检测原理

那么它是如何实现的呢？

核心就modcount和expectedModCount 变量。

还记得我们调用add，remove方法出现modcount++操作吗？

在这里modcount就是累计修改的次数，expectedModCount 就是预期修改的次数。

这里要注意的是modcount是全局的一个属性

protected transient int modCount = 0;

而expectedModCount 是在迭代器内部的一个属性。

言外之意就是该错误检测一定是在迭代器内部实现的。

阅读源码我们发现，在ArrayList每次进程执行add，remove方法时，都会触发一次modcount++。

同时在创建迭代器时，expectedModCount 的初值就被赋值为当时的modcount，

而每次迭代器在进行next和remove方法时都会做一次检测

点开方法查看

final void checkForComodification() {
       if (modCount != expectedModCount)
           throw new ConcurrentModificationException();
}

这个方法实质就是在比较modCount和expectedModCount是否相等，如果不相等就抛出ConcurrentModificationException异常。

检查完后，我们对集合进行修改时，如执行remove方法，那么就会modcount++,经过几步操作后在把modcount的值赋值给expectedModCount ，这是就保证了expectedModCount 与modcount是相等的，但是预期之外的修改可不会执行这句话，所以就会造成modCount和expectedModCount值不一致，从而检查出了错误。

到这里我们终于明白了它的核心思路：

在迭代器里的修改（主要是指add和remove方法）是被允许的，因为这是迭代器内可见的。但是在预期之外的修改是不被允许的，一旦迭代器发现了预期之外的修改（执行checkForComodification（）方法检查modCount和expectedModCount是否相等），那么迭代器就会抛出ConcurrentModificationException异常告诉工程师——啊，我迭代过程中出现问题了，可能会发生一些逻辑错误。

到现在你也明白了这样做的原因和原理，那么我问你个问题——你觉得这个过程完美吗？

看着很完美是吗？

其实不是的，实际上你仔细看看这段

ArrayList.this.remove(lastRet);
                cursor = lastRet;
                lastRet = -1;
                expectedModCount = modCount;

modCount修改到expectedModCount被更新这里中间差了这么多步，那我们假设在这个过程中其他线程修改了集合，这时modcount又被更新了，但是expectedModCount 还没更新啊。当expectedModCount 更新时，它以为它更新的值是预期中的，但是实际上有预期外的修改混了进来，而它却浑然不知。

所以说，ArrayList是个非线程安全的类，即使在迭代过程中，它有着能感知这种意外的机制，但是这种机制并不是那么灵敏，在一些情况下，它检测不出并发错误。（当然啦，这个机制最主要的是检查工程师在遍历过程中误修改操作，对于并发错误的检测也只是顺带的罢了）

当然啦，源码中也有这么一段注释：

/*
         * If ArrayLists were immutable, or structurally immutable (no
         * adds, removes, etc), we could implement their spliterators
         * with Arrays.spliterator. Instead we detect as much
         * interference during traversal as practical without
         * sacrificing much performance. We rely primarily on
         * modCounts. These are not guaranteed to detect concurrency
         * violations, and are sometimes overly conservative about
         * within-thread interference, but detect enough problems to
         * be worthwhile in practice. To carry this out, we (1) lazily
         * initialize fence and expectedModCount until the latest
         * point that we need to commit to the state we are checking
         * against; thus improving precision.  (This doesn't apply to
         * SubLists, that create spliterators with current non-lazy
         * values).  (2) We perform only a single
         * ConcurrentModificationException check at the end of forEach
         * (the most performance-sensitive method). When using forEach
         * (as opposed to iterators), we can normally only detect
         * interference after actions, not before. Further
         * CME-triggering checks apply to all other possible
         * violations of assumptions for example null or too-small
         * elementData array given its size(), that could only have
         * occurred due to interference.  This allows the inner loop
         * of forEach to run without any further checks, and
         * simplifies lambda-resolution. While this does entail a
         * number of checks, note that in the common case of
         * list.stream().forEach(a), no checks or other computation
         * occur anywhere other than inside forEach itself.  The other
         * less-often-used methods cannot take advantage of most of
         * these streamlinings.
         */

翻译过来就是：

如果 ArrayLists 是不可变的，或者结构上不可变的（没有添加、删除等），我们可以使用 Arrays.spliterator 实现它们的拆分器。相反，我们在不牺牲太多性能的情况下，在遍历过程中检测到尽可能多的干扰。我们主要依赖 modCounts。这些不能保证检测到并发冲突，并且有时对线程内干扰过于保守，但是可以检测到足够的问题以值得实践。为了实现这一点，我们（1）延迟初始化fence和expectedModCount，直到我们需要提交到我们正在检查的状态的最新点；从而提高精度。（这不适用于创建具有当前非惰性值的拆分器的子列表）。 (2) 我们在 forEach 的末尾只执行一个ConcurrentModificationException 检查（对性能最敏感的方法）。当使用 forEach（而不是迭代器）时，我们通常只能在动作之后检测干扰，而不是之前。进一步的 CME触发检查适用于所有其他可能违反假设的情况，例如 null 或给定 size() 的太小的 elementData数组，这可能仅由于干扰而发生。这允许 forEach 的内部循环在没有任何进一步检查的情况下运行，并简化了 lambda解析。虽然这确实需要一些检查，但请注意，在 list.stream().forEach(a) 的常见情况下，除了 forEach本身之外，不会在任何地方进行检查或其他计算。其他不太常用的方法不能利用这些流线化中的大部分。

2.4.4 ArrayListSpliterator拆分迭代器

这个拆分迭代器时jdk8引入的，是为了满足流式编程的需要，用于并行流的拆分，这块我之后再补。（未完待续…）

static final class ArrayListSpliterator<E> implements Spliterator<E> {
        /*
         * If ArrayLists were immutable, or structurally immutable (no
         * adds, removes, etc), we could implement their spliterators
         * with Arrays.spliterator. Instead we detect as much
         * interference during traversal as practical without
         * sacrificing much performance. We rely primarily on
         * modCounts. These are not guaranteed to detect concurrency
         * violations, and are sometimes overly conservative about
         * within-thread interference, but detect enough problems to
         * be worthwhile in practice. To carry this out, we (1) lazily
         * initialize fence and expectedModCount until the latest
         * point that we need to commit to the state we are checking
         * against; thus improving precision.  (This doesn't apply to
         * SubLists, that create spliterators with current non-lazy
         * values).  (2) We perform only a single
         * ConcurrentModificationException check at the end of forEach
         * (the most performance-sensitive method). When using forEach
         * (as opposed to iterators), we can normally only detect
         * interference after actions, not before. Further
         * CME-triggering checks apply to all other possible
         * violations of assumptions for example null or too-small
         * elementData array given its size(), that could only have
         * occurred due to interference.  This allows the inner loop
         * of forEach to run without any further checks, and
         * simplifies lambda-resolution. While this does entail a
         * number of checks, note that in the common case of
         * list.stream().forEach(a), no checks or other computation
         * occur anywhere other than inside forEach itself.  The other
         * less-often-used methods cannot take advantage of most of
         * these streamlinings.
         */
        private final ArrayList<E> list;
        private int index; // current index, modified on advance/split
        private int fence; // -1 until used; then one past last index
        private int expectedModCount; // initialized when fence set
        /** Create new spliterator covering the given  range */
        ArrayListSpliterator(ArrayList<E> list, int origin, int fence,
                             int expectedModCount) {
            this.list = list; // OK if null unless traversed
            this.index = origin;
            this.fence = fence;
            this.expectedModCount = expectedModCount;
        }
        private int getFence() { // initialize fence to size on first use
            int hi; // (a specialized variant appears in method forEach)
            ArrayList<E> lst;
            if ((hi = fence) < 0) {
                if ((lst = list) == null)
                    hi = fence = 0;
                else {
                    expectedModCount = lst.modCount;
                    hi = fence = lst.size;
                }
            }
            return hi;
        }
        public ArrayListSpliterator<E> trySplit() {
            int hi = getFence(), lo = index, mid = (lo + hi) >>> 1;
            return (lo >= mid) ? null : // divide range in half unless too small
                new ArrayListSpliterator<E>(list, lo, index = mid,
                                            expectedModCount);
        }
        public boolean tryAdvance(Consumer<? super E> action) {
            if (action == null)
                throw new NullPointerException();
            int hi = getFence(), i = index;
            if (i < hi) {
                index = i + 1;
                @SuppressWarnings("unchecked") E e = (E)list.elementData[i];
                action.accept(e);
                if (list.modCount != expectedModCount)
                    throw new ConcurrentModificationException();
                return true;
            }
            return false;
        }
        public void forEachRemaining(Consumer<? super E> action) {
            int i, hi, mc; // hoist accesses and checks from loop
            ArrayList<E> lst; Object[] a;
            if (action == null)
                throw new NullPointerException();
            if ((lst = list) != null && (a = lst.elementData) != null) {
                if ((hi = fence) < 0) {
                    mc = lst.modCount;
                    hi = lst.size;
                }
                else
                    mc = expectedModCount;
                if ((i = index) >= 0 && (index = hi) <= a.length) {
                    for (; i < hi; ++i) {
                        @SuppressWarnings("unchecked") E e = (E) a[i];
                        action.accept(e);
                    }
                    if (lst.modCount == mc)
                        return;
                }
            }
            throw new ConcurrentModificationException();
        }
        public long estimateSize() {
            return (long) (getFence() - index);
        }
        public int characteristics() {
            return Spliterator.ORDERED | Spliterator.SIZED | Spliterator.SUBSIZED;
        }
    }

2.4.5 迭代器总结

除了拆分迭代器，ArrayList中的迭代器其实很简单。

如果没有那些边界检查，没有并发错误检测，就算作为初学者的我们也能出色的完成任务。但是作为jdk类库中最常用的集合类，它必须是可靠的，它必须有着极高的性能，同时拥有较高的安全性。

迭代器作为遍历集合的手段，它可以在遍历的过程中不断遍历同时对元素做出修改。在这个过程中它会尽量避免一些错误的发生。

Itr是普遍的迭代器，它只能单向遍历，同时遍历完后就没用了，也无法连续调用remove方法。

ListItr作为Itr的增强版（也是子类），在其基础上拓展了双向遍历的功能，同时支持增删改的操作。

四、总结

总结一下，

ArrayList是对数据结构中数组的封装，其核心是就是对象数组elementData。它最大的特点就是让数组看起来是可以动态扩容的，并且提供各种方便的操作，极大提升了我们日常开发的效率。

对于ArrayList，你需要记住以下几点：

ArrayList的核心是对象数组elementData，它是存放数据的地方。

size并不代表这数组的容量，而是指当前list中元素的数量。

由于elementData数组并未对外开放，你无法直接获取elementData数组，除非你用反射，同时容量也无法直接获取。

ArrayList是看起来支持动态扩容，实际依旧摆脱不了数组的特性——创建容量即确定。扩容原理就是调用System.arrayCopy方法进行数组复制，把原来数组的内容复制到另外一个容量更大的数组里。

ArrayList创建时最好给它一个初始容量来避免后续的扩容。

ArrayList适用于查询多，增删操作少的情况。

在add方法执行后，如果容量不足它会自动扩容，一般情况下扩容后的容量时原来容量的3/2倍

迭代器为了保证迭代过程中不出现预期之外的修改，会进行检查。如果出现了预期之外的修改，会抛出ConcurrentModificationException异常。

整篇源码读下来，其实好处还是很多的，

在对外方法的暴露上，在对内代码的封装和复用上，在方法间的调用上，你可以发现各种设计模式，感叹设计的巧妙。

如此复杂的方法调用在这里被组织的井然有序，这是我们日常开发中需要好好学习的。

好了，这篇的内容到这里就差不多结束了，当然本文也并非讲解所有的源码，只是挑了核心原理的一部分来讲，至于其他方法，个人认为是一些其他方面的封装，这里我就不一一赘述了，感兴趣的同学可以自己去研读

【源码系列】Java中的数据结构——数组与ArrayList2