Java Stream: Part 2, Is a Count Always a Count?

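In my previous article on this subject, we learned that under JDK 8, stream().count() takes longer to execute the more elements there are in the Stream. For more recent JDKs, such as Java 11, that is no longer the case for simple stream pipelines. This article looks at how things have improved within the JDK itself.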

Java 8

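In my previous article, we concluded that the operation list.stream().count() is O(N) under Java 8, i.e. the execution time depends on the number of elements in the original list. See the previous article for the details.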

Java 9 and Later

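As rightly pointed out by Nikolai Parlog (@nipafx) and Brian Goetz (@BrianGoetz) on Twitter, the implementation of Stream::count was improved starting with Java 9. Here is a comparison of the underlying Stream::count code in Java 8 and in later Java versions: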

Java 8 (from the ReferencePipeline class)

 return mapToLong(e -> 1L).sum();

Java 9 and later (from the ReduceOps class)

 if (StreamOpFlag.SIZED.isKnown(flags)) {
     return spliterator.getExactSizeIfKnown();
 }

...

So in Java 9 and later, Stream::count appears to be O(1) rather than O(N) whenever the Spliterator reports a known size. Let us verify that hypothesis.
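To make the SIZED check above more concrete, here is a minimal sketch that asks two spliterators whether they know their exact size. The class and method names (SizedSpliteratorDemo, exactSizeOrMinusOne) are made up for this illustration; only Spliterator::hasCharacteristics, Spliterator::getExactSizeIfKnown and Spliterators::spliteratorUnknownSize come from the JDK.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Spliterator;
import java.util.Spliterators;

final class SizedSpliteratorDemo {

    // Returns the exact element count if the spliterator is SIZED, otherwise -1.
    static long exactSizeOrMinusOne(Spliterator<?> spliterator) {
        return spliterator.getExactSizeIfKnown();
    }

    public static void main(String[] args) {
        List<Integer> list = new ArrayList<>(Arrays.asList(1, 2, 3, 4, 5));

        // An ArrayList knows its size, so its spliterator reports SIZED.
        Spliterator<Integer> sized = list.spliterator();
        System.out.println(sized.hasCharacteristics(Spliterator.SIZED)); // true
        System.out.println(exactSizeOrMinusOne(sized));                  // 5

        // A spliterator built from a plain Iterator has no size information.
        Spliterator<Integer> unsized =
                Spliterators.spliteratorUnknownSize(list.iterator(), Spliterator.ORDERED);
        System.out.println(unsized.hasCharacteristics(Spliterator.SIZED)); // false
        System.out.println(exactSizeOrMinusOne(unsized));                  // -1
    }
}
```

Only the first kind of source lets the count be answered without visiting every element.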

Benchmarks

The big-O property can be observed by running the following JMH benchmarks under Java 8 and Java 11:

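 import static java.util.stream.Collectors.toList;

 import java.util.List;
 import java.util.stream.IntStream;
 import org.openjdk.jmh.annotations.Benchmark;
 import org.openjdk.jmh.annotations.Mode;
 import org.openjdk.jmh.annotations.Param;
 import org.openjdk.jmh.annotations.Scope;
 import org.openjdk.jmh.annotations.Setup;
 import org.openjdk.jmh.annotations.State;
 import org.openjdk.jmh.annotations.Threads;
 import org.openjdk.jmh.runner.Runner;
 import org.openjdk.jmh.runner.RunnerException;
 import org.openjdk.jmh.runner.options.Options;
 import org.openjdk.jmh.runner.options.OptionsBuilder;

 @State(Scope.Benchmark)
 public class CountBenchmark {

     private List<Integer> list;

     // Each benchmark is run once per list size.
     @Param({"1", "1000", "1000000"})
     private int size;

     @Setup
     public void setup() {
         list = IntStream.range(0, size)
             .boxed()
             .collect(toList());
     }

     // Baseline: List::size is O(1).
     @Benchmark
     public long listSize() {
         return list.size();
     }

     @Benchmark
     public long listStreamCount() {
         return list.stream().count();
     }

     public static void main(String[] args) throws RunnerException {
         Options opt = new OptionsBuilder()
             .include(CountBenchmark.class.getSimpleName())
             .mode(Mode.Throughput)
             .threads(Threads.MAX)
             .forks(1)
             .warmupIterations(5)
             .measurementIterations(5)
             .build();

         new Runner(opt).run();
     }
 }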

This produces the following output on my laptop (MacBook Pro mid 2015, 2.2 GHz Intel Core i7):

JDK 8 (from my previous article)

 Benchmark                        (size)   Mode  Cnt          Score           Error  Units
 CountBenchmark.listSize               1  thrpt    5  966658591.905 ± 175787129.100  ops/s
 CountBenchmark.listSize            1000  thrpt    5  862173760.015 ± 293958267.033  ops/s
 CountBenchmark.listSize         1000000  thrpt    5  879607621.737 ± 107212069.065  ops/s
 CountBenchmark.listStreamCount        1  thrpt    5   39570790.720 ±   3590270.059  ops/s
 CountBenchmark.listStreamCount     1000  thrpt    5   30383397.354 ±  10194137.917  ops/s
 CountBenchmark.listStreamCount  1000000  thrpt    5        398.959 ±       170.737  ops/s

JDK 11

 Benchmark                        (size)   Mode  Cnt          Score           Error  Units
 CountBenchmark.listSize               1  thrpt    5  898916944.365 ± 235047181.830  ops/s
 CountBenchmark.listSize            1000  thrpt    5  865080967.750 ± 203793349.257  ops/s
 CountBenchmark.listSize         1000000  thrpt    5  935820818.641 ±  95756219.869  ops/s
 CountBenchmark.listStreamCount        1  thrpt    5   95660206.302 ±  27337762.894  ops/s
 CountBenchmark.listStreamCount     1000  thrpt    5   78899026.467 ±  26299885.209  ops/s
 CountBenchmark.listStreamCount  1000000  thrpt    5   83223688.534 ±  16119403.504  ops/s

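As can be seen, in Java 11 the list.stream().count() operation is now O(1) and not O(N): the throughput stays in the same range whether the list holds 1 or 1,000,000 elements.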

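Brian Goetz pointed out that some developers who used Stream::peek calls under Java 8 discovered that those calls were no longer invoked when the Stream::count terminal operation ran under Java 9 and later. This generated some negative feedback to the JDK developers. Personally, I think the JDK developers made the right decision, and that the change gave Stream::peek users an opportunity to get their code right.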
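The peek behaviour described above can be observed with a small sketch like the following. The class and method names (PeekCountDemo, countWithPeek) are invented for this example. The number of peek invocations depends on the JDK used to run it, so no fixed value is shown for it: on Java 8 the action runs once per element, whereas on Java 9 and later the pipeline may compute the count directly from the source and skip the action entirely.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

final class PeekCountDemo {

    // Counts the elements and records how often the peek action actually runs.
    static long countWithPeek(List<String> source, AtomicInteger peekCalls) {
        return source.stream()
                .peek(e -> peekCalls.incrementAndGet())
                .count();
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("a", "b", "c");
        AtomicInteger peekCalls = new AtomicInteger();

        long count = countWithPeek(names, peekCalls);

        // The count is always 3; the number of peek invocations is JDK dependent.
        System.out.println("count = " + count);
        System.out.println("peek invocations = " + peekCalls.get());
    }
}
```

Code that needs a side effect for every element should not rely on peek in front of count; an explicit forEach is a safer choice for that.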

More Complex Stream Pipelines

In this chapter, we will take a look at more complex stream pipelines.

JDK 11

Tagir Valeev concluded that pipelines like stream().skip(1).count() are not O(1) for List::stream.

This can be observed by running the following benchmark:

 @Benchmark
 public long listStreamSkipCount() {
     return list.stream().skip(1).count();
 }

 CountBenchmark.listStreamCount            1  thrpt    5  105546649.075 ± 10529832.319  ops/s
 CountBenchmark.listStreamCount         1000  thrpt    5   81370237.291 ± 15566491.838  ops/s
 CountBenchmark.listStreamCount      1000000  thrpt    5   75929699.395 ± 14784433.428  ops/s
 CountBenchmark.listStreamSkipCount        1  thrpt    5   35809816.451 ± 12055461.025  ops/s
 CountBenchmark.listStreamSkipCount     1000  thrpt    5    3098848.946 ±   339437.339  ops/s
 CountBenchmark.listStreamSkipCount  1000000  thrpt    5       3646.513 ±      254.442  ops/s

Thus, list.stream().skip(1).count() is still O(N) under JDK 11.
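When the source is a List and the only intermediate operation is skip, the same result can be obtained without going through a stream at all. The helper below is a hypothetical workaround, not something taken from the JDK or from the benchmark: it gets the count from List::size, which is O(1) for the common list implementations.

```java
import java.util.ArrayList;
import java.util.List;

final class SkipCountShortcut {

    // Same result as list.stream().skip(n).count(), computed from List::size.
    static long skipCount(List<?> list, long n) {
        if (n < 0) {
            // Stream::skip rejects negative arguments in the same way.
            throw new IllegalArgumentException("n must be non-negative: " + n);
        }
        // Skipping more elements than the list holds leaves nothing to count.
        return Math.max(0L, list.size() - n);
    }

    public static void main(String[] args) {
        List<Integer> list = new ArrayList<>();
        for (int i = 0; i < 1_000_000; i++) {
            list.add(i);
        }
        System.out.println(skipCount(list, 1));            // 999999
        System.out.println(list.stream().skip(1).count()); // 999999
    }
}
```

This only applies when the stream's elements do not have to be visited; as soon as a filter is involved, the elements must be examined.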

Speedup

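Some stream implementations are aware of their sources and can take appropriate shortcuts by merging stream operations into the stream source itself. This can improve performance massively, especially for large streams with more complex stream pipelines such as stream().skip(1).count().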

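The Speedment ORM tool allows databases to be viewed as Stream objects, and these streams can optimize away many stream operations such as Stream::count, Stream::skip and Stream::limit, as demonstrated in the benchmark below. I have used the open-source Sakila sample database as data input. The Sakila database is all about rental films, artists and so on.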

 @Benchmark
 public long rentalsSkipCount() {
     return rentals.stream().skip(1).count();
 }

 @Benchmark
 public long filmsSkipCount() {
     return films.stream().skip(1).count();
 }

When run, the following output is produced:

 SpeedmentCountBenchmark.filmsSkipCount    N/A  thrpt    5  68052838.621 ±  739171.008  ops/s
 SpeedmentCountBenchmark.rentalsSkipCount  N/A  thrpt    5  68224985.736 ± 2683811.510  ops/s

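The "rental" table contains 10,000 rows, whereas the "film" table contains only 1,000 rows. Nevertheless, their stream().skip(1).count() operations complete in almost the same time. Even if a table contained a trillion rows, it would still count the elements in the same elapsed time. Thus, this stream().skip(1).count() implementation has a complexity of O(1) and not O(N).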

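Note: The benchmark above was run with "DataStore" in-JVM-memory acceleration. If run without acceleration, directly against a database, the response time would depend on the underlying database's ability to execute a nested "SELECT count(*) …" statement.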
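The general idea of merging skip and limit into a source that already knows its size can be sketched as follows. This is a deliberately simplified, hypothetical model (the KnownSizeView class and its methods are invented here) and is not how Speedment is implemented: it tracks only the number of remaining elements, so every operation is plain arithmetic and count() is O(1) regardless of the size.

```java
final class KnownSizeView {

    // Number of elements that would remain after the operations applied so far.
    private final long size;

    private KnownSizeView(long size) {
        this.size = size;
    }

    // Starts from a source whose element count is already known.
    static KnownSizeView ofSize(long size) {
        if (size < 0) {
            throw new IllegalArgumentException("size must be non-negative: " + size);
        }
        return new KnownSizeView(size);
    }

    // Folds skip(n) into the size instead of discarding elements one by one.
    KnownSizeView skip(long n) {
        if (n < 0) {
            throw new IllegalArgumentException("n must be non-negative: " + n);
        }
        return new KnownSizeView(Math.max(0L, size - n));
    }

    // Folds limit(maxSize) into the size in the same way.
    KnownSizeView limit(long maxSize) {
        if (maxSize < 0) {
            throw new IllegalArgumentException("maxSize must be non-negative: " + maxSize);
        }
        return new KnownSizeView(Math.min(size, maxSize));
    }

    // No traversal is needed: the answer has been maintained all along.
    long count() {
        return size;
    }

    public static void main(String[] args) {
        long trillion = 1_000_000_000_000L;
        System.out.println(KnownSizeView.ofSize(trillion).skip(1).count());            // 999999999999
        System.out.println(KnownSizeView.ofSize(trillion).skip(10).limit(5).count());  // 5
    }
}
```

A real implementation also has to deliver the elements themselves when a different terminal operation is used; the sketch only covers the counting path.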

Summary

Stream::count was significantly improved in Java 9.

There are stream implementations, such as Speedment, that are able to compute Stream::count in O(1) time even for more complex stream pipelines like stream().skip(...).count() or even stream.filter(...).skip(...).count().