The One Billion Row Challenge
2025-01-22 08:19:30
Your mission, should you decide to accept it, is deceptively simple: write a Java program for retrieving temperature measurement values from a text file and calculating the min, mean, and max temperature per weather station. There’s just one caveat: the file has 1,000,000,000 rows!
The text file has a simple structure with one measurement value per row:
    Hamburg;12.0
The program should print out the min, mean, and max values per station, alphabetically ordered like so:
    {Abha=5.0/18.0/27.4, Abidjan=15.7/26.0/34.1, Abéché=12.1/29.4/35.6, Accra=14.7/26.4/33.1, Addis Ababa=2.1/16.0/24.3, Adelaide=4.1/17.3/29.7, ...}
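To make the task concrete, here is a minimal baseline sketch in Java. It is not a tuned solution, only one straightforward reading of the rules above: split each row at the semicolon, keep a running min, sum, count, and max per station, and print the stations in sorted order. The class name `BaselineSketch`, the helper `summarize`, and the default file name `measurements.txt` are all hypothetical, and rounding to one decimal with `String.format` is an assumption, since the text does not state a rounding rule. A `TreeMap` with plain `String` ordering is used because it puts Abéché after Abidjan, as in the sample output.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class BaselineSketch {

    /** Running min, max, sum, and count for one station. */
    static final class Stats {
        double min = Double.POSITIVE_INFINITY;
        double max = Double.NEGATIVE_INFINITY;
        double sum;
        long count;

        void add(double value) {
            min = Math.min(min, value);
            max = Math.max(max, value);
            sum += value;
            count++;
        }

        @Override
        public String toString() {
            // Assumption: one decimal place, rounded by String.format.
            // Locale.ROOT keeps '.' as the decimal separator on every machine.
            return String.format(Locale.ROOT, "%.1f/%.1f/%.1f", min, sum / count, max);
        }
    }

    /** Aggregates "station;temperature" rows into {name=min/mean/max, ...}. */
    public static String summarize(Stream<String> rows) {
        // TreeMap keeps station names sorted by plain String ordering.
        Map<String, Stats> byStation = new TreeMap<>();
        rows.forEach(row -> {
            // The temperature never contains ';', so split at the last one.
            int sep = row.lastIndexOf(';');
            String station = row.substring(0, sep);
            double value = Double.parseDouble(row.substring(sep + 1));
            byStation.computeIfAbsent(station, k -> new Stats()).add(value);
        });
        return byStation.entrySet().stream()
                .map(e -> e.getKey() + "=" + e.getValue())
                .collect(Collectors.joining(", ", "{", "}"));
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical default file name; pass another path as the first argument.
        Path file = Path.of(args.length > 0 ? args[0] : "measurements.txt");
        try (Stream<String> rows = Files.lines(file)) {
            System.out.println(summarize(rows));
        }
    }
}
```

On Java 11 or later this can be run directly with `java BaselineSketch.java <path-to-file>`. It reads the file on a single thread and parses every value with `Double.parseDouble`, so on a billion rows it only serves as a correctness reference to compare faster versions against.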