The One Billion Row Challenge
2025-01-22 08:19:30    307 words


Your mission, should you decide to accept it, is deceptively simple: write a Java program for retrieving temperature measurement values from a text file and calculating the min, mean, and max temperature per weather station. There’s just one caveat: the file has 1,000,000,000 rows!

The text file has a simple structure with one measurement value per row:

Hamburg;12.0
Bulawayo;8.9
Palembang;38.8
St. John's;15.2
Cracow;12.6
...
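
To make the row format concrete, here is a minimal sketch of parsing a single row. It assumes the station name never contains a semicolon and that the value is a plain decimal number, as in the sample above; the `MeasurementParser` class and `Measurement` record are hypothetical names used only for illustration, and records need Java 16 or later.

```java
// Hypothetical helper for illustration; not part of the challenge itself.
final class MeasurementParser {

    // Simple holder for one parsed row.
    record Measurement(String station, double temperature) {}

    // Parses one "station;temperature" row.
    static Measurement parse(String line) {
        // Station names may contain spaces, dots, or apostrophes (e.g. "St. John's"),
        // so split on the semicolon separator only (assumption: no ';' in names).
        int sep = line.indexOf(';');
        if (sep < 0) {
            throw new IllegalArgumentException("Missing ';' separator: " + line);
        }
        String station = line.substring(0, sep);
        double temperature = Double.parseDouble(line.substring(sep + 1));
        return new Measurement(station, temperature);
    }

    public static void main(String[] args) {
        Measurement m = parse("St. John's;15.2");
        System.out.println(m.station() + " -> " + m.temperature());
    }
}
```

Parsing every row through `String` and `Double.parseDouble` is easy to read but allocates on each line, so it is best treated as a correctness baseline rather than a fast solution.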

The program should print out the min, mean, and max values per station, alphabetically ordered like so:

{Abha=5.0/18.0/27.4, Abidjan=15.7/26.0/34.1, Abéché=12.1/29.4/35.6, Accra=14.7/26.4/33.1, Addis Ababa=2.1/16.0/24.3, Adelaide=4.1/17.3/29.7, ...}
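
The whole task can be sketched as a straightforward, unoptimized baseline: keep running min, sum, count, and max per station in a sorted map, then print each entry as min/mean/max. Everything named here (`BaselineAggregator`, `Stats`, `aggregate`) is hypothetical, and two details are assumptions rather than rules stated above: values are rounded to one decimal place with `String.format`, and "alphabetically ordered" is taken to mean the natural `String` ordering of a `TreeMap`.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical single-threaded baseline; a sketch, not a tuned solution.
final class BaselineAggregator {

    // Running statistics for one station.
    static final class Stats {
        double min = Double.POSITIVE_INFINITY;
        double max = Double.NEGATIVE_INFINITY;
        double sum = 0.0;
        long count = 0;

        void add(double value) {
            min = Math.min(min, value);
            max = Math.max(max, value);
            sum += value;
            count++;
        }

        @Override
        public String toString() {
            // Assumption: one decimal place, formatted independently of the default locale.
            return String.format(Locale.ROOT, "%.1f/%.1f/%.1f", min, sum / count, max);
        }
    }

    // Aggregates "station;temperature" rows into the {name=min/mean/max, ...} format.
    static String aggregate(Stream<String> lines) {
        // TreeMap keeps station names sorted by natural String order.
        Map<String, Stats> stats = new TreeMap<>();
        lines.forEachOrdered(line -> {
            int sep = line.indexOf(';');
            if (sep < 0) {
                return; // skip rows without a separator, such as blank lines
            }
            String station = line.substring(0, sep);
            double value = Double.parseDouble(line.substring(sep + 1));
            stats.computeIfAbsent(station, k -> new Stats()).add(value);
        });
        return stats.entrySet().stream()
                .map(e -> e.getKey() + "=" + e.getValue())
                .collect(Collectors.joining(", ", "{", "}"));
    }

    public static void main(String[] args) throws IOException {
        if (args.length > 0) {
            // Stream the file line by line instead of loading it into memory.
            try (Stream<String> lines = Files.lines(Path.of(args[0]))) {
                System.out.println(aggregate(lines));
            }
        } else {
            System.out.println(aggregate(Stream.of("Hamburg;12.0", "Bulawayo;8.9", "Hamburg;14.0")));
            // prints {Bulawayo=8.9/8.9/8.9, Hamburg=12.0/13.0/14.0}
        }
    }
}
```

Because the map holds one small `Stats` object per station, memory use depends on the number of distinct stations rather than on the number of rows; the cost that remains is reading and parsing a billion lines on one thread.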