Data Overview (compression format: .tar.bz2, file format: .xml)
| No | Project | # of Reports | Size | Period |
|---|---|---|---|---|
| 1 | Eclipse | 528,862 | 387MB | 10/10/01 - 09/30/18 |
| 2 | Freedesktop | 106,065 | 121MB | 01/09/03 - 09/30/18 |
| 3 | GCC | 81,463 | 113MB | 08/03/99 - 09/30/18 |
| 4 | GNOME | 673,301 | 649MB | 02/05/99 - 09/30/18 |
| 5 | KDE | 388,711 | 399MB | 01/21/99 - 09/30/18 |
| 6 | LibreOffice | 62,029 | 63MB | 08/03/10 - 09/30/18 |
| 7 | Linux | 32,340 | 60MB | 11/06/02 - 09/30/18 |
| 8 | LLVM | 38,107 | 31MB | 10/07/03 - 09/30/18 |
| 9 | OpenOffice | 127,797 | 85MB | 10/16/00 - 09/30/18 |
| | Total | 2,038,675 | 1.86GB | |
All bug reports were downloaded by a web crawler.
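
Since each project is distributed as a `.tar.bz2` archive of `.xml` files, a minimal Python sketch for reading one archive without unpacking it to disk might look like the following. The function name `iter_xml_roots` and the archive filename in the usage comment are hypothetical, and the sketch makes no assumption about the XML schema of the reports: it only parses each file and hands back its root element.

```python
import tarfile
import xml.etree.ElementTree as ET


def iter_xml_roots(archive_path):
    """Yield (member_name, root_element) for every .xml file in a .tar.bz2 archive.

    Hypothetical helper: the internal layout of the archives is not described
    here, so every regular file whose name ends in .xml is treated as a report file.
    """
    with tarfile.open(archive_path, mode="r:bz2") as archive:
        for member in archive:
            # Skip directories and anything that is not an XML file.
            if not (member.isfile() and member.name.endswith(".xml")):
                continue
            with archive.extractfile(member) as handle:
                yield member.name, ET.parse(handle).getroot()


# Usage (the filename is an assumption, substitute the archive you downloaded):
# for name, root in iter_xml_roots("eclipse.tar.bz2"):
#     print(name, root.tag, len(root))
```

Streaming members this way avoids extracting several hundred megabytes of XML up front; for the larger projects, `xml.etree.ElementTree.iterparse` may be preferable to parsing each file in full.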
You are kindly asked to acknowledge the usage of the dataset by citing the following two publications:
@inproceedings{xiao2020hindbr,
title={HINDBR: Heterogeneous Information Network Based Duplicate Bug Report Prediction},
author={Xiao, Guanping and Du, Xiaoting and Sui, Yulei and Yue, Tao},
booktitle={2020 IEEE 31st International Symposium on Software Reliability Engineering (ISSRE)},
pages={195--206},
year={2020},
organization={IEEE}
}
@article{du2021deepsim,
title={DeepSIM: Deep Semantic Information-Based Automatic Mandelbug Classification},
author={Du, Xiaoting and Zheng, Zheng and Xiao, Guanping and Zhou, Zenghui and Trivedi, Kishor S.},
journal={IEEE Transactions on Reliability},
volume={71},
number={4},
pages={1540--1554},
year={2022},
publisher={IEEE}
}