Hangyu Zhou, Chia-Hsiang Kao, Cheng Perng Phoo, Utkarsh Mall, Bharath Hariharan, Kavita Bala
AllClear is a comprehensive dataset/benchmark for cloud detection and removal.
Notice: We are actively cleaning up the codebase and uploading our dataset for public access. Stay tuned!
Run the following commands to clone the repository, fetch the baselines, and download the dataset:
```bash
# Clone the repository
git clone https://github.com/Zhou-Hangyu/allclear.git
cd allclear
# Obtain the baselines (git submodules)
git submodule update --init --recursive
# Download the dataset and metadata JSON files
python download.py
```

This section provides instructions on how to use the benchmark, with the UnCRtainTS model as an example.
- First, set up the environment for UnCRtainTS. Visit the UnCRtainTS GitHub page and follow the instructions there to create its conda environment.
- After setting up the UnCRtainTS environment, navigate to the root directory of this project and install our package using pip:
  ```bash
  pip install -e .
  ```
- To run the benchmark and see some results, execute one of the scripts in the `demos` directory:
  ```bash
  # Run the Least Cloudy baseline
  bash demos/run_leastcloud.sh
  # Run the pretrained UnCRtainTS
  bash demos/run_uncrtaints_pretrained.sh
  # Run the UnCRtainTS pretrained on our full AllClear dataset
  bash demos/run_uncrtaints_allclear100pc.sh
  ```
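If you want to run all three demos in one go, a small Python driver like the one below is one option. It is a sketch, not part of the repository: `run_demos` and `DEMO_SCRIPTS` are hypothetical names, and it assumes the scripts are launched from the project root (as the `demos/...` paths in the commands above imply) inside the activated UnCRtainTS environment.

```python
import subprocess
from pathlib import Path

# Demo scripts listed in this README, relative to the project root.
# (Hypothetical constant; the repository does not define it.)
DEMO_SCRIPTS = (
    "demos/run_leastcloud.sh",
    "demos/run_uncrtaints_pretrained.sh",
    "demos/run_uncrtaints_allclear100pc.sh",
)


def run_demos(root=".", scripts=DEMO_SCRIPTS, dry_run=False):
    """Run each demo script with bash from `root`, stopping at the first failure.

    Raises FileNotFoundError if any script is missing. With dry_run=True,
    returns the commands without executing them.
    """
    root = Path(root)
    missing = [s for s in scripts if not (root / s).is_file()]
    if missing:
        raise FileNotFoundError(f"missing demo scripts: {missing}")
    commands = [["bash", s] for s in scripts]
    if dry_run:
        return commands
    for cmd in commands:
        # check=True raises subprocess.CalledProcessError on a non-zero exit,
        # so later demos are skipped if an earlier one fails.
        subprocess.run(cmd, cwd=root, check=True)
    return commands
```

Calling `run_demos(".", dry_run=True)` first is a cheap way to confirm the scripts are where you expect before launching the actual runs.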
- We use Cloud Optimized GeoTIFF (COG) format to store all our GeoTIFF files.
- The raw data comes with gaps (NaN values) around the boundaries due to map projection. We currently center-crop the images on the fly to remove these gaps, and we are working on post-processing the entire dataset so that the images are cropped permanently.
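The on-the-fly center crop can be illustrated with the minimal sketch below. It is not the project's data loader: `center_crop` and `contains_nan` are hypothetical helpers, and the crop size is left as a parameter because this README does not state how large the cropped region is. It operates on a single band stored as nested lists to stay dependency-free.

```python
import math


def center_crop(image, crop_h, crop_w):
    """Return the central crop_h x crop_w window of a 2D image stored as nested lists.

    For a NumPy array the equivalent is image[top:top + crop_h, left:left + crop_w].
    """
    height, width = len(image), len(image[0])
    if crop_h > height or crop_w > width:
        raise ValueError("crop size exceeds image size")
    # Integer division centers the window; odd leftovers go to the bottom/right.
    top = (height - crop_h) // 2
    left = (width - crop_w) // 2
    return [row[left:left + crop_w] for row in image[top:top + crop_h]]


def contains_nan(image):
    """True if any pixel is NaN, e.g. a projection gap near the border."""
    return any(math.isnan(v) for row in image for v in row)
```

A center crop only removes the gaps when they lie entirely outside the central window, so the crop size has to account for the width of the NaN border; running `contains_nan` on the result is a simple sanity check.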
You can download the dataset by running `python download.py` in the root directory of this project. This script will download the entire AllClear dataset, along with the metadata files.
The metadata files are grouped into three folders: `data`, `datasets`, and `rois`.
- `data` contains metadata for the raw satellite images, grouped by satellite sensor.
- `datasets` contains metadata for the datasets we created. The naming convention is `{dataset_split}_{input_sequence_length}_{sensor_list}_{percentage_of_data_used}(_1proi).json`, where:
  - `dataset_split` is `train`, `val`, or `test`.
  - `input_sequence_length` is the number of frames in the input sequence: `tx3` means a 3-frame input and `tx12` means a 12-frame input.
  - `sensor_list` lists the satellite sensors used in the dataset: `s2-s1` means S2 and S1, and `s2-s1-landsat` means S2, S1, and Landsat 8/9.
  - `percentage_of_data_used` is the percentage of the data used to create the dataset, e.g. `100pct` means 100% of the data is used.
  - The optional `_1proi` suffix indicates that the dataset was created by randomly sampling one sample from each ROI. We use this to create a lightweight test set.
- `rois` contains metadata for the regions of interest (ROIs) we created.
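Because the dataset file names follow a fixed pattern, they can be parsed mechanically. The sketch below is one way to do it with Python's `re` module; `parse_dataset_name` and `DatasetSpec` are hypothetical names, not part of the `allclear` package, and the pattern only accepts the forms described in this README (`train`/`val`/`test` splits, a `tx` sequence length, and a `pct` suffix).

```python
import re
from typing import NamedTuple

# {dataset_split}_{input_sequence_length}_{sensor_list}_{percentage_of_data_used}(_1proi).json
_DATASET_NAME = re.compile(
    r"^(?P<split>train|val|test)"
    r"_tx(?P<seq_len>\d+)"
    r"_(?P<sensors>[a-z0-9]+(?:-[a-z0-9]+)*)"  # e.g. "s2-s1" or "s2-s1-landsat"
    r"_(?P<pct>\d+(?:\.\d+)?)pct"              # e.g. "100pct" or "3.4pct"
    r"(?P<one_per_roi>_1proi)?"                # optional one-sample-per-ROI suffix
    r"\.json$"
)


class DatasetSpec(NamedTuple):
    split: str
    seq_len: int
    sensors: tuple
    pct: float
    one_per_roi: bool


def parse_dataset_name(filename: str) -> DatasetSpec:
    """Split a dataset metadata file name into its naming-convention fields."""
    match = _DATASET_NAME.match(filename)
    if match is None:
        raise ValueError(f"not a dataset metadata file name: {filename!r}")
    return DatasetSpec(
        split=match["split"],
        seq_len=int(match["seq_len"]),
        sensors=tuple(match["sensors"].split("-")),
        pct=float(match["pct"]),
        one_per_roi=match["one_per_roi"] is not None,
    )
```

For example, `parse_dataset_name("train_tx12_s2-s1_3.4pct.json")` yields a 12-frame training split over `("s2", "s1")` built from 3.4% of the data.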
Here is the list of metadata files:
```
.
├── data
│ ├── dw_metadata.csv
│ ├── landsat8_metadata.csv
│ ├── landsat9_metadata.csv
│ ├── s1_metadata.csv
│ └── s2_metadata.csv
├── datasets
│ ├── test_tx3_s2-s1_100pct_1proi.json
│ ├── test_tx3_s2-s1_100pct.json
│ ├── test_tx3_s2-s1-landsat_100pct_1proi.json
│ ├── test_tx3_s2-s1-landsat_100pct.json
│ ├── train_tx12_s2-s1_100pct.json
│ ├── train_tx12_s2-s1_3.4pct.json
│ ├── train_tx3_s2-s1_100pct.json
│ ├── train_tx3_s2-s1_10pct.json
│ ├── train_tx3_s2-s1_1pct.json
│ ├── train_tx3_s2-s1_3.4pct.json
│ ├── train_tx3_s2-s1-landsat_100pct.json
│ ├── train_tx3_s2-s1-landsat_3.4pct.json
│ ├── val_tx12_s2-s1_100pct.json
│ └── val_tx3_s2-s1-landsat_100pct.json
├── rois
│ ├── rois_metadata.csv
│ ├── test_rois_3k.txt
│ ├── train_rois_19k.txt
│ └── val_rois_1k.txt
```
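Conversely, the path of a specific dataset metadata file can be assembled from its parameters. This is a sketch under stated assumptions: `dataset_metadata_path` is a hypothetical helper, and `root` is assumed to be the directory that contains the `data`, `datasets`, and `rois` folders, since this README does not say where `download.py` places them.

```python
from pathlib import Path


def dataset_metadata_path(root, split, seq_len, sensors, pct, one_per_roi=False):
    """Build and validate the path of a dataset metadata JSON file.

    Follows the {split}_tx{seq_len}_{sensors}_{pct}pct(_1proi).json convention
    and raises FileNotFoundError if no such file exists under root/datasets.
    """
    pct_str = f"{pct:g}"  # 100 -> "100", 3.4 -> "3.4"
    name = f"{split}_tx{seq_len}_{'-'.join(sensors)}_{pct_str}pct"
    if one_per_roi:
        name += "_1proi"
    path = Path(root) / "datasets" / f"{name}.json"
    if not path.is_file():
        raise FileNotFoundError(path)
    return path
```

For instance, `dataset_metadata_path(root, "test", 3, ["s2", "s1"], 100, one_per_roi=True)` resolves to `datasets/test_tx3_s2-s1_100pct_1proi.json`, the lightweight test set, and fails loudly for a combination that is not in the listing above.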
This project is licensed under the MIT License.
- The main package folder is `allclear`. It should only contain reusable code directly related to the use of the dataset and benchmark.
- Every baseline we proposed or reproduced should have its own folder in `/baselines`.
  - Each baseline will have a wrapper in `allclear/baselines.py` with a uniform input/output format for easy comparison.
- The `demos` folder contains minimal code to demonstrate the use of the dataset and benchmark.
- Put all other code in the `/experimental_scripts` folder for now.
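The idea behind a uniform wrapper is that every baseline exposes the same call signature, so a single evaluation loop can score any of them. The sketch below illustrates that pattern only; it does not reproduce `allclear/baselines.py`. The names `Baseline`, `LeastCloudy`, `predict`, `evaluate`, and `mean_absolute_error` are hypothetical, and the Least Cloudy rule (return the input frame with the smallest fraction of cloudy pixels, given per-frame binary cloud masks) is an assumption based on the baseline's name.

```python
from typing import Callable, List, Protocol, Sequence

Image = List[List[float]]  # one single-band frame, rows x cols
Mask = List[List[int]]     # 1 = cloudy pixel, 0 = clear


class Baseline(Protocol):
    """Hypothetical uniform interface: frames plus cloud masks in, one frame out."""

    def predict(self, frames: Sequence[Image], cloud_masks: Sequence[Mask]) -> Image:
        ...


class LeastCloudy:
    """Toy Least Cloudy baseline: return the input frame with the lowest cloud
    fraction (ties go to the earliest frame)."""

    def predict(self, frames: Sequence[Image], cloud_masks: Sequence[Mask]) -> Image:
        if not frames or len(frames) != len(cloud_masks):
            raise ValueError("need at least one frame and one cloud mask per frame")

        def cloud_fraction(mask: Mask) -> float:
            total = sum(len(row) for row in mask)
            return sum(sum(row) for row in mask) / total

        best = min(range(len(frames)), key=lambda i: cloud_fraction(cloud_masks[i]))
        return frames[best]


def mean_absolute_error(pred: Image, target: Image) -> float:
    diffs = [abs(p - t) for pr, tr in zip(pred, target) for p, t in zip(pr, tr)]
    return sum(diffs) / len(diffs)


def evaluate(model: Baseline, frames, cloud_masks, target,
             metric: Callable[[Image, Image], float]) -> float:
    """Because every wrapper shares `predict`, this works for any baseline."""
    return metric(model.predict(frames, cloud_masks), target)
```

Adding a new baseline under this pattern means writing one class with a `predict` method; `evaluate` and any comparison tables built on it stay unchanged.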