Pixel Classification
- class detectree.ClassifierTrainer(*, sigmas=None, num_orientations=None, neighborhood=None, min_neighborhood_range=None, num_neighborhoods=None, tree_val=None, nontree_val=None, classifier_class=None, **classifier_kwargs)
Train binary tree/non-tree classifier(s) of the pixel features.
- __init__(*, sigmas=None, num_orientations=None, neighborhood=None, min_neighborhood_range=None, num_neighborhoods=None, tree_val=None, nontree_val=None, classifier_class=None, **classifier_kwargs)
Initialize the classifier.
See the background example notebook for details.
- Parameters:
sigmas (list-like, optional) – The list of scale parameters (sigmas) to build the Gaussian filter bank that will be used to compute the pixel-level features. The provided argument will be passed to the initialization method of the PixelFeaturesBuilder class. If no value is provided, the value set in settings.GAUSS_SIGMAS is used.
num_orientations (int, optional) – The number of equally-distributed orientations to build the Gaussian filter bank that will be used to compute the pixel-level features. The provided argument will be passed to the initialization method of the PixelFeaturesBuilder class. If no value is provided, the value set in settings.GAUSS_NUM_ORIENTATIONS is used.
neighborhood (array-like, optional) – The base neighborhood structure that will be used to compute the entropy features. The provided argument will be passed to the initialization method of the PixelFeaturesBuilder class. If no value is provided, a square with a side size of 2 * min_neighborhood_range + 1 is used.
min_neighborhood_range (int, optional) – The range (i.e., the square radius) of the smallest neighborhood window that will be used to compute the entropy features. The provided argument will be passed to the initialization method of the PixelFeaturesBuilder class. If no value is provided, the value set in settings.ENTROPY_MIN_NEIGHBORHOOD_RANGE is used.
num_neighborhoods (int, optional) – The number of neighborhood windows (whose size follows a geometric progression starting at min_neighborhood_range) that will be used to compute the entropy features. The provided argument will be passed to the initialization method of the PixelFeaturesBuilder class. If no value is provided, the value set in settings.ENTROPY_NUM_NEIGHBORHOODS is used.
tree_val (int, optional) – The value that designates tree pixels in the response images. The provided argument will be passed to the initialization method of the PixelResponseBuilder class. If no value is provided, the value set in settings.TREE_VAL is used.
nontree_val (int, optional) – The value that designates non-tree pixels in the response images. The provided argument will be passed to the initialization method of the PixelResponseBuilder class. If no value is provided, the value set in settings.NON_TREE_VAL is used.
classifier_class (class, optional) – The class of the classifier to be trained. It can be any scikit-learn compatible estimator that implements the fit, predict and predict_proba methods and that can be saved and loaded with skops. If no value is provided, the value set in settings.CLF_CLASS is used.
classifier_kwargs (key-value pairings, optional) – Keyword arguments that will be passed to the initialization of classifier_class. If no value is provided, the value set in settings.CLF_KWARGS is used.
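The neighborhood parameters above can be made concrete with a small sketch. It builds the documented default neighborhood (a square of side 2 * min_neighborhood_range + 1) and lists window ranges that grow geometrically from min_neighborhood_range. The helper names and the growth ratio of 2 are assumptions for illustration only; the exact progression used by PixelFeaturesBuilder may differ.

```python
def default_neighborhood(min_neighborhood_range):
    """Square footprint of ones with side 2 * min_neighborhood_range + 1."""
    side = 2 * min_neighborhood_range + 1
    return [[1] * side for _ in range(side)]


def neighborhood_ranges(min_neighborhood_range, num_neighborhoods, ratio=2):
    """Window ranges in a geometric progression (hypothetical ratio of 2)."""
    return [min_neighborhood_range * ratio**i for i in range(num_neighborhoods)]


base = default_neighborhood(2)
print(len(base), len(base[0]))  # 5 5
print(neighborhood_ranges(2, 3))  # [2, 4, 8]
```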
- train_classifier(*, split_df=None, img_dir=None, response_img_dir=None, img_filepaths=None, response_img_filepaths=None, img_filename_pattern=None, method=None, img_cluster=None)
Train a classifier.
See the background example notebook for more details.
- Parameters:
split_df (pandas DataFrame, optional) – Data frame with the train/test split.
img_dir (str representing path to a directory, optional) – Path to the directory where the images from split_df or whose filename matches img_filename_pattern are located. Required if split_df is provided. Ignored if img_filepaths is provided.
response_img_dir (str representing path to a directory, optional) – Path to the directory where the response tiles are located. Required if split_df is provided. Otherwise, it is ignored if response_img_filepaths is provided; if not, it is used as the directory where the response tiles for the images whose filename matches img_filename_pattern are located.
img_filepaths (list-like, optional) – List of paths to the input tiles whose features will be used to train the classifier. Ignored if split_df is provided.
response_img_filepaths (list-like, optional) – List of paths to the binary response tiles that will be used to train the classifier. Ignored if split_df is provided.
img_filename_pattern (str representing a file-name pattern, optional) – Filename pattern to be matched in order to obtain the list of images. If no value is provided, the value set in settings.IMG_FILENAME_PATTERN is used. Ignored if split_df or img_filepaths is provided.
method ({'cluster-I', 'cluster-II'}, optional) – Method used in the train/test split.
img_cluster (int, optional) – The label of the cluster of tiles. Only used if method is 'cluster-II'.
- Returns:
clf – The trained classifier.
- Return type:
scikit-learn-like classifier
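The arguments of train_classifier select the training tiles in a fixed order of precedence: split_df first, then img_filepaths, then img_filename_pattern matched inside img_dir. The following sketch mirrors that documented precedence in plain Python. Several pieces are hypothetical: the helper name, the train_filenames argument (a stand-in for the training rows of split_df), the dir_listing argument (used instead of reading img_dir from disk), the "*.tif" default pattern, and the assumption that each response tile shares its image tile's filename.

```python
import fnmatch
import os


def resolve_training_tiles(*, train_filenames=None, img_dir=None,
                           response_img_dir=None, img_filepaths=None,
                           response_img_filepaths=None,
                           img_filename_pattern="*.tif", dir_listing=()):
    """Return (image path, response path) pairs following the documented precedence.

    Hypothetical sketch: `train_filenames` stands in for the training rows of
    `split_df`, and `dir_listing` stands in for the contents of `img_dir`.
    """
    if train_filenames is not None:
        # split_df takes precedence; both directories are then required.
        if img_dir is None or response_img_dir is None:
            raise ValueError("img_dir and response_img_dir are required with split_df")
        filenames = list(train_filenames)
    elif img_filepaths is not None:
        # Explicit lists: img_dir and img_filename_pattern are ignored.
        return list(zip(img_filepaths, response_img_filepaths))
    else:
        # Fall back to matching the filename pattern among the files in img_dir.
        filenames = sorted(fnmatch.filter(dir_listing, img_filename_pattern))
    return [(os.path.join(img_dir, name), os.path.join(response_img_dir, name))
            for name in filenames]
```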
- train_classifiers(split_df, img_dir, response_img_dir)
Train a classifier for each first-level cluster in split_df.
See the background example notebook for more details.
- Parameters:
split_df (pandas DataFrame) – Data frame with the train/test split, which must have an img_cluster column with the first-level cluster labels.
img_dir (str representing path to a directory) – Path to the directory where the images listed in split_df are located.
response_img_dir (str representing path to a directory) – Path to the directory where the response tiles are located.
- Returns:
clf_dict – Dictionary mapping each first-level cluster label to a trained scikit-learn-like classifier.
- Return type:
dictionary
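Because train_classifiers fits one classifier per first-level cluster, its result is a mapping from cluster label to classifier. The sketch below shows only that grouping step. It is hypothetical: split_rows (plain dicts with the assumed keys img_filename, img_cluster and train) stands in for split_df, and the caller-supplied fit_cluster function stands in for the actual feature extraction and fitting.

```python
from collections import defaultdict


def train_per_cluster(split_rows, fit_cluster):
    """Map each first-level cluster label to the model returned by fit_cluster."""
    train_filenames = defaultdict(list)
    for row in split_rows:
        if row["train"]:
            # Group the training tiles by their first-level cluster label.
            train_filenames[row["img_cluster"]].append(row["img_filename"])
    return {label: fit_cluster(filenames)
            for label, filenames in train_filenames.items()}


rows = [
    {"img_filename": "a.tif", "img_cluster": 0, "train": True},
    {"img_filename": "b.tif", "img_cluster": 1, "train": True},
    {"img_filename": "c.tif", "img_cluster": 0, "train": False},
]
# `tuple` acts as a dummy "classifier" that just records its training tiles.
clf_dict = train_per_cluster(rows, fit_cluster=tuple)
```

Test tiles (train is False) are skipped, so every cluster in the result is trained only on its own training tiles.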
- class detectree.Classifier(*, clf=None, clf_dict=None, hf_hub_repo_id=None, hf_hub_clf_filename=None, hf_hub_download_kwargs=None, skops_trusted=None, tree_val=None, nontree_val=None, refine_method=None, refine_kwargs=None, return_proba=None, **pixel_features_builder_kwargs)
Use trained classifier(s) to predict tree pixels.
- __init__(*, clf=None, clf_dict=None, hf_hub_repo_id=None, hf_hub_clf_filename=None, hf_hub_download_kwargs=None, skops_trusted=None, tree_val=None, nontree_val=None, refine_method=None, refine_kwargs=None, return_proba=None, **pixel_features_builder_kwargs)
Initialize the classifier instance.
See the background example notebook for more details.
- Parameters:
clf (scikit-learn-like classifier, optional) – Trained classifier. If no value is provided, the latest detectree pre-trained classifier is used. Ignored if clf_dict is provided.
clf_dict (dictionary, optional) – Dictionary mapping each first-level cluster label to a trained scikit-learn-like classifier.
hf_hub_repo_id (str, optional) – HuggingFace Hub repository id (a string with the user or organization and the repository name separated by a /). If no value is provided, the value set in settings.HF_HUB_REPO_ID is used. Ignored if clf or clf_dict is provided.
hf_hub_clf_filename (str, optional) – File name of the skops classifier in the HuggingFace Hub repository. If no value is provided, the value set in settings.HF_HUB_CLF_FILENAME is used. Ignored if clf or clf_dict is provided.
hf_hub_download_kwargs (dict, optional) – Additional keyword arguments (besides “repo_id”, “filename”, “library_name” and “library_version”) to pass to huggingface_hub.hf_hub_download.
skops_trusted (list, optional) – List of trusted object types to load the classifier from HuggingFace Hub, passed to skops.io.load. If no value is provided, the value from settings.SKOPS_TRUSTED is used. Ignored if clf or clf_dict are provided.
tree_val (int, optional) – The value that designates tree pixels in the response images. If no value is provided, the value set in settings.TREE_VAL is used.
nontree_val (int, optional) – The value that designates non-tree pixels in the response images. If no value is provided, the value set in settings.NON_TREE_VAL is used.
refine_method (callable or bool, optional) – Method to refine the pixel-level classification, e.g., to improve the consistency between neighboring pixels. If False is provided, no refinement is performed. If None is provided and return_proba is None or False, the value from settings.CLF_REFINE is used.
refine_kwargs (dict, optional) – Keyword arguments that will be passed to refine_method. If no value is provided, the value set in settings.CLF_REFINE_KWARGS is used. Ignored if no refinement is performed (i.e., if refine_method is False, or if refine_method is None and return_proba is True).
return_proba (bool, optional) – If True, the classifier will return the probabilities of each pixel belonging to the tree class. If False, the classifier will return the predicted class labels. Ignored if a valid refine_method is provided.
pixel_features_builder_kwargs (dict, optional) – Keyword arguments that will be passed to detectree.PixelFeaturesBuilder, which customize how the pixel features are built.
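The interplay between refine_method and return_proba follows a few documented rules. This sketch encodes those rules so the cases can be checked one by one. It is hypothetical: the function name is invented, default_refine stands in for the refinement selected by settings.CLF_REFINE, and values other than a callable, False or None are deliberately left out of scope.

```python
def resolve_output(refine_method=None, return_proba=None, *, default_refine=None):
    """Return (refine function or None, whether probabilities are returned).

    Hypothetical sketch of the documented rules; `default_refine` stands in
    for the refinement selected by settings.CLF_REFINE.
    """
    if callable(refine_method):
        # A valid refinement method wins: return_proba is ignored.
        return refine_method, False
    if refine_method is False:
        # Refinement explicitly disabled: return_proba decides the output.
        return None, bool(return_proba)
    if refine_method is None:
        if return_proba:
            # Probabilities requested: no refinement is performed.
            return None, True
        # Fall back to the default refinement and return class labels.
        return default_refine or None, False
    raise ValueError("this sketch only handles a callable, False or None")
```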
- detectree.maxflow_refine(p_tree_img, tree_val, nontree_val, *, refine_int_rescale=10000, refine_beta=50)
Refine the pixel-level classification using a graph max-flow algorithm.
- Parameters:
p_tree_img (numpy.ndarray) – The probability image of the pixel being a tree, as a two-dimensional numpy array with values between 0 and 1.
tree_val (int) – The value that designates tree pixels in the output array.
nontree_val (int) – The value that designates non-tree pixels in the output array.
refine_int_rescale (int, optional) – Parameter of the refinement procedure that controls the precision of the conversion of float edge weights to the integer weights required by the graph-cut algorithm used. Larger values lead to greater precision.
refine_beta (int, optional) – Parameter of the refinement procedure that controls the smoothness of the labelling. Larger values lead to smoother shapes.
- Returns:
img – The refined pixel-level classification as a two-dimensional numpy array with the same shape as p_tree_img.
- Return type:
numpy.ndarray
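maxflow_refine is described as a graph max-flow refinement. The sketch below is a self-contained illustration of that general technique. It labels pixels with a binary s-t minimum cut, computed in pure Python with Edmonds-Karp. The energy is a standard graph-cut formulation assumed here, not taken from detectree:

* the unary cost of each pixel is its integer-rescaled negative log-probability for the assigned label;
* each pair of 4-connected neighbors with different labels adds a constant penalty beta.

The function name is hypothetical, and detectree's own graph construction, neighborhood and weight scaling may differ. Its int_rescale and beta arguments only play roles analogous to refine_int_rescale and refine_beta.

```python
import math
from collections import deque


def graph_cut_refine(p_tree, tree_val, nontree_val, *, int_rescale=10000, beta=50):
    """Refine a 2-D probability grid (list of lists) with a binary s-t min cut.

    Hypothetical sketch, not detectree's implementation: each pixel pays an
    integer-rescaled negative log-probability for its label, and every pair of
    4-connected neighbors with different labels pays `beta`.
    """
    rows, cols = len(p_tree), len(p_tree[0])
    source, sink = rows * cols, rows * cols + 1  # source side = tree
    cap = {node: {} for node in range(rows * cols + 2)}  # residual capacities

    def add_edge(u, v, capacity):
        cap[u][v] = cap[u].get(v, 0) + capacity
        cap[v].setdefault(u, 0)  # reverse edge for the residual graph

    eps = 1e-9
    for r in range(rows):
        for c in range(cols):
            i = r * cols + c
            p = min(max(p_tree[r][c], eps), 1 - eps)
            # Cutting source->i labels i non-tree; cutting i->sink labels it tree.
            add_edge(source, i, round(-int_rescale * math.log(1 - p)))
            add_edge(i, sink, round(-int_rescale * math.log(p)))
            # Link each pixel to its right and lower neighbors in both directions.
            for j in ((i + 1) if c + 1 < cols else None,
                      (i + cols) if r + 1 < rows else None):
                if j is not None:
                    add_edge(i, j, beta)
                    add_edge(j, i, beta)

    # Edmonds-Karp: augment along shortest residual paths until none is left.
    while True:
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:
            u = queue.popleft()
            for v, residual in cap[u].items():
                if residual > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            break
        bottleneck, v = math.inf, sink
        while parent[v] is not None:
            bottleneck = min(bottleneck, cap[parent[v]][v])
            v = parent[v]
        v = sink
        while parent[v] is not None:
            u = parent[v]
            cap[u][v] -= bottleneck
            cap[v][u] += bottleneck
            v = u

    # Pixels still reachable from the source lie on the tree side of the cut.
    return [[tree_val if r * cols + c in parent else nontree_val
             for c in range(cols)] for r in range(rows)]
```

With beta set to 0 each pixel is labeled independently (tree if its probability exceeds 0.5). Larger beta values let confident neighbors override an isolated uncertain pixel, which is the smoothing effect described for refine_beta. Edmonds-Karp keeps the sketch dependency-free but is far too slow for real tiles, where a dedicated max-flow library is the practical choice.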