Excluded columns overwritten due to variable reuse in pca_preprocess #389
Description
I think I have found a bug in the `pca_preprocess` function: excluded columns (typically categorical variables) are not correctly added back to the `rotated_experiments` DataFrame, because the wrong variable is used as the column name when the excluded columns are re-inserted.
The current code contains this loop:
```python
for entry in exclude:
    rotated_experiments[name] = experiments[entry]
```
Here, `name` is left over from the previous loop, where it was used to assign the rotated PCA column names. By the time this loop runs, `name` still refers to the last PCA column name, not to the current excluded column. Every excluded column is therefore written to the same key, each one overwriting the previous, so only the last one remains.
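The overwrite can be reproduced without the library using a minimal sketch. It assumes a simplified, hypothetical version of the two loops and uses plain dicts as stand-ins for the `experiments` and `rotated_experiments` DataFrames (assigning to an existing dict key replaces its value, just as assigning to an existing DataFrame column does). The PCA column names `r_0`, `r_1` and the categorical columns `policy`, `scenario` are made up for illustration.

```python
def buggy_reinsert(experiments, rotated_values, exclude):
    """Hypothetical stand-in for the two loops in pca_preprocess, bug included."""
    rotated_experiments = {}

    # First loop: assign the rotated PCA columns.
    # After it finishes, `name` is still bound to the last PCA column name.
    for name in rotated_values:
        rotated_experiments[name] = rotated_values[name]

    # Second loop (the bug): `name` is never rebound here, so every
    # excluded column is written to the last PCA column's key.
    for entry in exclude:
        rotated_experiments[name] = experiments[entry]

    return rotated_experiments


experiments = {"policy": ["a", "b"], "scenario": ["x", "y"]}
rotated_values = {"r_0": [0.1, 0.2], "r_1": [0.3, 0.4]}

result = buggy_reinsert(experiments, rotated_values, ["policy", "scenario"])
print(sorted(result))  # ['r_0', 'r_1']
print(result["r_1"])   # ['x', 'y']
```

In this sketch neither `policy` nor `scenario` appears in the result, and the data of the last excluded column ends up under `r_1`, replacing that rotated column's values as well.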
Suggested fix:
The loop should instead use `entry` as the column name when re-inserting:
```python
for entry in exclude:
    rotated_experiments[entry] = experiments[entry]
```
This correctly re-inserts each excluded (categorical) variable under its original column name.
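A minimal sketch of the corrected behaviour, assuming a simplified, hypothetical version of the relevant loops in `pca_preprocess`, with plain dicts standing in for DataFrames and made-up column names. The assertions are the kind of regression check that would fail with the original `name`-based loop.

```python
def fixed_reinsert(experiments, rotated_values, exclude):
    """Hypothetical stand-in for the loops in pca_preprocess, with the suggested fix."""
    rotated_experiments = {}

    # Assign the rotated PCA columns.
    for name in rotated_values:
        rotated_experiments[name] = rotated_values[name]

    # Fixed: key each excluded column by its own name (`entry`).
    for entry in exclude:
        rotated_experiments[entry] = experiments[entry]

    return rotated_experiments


experiments = {"policy": ["a", "b"], "scenario": ["x", "y"]}
rotated_values = {"r_0": [0.1, 0.2], "r_1": [0.3, 0.4]}

result = fixed_reinsert(experiments, rotated_values, ["policy", "scenario"])

# Every excluded column is present under its original name,
# and the rotated columns are left untouched.
assert result["policy"] == ["a", "b"]
assert result["scenario"] == ["x", "y"]
assert result["r_1"] == [0.3, 0.4]
print(sorted(result))  # ['policy', 'r_0', 'r_1', 'scenario']
```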