Properly normalize column names in Utils.GetSampleData() for duplicate cases #5280
Merged
mstfbl merged 1 commit into dotnet:master on Jul 3, 2020
Codecov Report
@@            Coverage Diff             @@
##           master    #5280      +/-   ##
==========================================
- Coverage   73.68%   73.68%   -0.01%
==========================================
  Files        1022     1022
  Lines      190258   190320      +62
  Branches    20468    20470       +2
==========================================
+ Hits       140185   140230      +45
- Misses      44541    44559      +18
+ Partials     5532     5531       -1
LittleLittleCloud approved these changes on Jul 3, 2020
Fix #5267
This PR fixes a bug where columns generated from inline data were normalized directly through Utils.Normalize(), which only fixes the naming of a single column name and does not take into account duplicate column names that may exist in a dataset.

PR #5177 introduced a way to fix these duplicate column names by appending the differentiator suffix '_col_x', where 'x' represents the dataset load order for a given column. In this PR I have separated this generation of distinct and unique column names from Utils.GenerateClassLabels() and moved it into its own function, Utils.GenerateColumnNames(). This allows the same logic to be used in Utils.GenerateSampleData, which before this PR resulted in exceptions. Now, column names from inline data are properly normalized, and duplicate column names are handled.

This PR also adds a unit test covering the case of duplicate column names with Utils.GenerateSampleData.
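
To illustrate the '_col_x' scheme described above, here is a minimal, language-neutral sketch in Python. It is not the ML.NET code (which is C#). The normalize() helper is a hypothetical stand-in for Utils.Normalize(), and the sketch assumes that 'x' is the zero-based position of the column in load order and that every member of a duplicate group receives the suffix.

```python
import re
from collections import Counter


def normalize(name):
    # Hypothetical stand-in for Utils.Normalize(): replace every character
    # that is not a letter, digit, or underscore with an underscore.
    return re.sub(r"\W", "_", name)


def generate_column_names(raw_names):
    # Step 1: normalize each name on its own. Two different raw names
    # (e.g. "Home Town" and "Home_Town") can collapse to the same result.
    normalized = [normalize(n) for n in raw_names]

    # Step 2: count how often each normalized name occurs.
    counts = Counter(normalized)

    # Step 3: append "_col_<index>" only to names that occur more than once.
    # Assumption: the index is the zero-based load order of the column.
    return [
        f"{name}_col_{index}" if counts[name] > 1 else name
        for index, name in enumerate(normalized)
    ]


print(generate_column_names(["Age", "Home Town", "Home_Town", "Label"]))
# prints ['Age', 'Home_Town_col_1', 'Home_Town_col_2', 'Label']
```

Normalizing each name in isolation, as in step 1 alone, would leave two columns both named Home_Town, which is the kind of collision that the shared helper is meant to prevent.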