Add check for missing values to GroupByLabels#380
Add check for missing values to GroupByLabels#380sebhofer wants to merge 8 commits intofslaborg:masterfrom
Conversation
|
@sebhofer I checked Pandas' behavior on similar case. It basically drops rows of the group column that has missing value first and then group by it. I would prefer that instead of throwing out an error message. I'll dig a bit to see how to accomplish this efficiently. |
|
I'm not sure I agree. If you just drop some rows and move on quietly, the user might never notice that something is off. In an interactive environment, which is, I think, where Deedle is mostly used, I rather I get an error message, and I can fix the problem myself. |
|
I think both throwing values away and throwing an exception are sensible design choices - I would have slight preference for exception, because I feel the F# style is to be a bit more explicit than in the Python style (it just means people have to do some extra work to drop before grouping, but that seems OK). We might make that easier if we added |
|
@tpetricek I'm ok with throwing error if users got the message that some data are missing so that they can check data and clean it up. @sebhofer Could you submit another commit to throw a little bit more explicit message to suggest user to check missing data in grouping columns? |
|
@zyzhu Sure, will do. I can also try to come up with a PR |
|
I had a quick look at the grouping function. As |
|
@sebhofer That makes a lot of sense to me. Thanks! |
|
Hi, |
|
@kblohm No, that was not intended! I am new to forking projects and creating PRs so I really messed this up yesterday. Also, I didn't realize that everything I push to the master branch gets added to the PR. I'm not quite sure how to proceed with this. I'm guessing it's best to close this and open a new PR on a dedicated branch. Any input is welcome! |
|
You could clean this one up, but as far as i know you can not change the source branch. |
|
@sebhofer You could sync your fork and then create another branch. Then commit just the change of throwing exception e85244d and submit the pull. I'll merge it after that. BTW, @kblohm just changed the build using FAKE 5. After syncing with upstream, you need to do the following to get started.
|
|
You do not need to install fake globally, you can just run |
|
@kblohm Yeah, that seems like the cleanest solution. I think I'll be able to handle it... but you never know. I'll let you know otherwise. Thanks for your help! @zyzhu I'll try to do this tonight. On another note, did you see the issue I opened about working with the Deedle src? I'm struggling a bit with this and could use some advice. Or is this a trivial problem and I'm just missing something really obvious? |
|
@sebhofer I did see that. I am traveling for work this week. Will write up some drafts next week when I get back to office. But I will need some help from @tpetricek as he knows the original work more inside out than me as new comer too. Thanks to @kblohm, we are on FAKE 5 already. Just follow what he wrote above and you shall be able to build the source. As for how to work with fsx and source code. I will try to write up some sample cases next week. |
|
@zyzhu Perfect, thanks. I just wasn't sure if it was just me being stupid! |
|
@kblohm I might take you up on offer afterall :) How do I best clean this up? As I now have all these merge-commits in both |
|
Did you mange to fix this? git fetch upstream
git checkout master
git reset upstream/master --hardThis will reset all your changes, but it looks like you already have a new branch with the stuff you wanted to keep. |
|
I've just merged the new pull. |
Throw error when trying to group by column with missing values. This leads to unexpected results (see, e.g., this issue). This is not really a fix, but grouping on missing values seemsill-defined anyway.