Plugged in Lin. Reg. and PCA from Deedle.Extras#496
That's awesome. Could you add some use cases as unit tests? Load some CSV files similar to the other test cases and cover your functions.
Sure, that makes a lot of sense. I will probably get around to doing it sometime this week.
Status update: I just added a few tests for the linear regression logic that verify that the coefficients match those computed by Math.NET. I will get around to PCA soon.
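For reference, the closed-form OLS estimates that such a test checks against can be sketched for the single-regressor case in a few lines of F#. This is a hypothetical, dependency-free helper: `simpleOls` is an assumed name, not part of Deedle's or Math.NET's API. A test could compare its output with the coefficients returned by the new LinearRegression functions.

```fsharp
/// Hypothetical helper (not part of Deedle or Math.NET): closed-form
/// ordinary least squares for one regressor, y = intercept + slope * x.
let simpleOls (xs: float[]) (ys: float[]) : float * float =
    if xs.Length <> ys.Length || xs.Length < 2 then
        invalidArg "xs" "xs and ys must have the same length (at least 2)"
    let xMean = Array.average xs
    let yMean = Array.average ys
    // Sum of cross-deviations and sum of squared x-deviations
    let sxy = Array.map2 (fun x y -> (x - xMean) * (y - yMean)) xs ys |> Array.sum
    let sxx = xs |> Array.sumBy (fun x -> (x - xMean) * (x - xMean))
    let slope = sxy / sxx
    // The fitted line always passes through the point of means
    let intercept = yMean - slope * xMean
    intercept, slope

// Usage: data generated exactly by y = 1 + 2x
let intercept, slope = simpleOls [| 1.0; 2.0; 3.0; 4.0 |] [| 3.0; 5.0; 7.0; 9.0 |]
printfn "intercept = %g, slope = %g" intercept slope
```

With more than one regressor the same idea generalizes to solving the normal equations, which is where delegating to Math.NET pays off.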
… sorted by largest eigenvalues first, and added a unit test for PCA logic.
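Ordering the principal components so that the one with the largest eigenvalue (the most explained variance) comes first can be sketched as below. This is a hypothetical illustration, not the code in this pull request: `sortByEigenvalueDescending` is an assumed name, and the eigenvalues and eigenvectors are assumed to be already computed and aligned by index.

```fsharp
/// Hypothetical illustration (not Deedle's implementation): reorder
/// eigenpairs so the largest eigenvalue comes first, keeping each
/// eigenvector paired with its eigenvalue.
let sortByEigenvalueDescending (eigenValues: float[]) (eigenVectors: float[][]) =
    // Array.zip throws if the two arrays differ in length
    Array.zip eigenValues eigenVectors
    |> Array.sortByDescending fst
    |> Array.unzip

// Usage: three components given in arbitrary order
let values, vectors =
    sortByEigenvalueDescending
        [| 0.5; 3.0; 1.2 |]
        [| [| 1.0; 0.0; 0.0 |]; [| 0.0; 1.0; 0.0 |]; [| 0.0; 0.0; 1.0 |] |]
```

Sorting the pairs together, rather than the eigenvalues alone, keeps each loading vector attached to the variance it explains.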
I've now added another unit test for PCA. Reckon this will suffice?
In principle, this already looks pretty good. Thanks a lot for contributing your extensions directly here! As for the design of the regression function and its result, I'd like to follow the experience of lm in R. The thing that bugs me is … We could change … We should also add more extension functions. For instance, an F-statistic can also be added later. Let me know what you think. We can address it in this pull request, or I can merge it and then work it out in another one.
Thank you very much for your comments. Regarding the design being close to lm in R, I agree very much; this was also the experience I tried to recreate when I originally coded this. Regarding FitIntercept, I have had similar thoughts to yours. My reasons for going with the option were: …
Another design choice could be sub-modules, so rather than doing … So here we are essentially trading off the option type for overloads. However, I don't know how much I like that approach. A third solution could be that: …
Regarding the summary, it certainly needs some improvement and prettifying, including the F-statistic. I'd say let's agree on the interface for LinearRegression, which I can fix in this pull request, and then leave the summary improvements and the F-statistic for later ones. Which of my design proposals do you think is the most suitable, or do you have another idea?
I've tried a use case of:

    let df = [1, stockReturns?MSFT; 2, stockReturns?WMT; 3, stockReturns?AES] |> frame
    LinearRegression.multiDim [1; 2] 3 (Some 0) df
    |> LinearRegression.Fit.coefficients

The output looks like the following: …

In the above case, I have to name the intercept with a number, zero in this case. Though the calculation is correct, the use case looks very bizarre to me. I cannot imagine a stats user would ever run a regression without naming their variables for interpretation. As you confirmed, you have never encountered such a case either. Your solution 3 can solve it. But I feel leaving another layer of … Another idea for your consideration:

    LinearRegression.multiDim ["MSFT"] "AES" (Some "yIntersect") stockReturns
    LinearRegression.simple "MSFT" "AES" (Some "yIntersect") stockReturns

Why don't we just name …
I changed the design to this:
|
|
Looks great to me. Feel free to send other pull requests to improve the output. I'll try to fix another issue and then bump up the version to release it.
I've released 2.2.0 on NuGet, which includes this feature. Thanks again!
I added the linear regression and PCA functionality from my own package. Let me know if anything needs to be changed, updated, or fixed (style, bugs I have overlooked, etc.) before merging it in completely.