DataFrame enhancements #6088
Comments
Thanks for raising this issue. We're planning on evaluating the data preparation / data wrangling story in the coming months as outlined in the roadmap. We suspect the DataFrame API has a role to play there, but until we have a clearer picture of common uses, asks, and pain points with the existing API, there is no active development on the DataFrame API at this time. That doesn't mean the project is dead or that issues and feature requests like these aren't being taken into account. They are going to help frame our investigations and prioritize our efforts. Because the DataFrame API is currently in preview and we don't expect to add new features within the next couple of months, personally I would not take hard dependencies on it for critical systems at this time. Let us know if you have additional questions or issues.
Thanks, Luis!
Luis,
Tagging for visibility: @GKrivosheev-rms Thanks, Gleb, for providing additional context around your scenario. To clarify, you're looking to use DataFrame for data processing and analytics, not exactly for building predictive analytics / machine learning models? If so, have you taken a look at .NET for Apache Spark? It has its own implementation of DataFrames which support:
Not sure if that would help solve your problem, but thought I'd mention it. Here's an E2E example of .NET for Apache Spark and ML.NET as well as standalone examples from the .NET for Apache Spark repo.
Thanks for the suggestion, @luisquintanilla. I'll take a look. A few questions:
Regards,
@GKrivosheev-rms great questions. I've tried to answer them below.
Hope this helps. Happy to clarify anything.
To add here, Parquet.Net, which is already used in ML.NET, has full built-in support for reading and writing DataFrames. There is also a sample C# interactive notebook demonstrating basic use (it's a one-liner). It just works.
I see dozens of issues and enhancement suggestions for DataFrame in the Microsoft.Data.Analysis namespace that have gone untouched for almost a year.
Are there any resources allocated to address those?
Is the project dead?
Are there any plans to fund the work on those features in the future?
Should we base any future development on these?
Specific enhancements desired: