Commit ed61480

Updated ROADMAP
1 parent f1f306e commit ed61480

1 file changed

+32
-83
lines changed

ROADMAP.md

Lines changed: 32 additions & 83 deletions
@@ -2,32 +2,15 @@
22

33
The goal of ML.NET is to democratize machine learning for .NET developers. This document outlines the current roadmap for the ML.NET framework and APIs.
44

5-
To see the plans for ML.NET tooling, check out the [Model Builder repo](https://github.com/dotnet/machinelearning-modelbuilder/issues/1707).
6-
7-
## Goals through November 2023
8-
9-
- Keep docs, samples, and repo up to date.
10-
- Deep learning
11-
- Make it easier to consume ONNX models in ML.NET
12-
- More NLP / CV scenarios powered by TorchSharp
13-
- Accelerate deep learning workflows through improved batch support and accelerators (i.e. ONNX Execution Providers)
14-
- Build a bridge between ONNX and TorchSharp
15-
- LightGBM
16-
- Update to the latest version
17-
- Enable collaboration by enabling loading / saving of models in native LightGBM format.
18-
- DataFrame
19-
- Bug fixes and getting the basics right
20-
- Untyped PredictionEngine
21-
- Responsible AI
22-
- Integrate fairness detection and mitigation APIs into ML.NET
5+
To see the plans for ML.NET tooling, check out the [Model Builder repo](https://github.com/dotnet/machinelearning-modelbuilder/issues?q=is%3Aissue+is%3Aopen+label%3AEpic).
236

247
## Feedback and contributions
258

269
ML.NET is a community effort and we welcome community feedback on our plans. The best way to give feedback is to [open an issue](https://github.com/dotnet/machinelearning/issues/new/choose) in this repo.
2710

2811
We also invite contributions. The [good first issue](https://github.com/dotnet/machinelearning/labels/good%20first%20issue) and [up-for-grabs issues](https://github.com/dotnet/machinelearning/issues?q=is%3Aopen+is%3Aissue+label%3Aup-for-grabs) on GitHub are a good place to start. You can also help work on any of the features we've listed below or work on features that you want to add to the framework.
2912

30-
## Goals through June 2022
13+
## Goals through November 2023
3114

3215
The following sections outline the major areas and features we plan to work on in the next year.
3316

@@ -39,110 +22,76 @@ As we prioritize, cost, and continue planning, we will try to keep the Roadmap u
3922

4023
We heard your feedback loud and clear that our outdated docs and samples were a top pain point when learning and using ML.NET.
4124

42-
We have invested more resources into content development to make sure our Docs stay relevant and that we add documentation for new features faster as well as add more relevant samples.
25+
As we continue to drive improvements in ML.NET and add new features, it's important to us that you're successful in adopting and using these enhanced capabilities to deliver value. Documentation and samples are a key part of that. Over the next year, we plan to dedicate more resources to deliver quality documentation and samples.
26+
27+
This [tracking issue](https://github.com/dotnet/docs/issues/32112) lists a few of the areas we plan to build documentation around over the next few months.
4328

4429
You can file issues and make suggestions for ML.NET documentation in the [dotnet/docs repo](https://github.com/dotnet/docs) and for ML.NET samples in the [dotnet/machinelearning-samples](https://github.com/dotnet/machinelearning-samples) repo.
4530

4631
We are also taking steps to organize the [dotnet/machinelearning](https://github.com/dotnet/machinelearning) repo and update our triage processes so that we can address your issues and feedback faster. Issues will be linked to version releases in the [Projects](https://github.com/dotnet/machinelearning/projects) section of the repo so you can see what we're actively working on and when we plan to release.
4732

48-
### Get on the .NET release schedule
49-
50-
ML.NET is .NET, and to make it feel more a part of .NET, we've decided to align with the .NET release schedule.
51-
52-
This means that we will ship our next version of ML.NET (v1.7.0) with .NET 6.0 in November 2021.
53-
54-
While we'll have major releases of ML.NET once a year with the major .NET releases, we will maintain release branches to optionally service ML.NET with bug fixes and/or minor features on the same cadence as .NET servicing.
55-
5633
### Deep learning
5734

5835
This past year we've been working on our plan for deep learning in .NET, and now we are ready to execute that plan to expand ML.NET's deep learning support.
5936

6037
As part of this plan, we will:
6138

6239
1. Make it easier to consume ONNX models in ML.NET using the ONNX Runtime (RT)
63-
2. Fully support and productionize [TorchSharp](https://github.com/xamarin/TorchSharp) for building neural networks in .NET
64-
3. Build a bridge between TorchSharp and ML.NET
40+
1. Continue to bring more scenario-based APIs backed by TorchSharp transformer-based architectures. The next few scenarios we're looking to enable are:
41+
- Object detection
42+
- Named Entity Recognition (NER)
43+
- Question Answering
44+
1. Enable integrations with TorchSharp for scenarios and models not supported out of the box by ML.NET.
45+
1. Accelerate deep learning workflows by improving batch support and enabling easier use of accelerators such as ONNX Runtime Execution Providers.
6546

6647
Read more about the deep learning plan and leave your feedback in this [tracking issue](https://github.com/dotnet/machinelearning/issues/5918).
6748

68-
### Move from System.Drawing to ImageSharp
49+
Performance-related improvements are being tracked in this [issue](https://github.com/dotnet/machinelearning/issues/6422).
6950

70-
Starting in .NET 6, System.Drawing.Common will only be supported on Windows (you can read more about this decision in this [design doc](https://github.com/dotnet/designs/blob/main/accepted/2021/system-drawing-win-only/system-drawing-win-only.md)).
51+
### LightGBM
7152

72-
To ensure ML.NET works great on all platforms, we will replace System.Drawing with the [ImageSharp](https://github.com/SixLabors/ImageSharp) graphics library.
53+
LightGBM is a flexible framework for classical machine learning tasks such as classification and regression. To make the best of the features LightGBM provides, we plan to:
7354

74-
*Related issues*:
55+
- Upgrade the version included in ML.NET to the latest LightGBM version
56+
- Make interoperability with other frameworks easier by enabling saving and loading models in the native LightGBM format.
7557

76-
- [#3154](https://github.com/dotnet/machinelearning/issues/3154)
58+
We're tracking feedback and progress on LightGBM in this [issue](https://github.com/dotnet/machinelearning/issues/6337).
7759

78-
### New features and scenarios
79-
80-
#### Named Entity Recognition (NER)
60+
### Define the plan for data prep
8161

82-
Named Entity Recognition, or NER, is the process of identifying and classifying/tagging information in text. For example, an NER model might look at a block of text and pick out "Seattle" and "Space Needle" and categorize them as locations or might find and tag "Microsoft" as a company.
62+
While we are working on developing the features mentioned above, we will also be working on our plan for data preparation and wrangling in .NET.
8363

84-
Currently you can consume a pre-trained ONNX model in ML.NET for NER, but it is not possible to train a custom NER model in ML.NET which has been a highly requested feature for several years.
64+
#### DataFrame API
8565

86-
This year, we will work on adding support for training custom NER models in ML.NET.
66+
Data processing is an important part of any analytics and machine learning workflow. This process often involves loading, inspecting, transforming, and visualizing your data. We've heard your feedback that one of the ways you'd like to perform some of these tasks is by using the DataFrame API in the `Microsoft.Data.Analysis` NuGet package. This past year we worked on making the loading experience more robust and adding new column types like DateTime to enable better interoperability with the ML.NET `IDataView`. In the next year, we plan to continue focusing on the areas of:
8767

88-
*Related issues*:
68+
- Improving interoperability with the `IDataView` by supporting `VBuffer` and `KeyType` columns.
69+
- Improving stability for common operations such as loading, filtering, merging, and joining data.
8970

90-
- [#630](https://github.com/dotnet/machinelearning/issues/630)
71+
This [tracking issue](https://github.com/dotnet/machinelearning/issues/6144) is intended to collect feedback and track progress of the work we're doing on the DataFrame.
9172

92-
#### Dynamic IDataView
73+
#### Untyped / Dynamic training and prediction engine
9374

9475
In ML.NET, you must first define your model input and output schemas as new classes before loading data into an IDataView.
9576

96-
This year, we will work on adding a way to create dynamic IDataViews, meaning that you don't have to define your schemas beforehand and instead the shape of the training data defines the schemas.
97-
98-
*Related issues*:
99-
100-
- [#5895](https://github.com/dotnet/machinelearning/issues/5895)
77+
In ML.NET 2.0 we made progress in this area by leveraging the `InferColumns` method as a source of information for the AutoML `Featurizer`. The `Featurizer` helps automate common data preprocessing tasks to get your data in a state that's ready for training. When used together, you don't have to define schema classes. This is convenient when working with large datasets.
10178

102-
#### Multivariate time series forecasting
79+
Similarly, using the DataFrame API, you can load data into the `DataFrame`, apply any transformations to your data, use the data as input to an ML.NET pipeline, train a model, and use the model to make predictions. At that point, you can call `ToDataFrame` and convert your predictions to a `DataFrame`, making it easier to post-process and visualize those predictions. As mentioned in the `DataFrame` section, there's still some work that needs to be done to make the experience of going between a `DataFrame` and `IDataView` seamless, but the `DataFrame` is another option for working with ML.NET without having to define schemas.
10380

104-
Currently ML.NET only supports univariate time series forecasting with the [SSA algorithm](https://docs.microsoft.com/dotnet/api/microsoft.ml.transforms.timeseries.ssaforecastingestimator?view=ml-dotnet) which is currently being [added to Model Builder](https://github.com/dotnet/machinelearning-modelbuilder/issues/1750).
81+
However, for single predictions, there are currently no solutions. For tasks like forecasting, we've made modifications to how the `PredictionEngine` behaves. As a result, we expect to be able to do something similar to enable untyped prediction engines.
10582

106-
Univariate time series has one time-dependent variable whose values only depend on its past values through time. Multivariate time series has more than one time-dependent variable where each variable depends on its past values as well as the other variables.
107-
108-
This year, we will work on adding support for multivariate time series forecasting to ML.NET.
83+
While the details of that implementation are yet to be defined, this year we plan to provide ways to create dynamic prediction engines, meaning that you don't have to define your schemas beforehand and instead the shape of the training data defines the schemas.
10984

11085
*Related issues*:
11186

112-
- [#5638](https://github.com/dotnet/machinelearning/issues/5638)
113-
- [#1696](https://github.com/dotnet/machinelearning/issues/1696)
114-
115-
#### Multilabel Classification
116-
117-
Currently, ML.NET's classification algorithms will return one Predicted Label as well as an array of Scores which correspond to each possible class. However, mapping each label to the Score is currently not a great experience.
118-
119-
This year we will work on making the prediction info more user-friendly so that it is easy to assign multiple classes to one prediction.
120-
121-
*Related issues*:
122-
123-
- [#3909](https://github.com/dotnet/machinelearning/issues/3909)
124-
- [#2278](https://github.com/dotnet/machinelearning/issues/2278)
87+
- [#5895](https://github.com/dotnet/machinelearning/issues/5895)
12588

12689
### Model explainability & Responsible AI
12790

12891
Model Explainability and Responsible AI are becoming increasingly important areas of focus in the Machine Learning space and at Microsoft. Model explainability and fairness features are important because they let you debug and improve your models and answer questions about bias, building trust, and complying with regulations.
12992

13093
ML.NET currently offers two main model explainability features: [Permutation Feature Importance](https://docs.microsoft.com/dotnet/api/microsoft.ml.permutationfeatureimportanceextensions?view=ml-dotnet) (PFI) and the [Feature Contribution Calculator](https://docs.microsoft.com/dotnet/api/microsoft.ml.transforms.featurecontributioncalculatingestimator?view=ml-dotnet) (FCC).
13194

132-
We got a lot of feedback that the PFI API was difficult to use, so our first step is to improve the current experience in ML.NET. These improvements can be tracked in this [issue](https://github.com/dotnet/machinelearning/issues/5625) which will be merged soon.
133-
134-
This year we also plan to expand the number of model explainability and fairness features. We are currently working on this plan and will update the roadmap as we finalize which model explainability and fairness techniques we will bring into ML.NET.
135-
136-
### Define the plan for data prep
137-
138-
While we are working on developing the features mentioned above, we will also be working on our plan for data preparation and wrangling in .NET.
139-
140-
#### DataFrame API
141-
142-
The plan for data prep will include the roadmap for the DataFrame API (Microsoft.Data.Analysis) which we will add and update to this Roadmap doc.
143-
144-
*Related issues*:
95+
In ML.NET 2.0 we made improvements to simplify the PFI API.
14596

146-
- [#5870](https://github.com/dotnet/machinelearning/issues/5870)
147-
- [#5716](https://github.com/dotnet/machinelearning/issues/5716)
148-
- [#1696](https://github.com/dotnet/machinelearning/issues/1696)
97+
We also worked on porting fairness assessment and mitigation components from the [Fairlearn library to .NET](https://github.com/dotnet/machinelearning/pull/6279). Those components are not integrated into ML.NET yet. This year we plan to expose them in ML.NET.
