Data frame binary operations #2656

pgovind · 2019-04-09T00:07:38Z

This is work towards #26854 to add a DataFrame type to .NET Core

This patch implements the basic infrastructure, indexing and binary operations on a dataframe. I thought this was a good spot to get an initial review. More patches will follow on top of this to support computations, aggregations, joins, merges, group by etc

@eerhardt @stephentoub @ericstj

dnfclas · 2019-04-09T00:07:50Z

All CLA requirements met.

YohDeadfall · 2019-04-09T04:25:52Z

Could you reference an issue or describe the goal?

/cc @roji

YohDeadfall · 2019-04-09T13:07:49Z

src/Microsoft.Data/BaseDataFrameColumn.cs

+        public string Name;
+
+        public virtual object this[long rowIndex] { get { throw new NotImplementedException(); } set { throw new NotImplementedException(); } }
+        public virtual object this[long startIndex, int length] { get { throw new NotImplementedException(); } }


This is how current API of System.Data is designed. It uses boxing in many places. I think that this methods should be removed from the base class and be implemented in DataFrameColumn<T>.

They are overridden in DataFrameColumn at the moment?. Do you mean they should occur only in DataFrameColumn and not in the base class? I'm not completely convinced about that yet. At the moment, the DataFrame class has a DataFrameTable which has an IList called columns i.e. the DataFrameTable does not know the real type of the columns it holds => therefore, the approach at the moment is to add APIs on BaseDataFrameColumn and override them in the derived columns. It is possible that this will change down the line as I add more features, but for the moment I think this looks reasonable. Thoughts?

Okay, let's change it a bit later.

src/Microsoft.Data/BinaryOperations/DataFrameBinaryOperations.tt

YohDeadfall · 2019-04-09T13:11:22Z

src/Microsoft.Data/BinaryOperations/DataFrameBinaryOperations.tt

+            for (int ii = 0; ii < NumColumns; ii++)
+            {
+                DataFrameColumn<T> column = _table.Column(ii) as DataFrameColumn<T>;
+                if (column != null)


Invert condition and use continue.

I'm not opposed to the style but I do have some questions here for the larger crowd:

If I had a DataFrameColumn<int> and called DataFrameColumn<int>.Add(5.0),
The lines
DataFrameColumn<T> column = _table.Column(ii) as DataFrameColumn<T>; if (column != null)

mean that nothing will happen since the dynamic cast will fail. I'm leaning towards checking the type of the resulting operation and creating a new column with that type?
For ex: DataFrameColumn.Add(5.0) will result in a new DataFrameColumn, similar to what 1+5.0 does in C#.

For cases such as DataFrameColumn<int> - string, we just don't perform the operation instead of throwing?
Thoughts?

I would expect it to throw if I try an operation that isn't supported, but why wouldn't that one follow the same rule as number + string with scalars? i.e. convert to string

Ugh, my bad. I typed in a hurry. I meant to write "DataFrameColumn - string"(Basically whenever we have incompatible operands). Edited it now.

src/Microsoft.Data/BinaryOperations/DataFrameBinaryOperations.tt

YohDeadfall · 2019-04-09T13:15:42Z

src/Microsoft.Data/BinaryOperations/DataFrameBinaryOperations.tt

+
+<# foreach (MethodConfiguration method in methodConfiguration) { #>
+<# if (method.MethodType == MethodType.BinaryScalar || method.MethodType == MethodType.ComparisonScalar) {#>
+        public DataFrame <#=method.MethodName#><T>(T value)


Bad indentation here and above. It should be multiple of 4.

Right, I don't think T4TT engine does a great job with the indentations. I did try that before deciding to stay with this scheme

YohDeadfall · 2019-04-09T13:16:19Z

src/Microsoft.Data/BinaryOperations/DataFrameBinaryOperations.tt

+<# } else { #>
+        public DataFrame <#=method.MethodName#><T>(IList<T> values)
+<# } #>
+            where T : struct


Why is T limited to value types only?

Because we want this to be a DataFrame of primitive types. Since there's no way at the moment to specify primitive types alone, I put in value types. A DataFrame is used for data wrangling with primitive types, similar to Pandas. Is there a way to add reference types to Pandas/DataFrame that makes sense? For ex, what does it mean to say DataFrameColumn.Max() or DataFrameColumn.Add(5)?

you should be able to store strings in a dataframe column.

I agree. At the moment, my plan is to store strings as bytes so I can still keep the struct constraint

I filed an issue to supports string columns in the DataFrame. Working on a patch for it at the moment

src/Microsoft.Data/BinaryOperations/DataFrameBinaryOperations.tt

src/Microsoft.Data/DataFrame.cs

src/Microsoft.Data/Microsoft.Data.csproj

src/Microsoft.Data/Dependencies/netcoreapp2.1/Apache.Arrow.deps.json

eerhardt · 2019-04-09T14:33:17Z

corefxlab.sln

 EndProject
 Project("{9A19103F-16F7-4668-BE54-9A1E7A4F7556}") = "System.Text.JsonLab.Serialization.Tests", "tests\System.Text.JsonLab.Serialization.Tests\System.Text.JsonLab.Serialization.Tests.csproj", "{544D4C8C-B5C6-4C3C-9763-E4CB6AF9A90C}"
 EndProject
+Project("{9A19103F-16F7-4668-BE54-9A1E7A4F7556}") = "Microsoft.Data", "src\Microsoft.Data\Microsoft.Data.csproj", "{9542C4FF-BC8F-47E3-916D-D2FF38003823}"


We should come up with a better package/assembly name than Microsoft.Data. That namespace is going to have many things in it, and this work shouldn't try taking the whole thing.

Maybe Microsoft.DataFrames?

is there going to be more than one DataFrame? What about System.Data,Frame?

I don't think there will be more than one DataFrame (at least not owned/maintained/worked on by the corefx team at Microsoft).

We could consider putting this in System, but I don't think the type would ever be in the .NET Core shared framework or in the netstandard specification. I imagine this type to always be available through a NuGet package where we can add more APIs without having to rely on a new .NET Core version/TFM. With that in mind, it is unclear to me what should go into a System namespace and what shouldn't.

DataFrame is yet another abstraction over data in general and I think putting it in System is better than putting it in Microsoft. namespace, when other data-related abstractions are in System.Data, etc. It is not windows-specific, and should not be seen as such to encourage wider adoption.

src/Microsoft.Data/Microsoft.Data.csproj

src/Microsoft.Data/BinaryOperations/DataFrameBinaryOperations.cs

eerhardt · 2019-04-09T14:40:39Z

src/Microsoft.Data/DataFrameColumn.cs

+namespace Microsoft.Data
+{
+    public class DataFrameColumn<T> : BaseDataFrameColumn
+        where T : struct


Maybe this should be PrimitiveDataFrameColumn<T> if T must be a struct. With the current name, it feels like DataFrameColumn<string> should be possible, but isn't.

Unless, of course, we remove the T : struct restriction, then I think the current name would make sense.

In reply to: 273526844 [](ancestors = 273526844)

No, I think the T : struct restriction is required. If there was a restriction that truly only allowed primitive types, I would've used that. The DataFrame can only hold primitive types. No ref types and ideally no user defined structs/value types

What is a reason of that restriction? string isn't a primitive type.

This has been renamed to PrimitiveColumn, so the T : struct remains. Strings will be supported with a new column type

eerhardt · 2019-04-09T14:42:42Z

src/Microsoft.Data/DataFrameBuffer.cs

+    public class DataFrameBuffer<T>
+        where T : struct
+    {
+        public Memory<byte> Memory { get; private set; }


Why isn't the backing store for this a Memory<T>?

Because I was going for similarity with Arrow. This way, the Memory can just be wrapped in an Arrow Buffer with no conversion overhead

Is this going to be compatible with Arrow? I've seen the dependency commented out, but I'm not sure whether Arrow is going to be supported as the engine behind the interface.

Yes, this is going to be compatible with Arrow but, if I understand you right, not exactly in the way you mention. The C# Arrow implementation as it exists now requires that the buffer storing the data is immutable. No APIs exist to mutate the contents of an ArrowBuffer. I had a discussion in the Arrow repo a month or so back about removing this restriction, but a lot of stuff has already been built with immutability in mind, so I think it's safe to say that that is a no-go.

By compatible here I mean that a DataFrameBuffer can be wrapped in an ArrowBuffer easily with no overhead. That way the DataFrameBuffer can be mutated by the DataFrame, and anyone who consumes an ArrowBuffer can also get to the data.

The issue is below:

public Span<T> Span { [MethodImpl(MethodImplOptions.AggressiveInlining)] get => MemoryMarshal.Cast<byte, T>(Memory.Span); }

This isn't a free operation. If someone calls this many times in a hot path, this Cast is going to take cycles, when if it was just in terms of T to begin with, there is no need for a cast.

Similarly, you wouldn't need to hold onto _size and do division for things like Capacity and MaxCapacity.

I created an issue for this

mungojam · 2019-04-09T19:18:22Z

tests/Microsoft.Data.DataFrame.Tests/DataFrameTableTests.cs

+    {
+        static public DataFrame MakeTestTableWithTwoColumns(int length)
+        {
+            BaseDataFrameColumn dataFrameColumn1 = new PrimitiveDataFrameColumn<int>("Int1", Enumerable.Range(0, length).Select(x => x));


PrimitiveDataFrameColumn feels very verbose for such a common thing. It would be good to have some shortcuts to create them, like some static functions that could at least infer the generic type and maybe just be called Column()

ahsonkhan · 2019-04-09T21:53:16Z

src/Microsoft.Data/DataFrameBinaryOperations.cs

@@ -0,0 +1,555 @@
+


nit: remove empty lines

src/Microsoft.Data/DataFrameBinaryOperations.cs

src/Microsoft.Data/DataFrameBuffer.cs

src/Microsoft.Data/DataFrameColumnContainer.cs

src/Microsoft.Data/DataFrameColumnContainerBinaryOperations.cs

src/Microsoft.Data/DataFrameColumnContainer.cs

gfoidl

If it's too early for some (perf-) comments, please ignore them.

src/Microsoft.Data/DataFrameBuffer.cs

gfoidl · 2019-04-10T06:27:42Z

src/Microsoft.Data/DataFrameBuffer.cs

+            Memory = new byte[numberOfValues * _size];
+            Length = 0;
+        }
+        public void Append(T value)


This code-path could be optimized. Similar to what was done in List.Add. Or is it too early for such kind of things?

Note: As part of perf work

gfoidl · 2019-04-10T06:29:23Z

src/Microsoft.Data/DataFrameBuffer.cs

+        {
+            get
+            {
+                if (index > Length) throw new ArgumentOutOfRangeException(nameof(index));


Use the (ugly) ThrowHelper-pattern, to move the throwing code out of the hot-path, thus allowing the indexer to be inlineable.

Other places as well.

Note you only do this when it is definitely hot path .. we are all assuming the JIT will do this one day

Note: As part of perf work

src/Microsoft.Data/DataFrameColumnArithmetic.cs

src/Microsoft.Data/DataFrameColumnContainer.cs

src/Microsoft.Data/DataFrameTable.cs

tests/Microsoft.Data.DataFrame.Tests/Microsoft.DataFrames.Tests.csproj

eerhardt · 2019-04-12T18:25:55Z

tests/Microsoft.Data.DataFrame.Tests/DataFrameTableTests.cs

@@ -0,0 +1,126 @@
+// Licensed to the .NET Foundation under one or more agreements.


We may need a little more tests in this PR. This doesn't seem like enough given the functionality in the library with this PR.

I sort of agree. With this patch, the only ops allowed are things like floatColumn + float, so the scope is pretty narrow. Would you consider waiting till the next patch(will be up in an hour) where I've added a couple more tests for binary ops? That patch tests different functions and more combinations such as int + bool/double/decimal

This is OK for now. Let's be sure to add adequate tests in future PRs.

src/Microsoft.Data/PrimitiveDataFrameColumnContainer.cs

eerhardt · 2019-04-15T15:30:15Z

src/Microsoft.Data/PrimitiveDataFrameColumnContainer.cs

+        {
+            get
+            {
+                var ret = new List<T>();


You should initialize this list to the length capacity. That way it doesn't have to grow while you are adding elements to it.

eerhardt · 2019-04-15T15:31:39Z

src/Microsoft.Data/PrimitiveDataFrameColumnContainer.cs

+                {
+                    long nextRowIndex = startIndex + ret.Count + 1;
+                    arrayIndex++;
+                    temp = Buffers[arrayIndex][(int)nextRowIndex, length - ret.Count, ret];


Even though this is internal, this API usage/pattern is a little unusual:

[(int)nextRowIndex, length - ret.Count, ret];

Specifically, passing in an object ret to get filled up. It feels like it should just be a standard method call.

Right. I don't like that I have to do this either. Basically, this is a slice of length "length" that can begin in the middle of some buffer and spill over into the next buffer. I was trying to get by with passing just 1 return list "ret" to both calls to avoid creating more objects. That's why the return type of the indexer is a bool. Ideally, if there was API that made a List from a Span efficiently, or that allowed me to append to a List from a Span, I would've used that. Unless there's a less ugly way to do this? Definitely open to suggestions

I think we can just make this a method and not use an indexer. Honestly, I don't think I've ever seen someone use a C# indexer this way.

eerhardt · 2019-04-15T15:33:02Z

src/Microsoft.Data/PrimitiveDataFrameColumnContainer.cs

+                {
+                    long nextRowIndex = startIndex + ret.Count + 1;
+                    arrayIndex++;
+                    temp = Buffers[arrayIndex][(int)nextRowIndex, length - ret.Count, ret];


Is casting nextRowIndex to int, the right thing here? Same for above where we are casting startIndex to an int. Why take in a long if you are casting it to int?

nextRowIndex should've been 0. That's a bug. Fixing it.

I'm taking a long because the df can hold long number of elements. Each buffer can hold only intMax number of elements though, so I'm casting once I get to the right buffer

public IList<T> this[long startIndex, int length] { get { var ret = new List<T>(length); long endIndex = startIndex + length; int arrayIndex = GetArrayContainingRowIndex(ref startIndex); bool temp = Buffers[arrayIndex][(int)startIndex, length, ret];

Here we take a long startIndex and then are casting it (int)startIndex.

I see that GetArrayContainingRowIndex is decreasing the number, which honestly is not intuitive. Maybe make GetArrayContainingRowIndex return an out int newStartIndex (or a better name). Have GetArrayContainingRowIndex ensure that the number is less than int.MaxValue, then the 3 calling places don't need to cast here. If it ever fails, I'm not sure what would even happen with the current code.

src/Microsoft.Data/DataFrameColumnContainerBinaryOperations.cs

eerhardt · 2019-04-15T15:39:34Z

src/Microsoft.Data/DataFrameTable.cs

+
+namespace Microsoft.Data
+{
+    public class DataFrameTable


What's the difference between a DataFrame and a DataFrameTable?

Maybe it would be good to have high-level class summary comments describing each public class. You can write them like this:

/// <summary> /// High level summary comment /// </summary> public class Foo

A DataFrameTable is just a convenient store to hold the columns. The DataFrame class uses it to manipulate the columns and implement its algorithms. I changed the access modifier to internal now so users can only create a DataFrame and don't know anything about the Table

src/Microsoft.Data/Microsoft.DataFrames.csproj

eerhardt · 2019-04-15T15:41:38Z

src/Microsoft.Data/Microsoft.DataFrames.csproj

+  </ItemGroup>-->
+
+  <ItemGroup>
+    <PackageReference Include="CsvHelper" Version="12.1.2" />


Are we actually using CsvHelper? If not, can you delete this line?

Good catch. Deleted it. I put this in a long time ago to write unit tests

tests/Microsoft.Data.DataFrame.Tests/Microsoft.DataFrames.Tests.csproj

src/Microsoft.Data/PrimitiveDataFrameColumn.cs

eerhardt · 2019-04-15T15:47:12Z

src/Microsoft.Data/PrimitiveDataFrameColumnContainer.cs

+
+namespace Microsoft.Data
+{
+    public partial class PrimitiveDataFrameColumnContainer<T>


From a public (or a user's) perspective, what is the difference between PrimitiveDataFrameColumn and PrimitiveDataFrameColumnContainer?

A PrimitiveDataFrameColumnContainer is just an internal store(changed the modifier to internal now so users can't create one) to hold the column data. PrimitiveDataFrameColumn defines the APIs that manipulate the container

src/Microsoft.Data/DataFrameColumnArithmeticTemplate.ttinclude

pgovind · 2019-04-15T18:09:19Z

Just a note: I'm hoping to get this approved soon so I can put up the patches I have for StringColumn support and BinaryOperator overloads

I'll be filing issues for the various TODOs and other feedback from here shortly so that I can fix them in patches separately

danmoseley · 2019-04-17T21:50:54Z

src/Microsoft.Data/DataFrameBinaryOperations.cs

+            for (int ii = 0; ii < ColumnCount; ii++)
+            {
+                PrimitiveDataFrameColumn<T> column = _table.Column(ii) as PrimitiveDataFrameColumn<T>;
+                if (column != null)


is it correct that it should jsut return silently if this is null (if it's not the expected type)?

Yup, this is wrong. I can throw an error in an else block for this patch but my next patch already handles it by implementing support for things like (intColumn + float). It's hard to support binary ops on generic columns at the moment.

src/Microsoft.Data/DataFrameBuffer.cs

danmoseley · 2019-04-17T21:55:58Z

src/Microsoft.Data/DataFrameBuffer.cs

+
+        public Span<T> Span
+        {
+            [MethodImpl(MethodImplOptions.AggressiveInlining)]


Generally we try to avoid inlining attributes unless we know for sure we need them. ie., we know it will not get inlined (for some bad reason) and we know inlining is important to get the performance we want. In this case I would trust it will get inlined, and when we do perf work later, if it's not, we can do appropriate things (which might include modifying the jit)

Got it. This will go away soon enough though for #2659

src/Microsoft.Data/DataFrameBuffer.cs

danmoseley · 2019-04-17T21:59:46Z

src/Microsoft.Data/DataFrameColumnArithmeticTemplate.cs

@@ -0,0 +1 @@
+


Does templating require the empty file be in the repo for some reason?

Removed. I had to rebase on upstream/master and it got generated this time.

src/Microsoft.Data/DataFrameColumnArithmeticTemplate.ttinclude

src/Microsoft.Data/DataFrameTable.cs

danmoseley · 2019-04-17T22:07:26Z

src/Microsoft.Data/DataFrameTable.cs

+        }
+        public void RemoveColumn(int columnIndex)
+        {
+            _columnNameToIndexDictionary.Remove(_columnNames[columnIndex]);


Should you validate for -1

Nope I think. _columnName[columnIndex] will throw the exception I want

It's a bad practice not to check parameters and delegate their validation.

danmoseley · 2019-04-17T22:10:23Z

src/Microsoft.Data/PrimitiveDataFrameColumnArithmetic.cs

@@ -0,0 +1,4138 @@
+


Nit, blank lines

I was able to remove 1 of the blank lines in my next patch, but for the life of me, I am not able to find out where the first blank line is coming from. All I know is that it is from some random space in the .ttinclude file

danmoseley · 2019-04-17T22:10:47Z

src/Microsoft.Data/PrimitiveDataFrameColumnArithmetic.cs

+    internal interface IPrimitiveDataFrameColumnArithmetic<T>
+        where T : struct
+    {
+       void Add(PrimitiveDataFrameColumnContainer<T> left, PrimitiveDataFrameColumnContainer<T> right);


Consider sorting. It's at least a stable ordering..

Next patch. This'll cause too much pain when I rebase

danmoseley · 2019-04-17T22:11:20Z

src/Microsoft.Data/PrimitiveDataFrameColumnArithmetic.cs

+        public static IPrimitiveDataFrameColumnArithmetic<T> GetArithmetic<T>()
+            where T : struct
+        {
+            if (typeof(T) == typeof(bool))


why not use a switch statement on the type?

you could also set a return value then return return (IPrimitiveDataFrameColumnArithmetic<T>)value which will save all these pasted casts

Oh, I just took this from some of the Tensor code we have checked in, so I didn't pay too much attention to it. Will consider it for my next patch

How would you specify the case where a constant value is required, i.e.

switch (typeof(T)) { case typeof(int): // ...

Won't compile (CS0150) (with C# 7.3 and any version SharpLab offers).
Or does pattern matching provide a syntax I'm not aware of?

A way via TypeCode won't be optimized by JIT -- I believe JIT won't optimize (dead code eliminate) on case either, as it does with the if-chain.

I'm not an expert on this, but my understanding of this pattern was that the JIT was smart enough to eliminate chunks of code. When JIT'ing for T == bool, it can see that if (typeof(T) == typeof(bool)) always holds, similarly else if (typeof(T) == typeof(int)) never holds, so it can completely remove non-bool code paths.

@eerhardt correct.

The JIT does this by dead code elimination, as typeof is a JIT-time-constant, so all other branches will fall away.

Side note: As the IL for such methods is quite big, the JIT won't inline these methods in most cases. Here AggressiveInline can help.

Interesting.

danmoseley · 2019-04-17T22:13:24Z

src/Microsoft.Data/PrimitiveDataFrameColumnArithmetic.cs

+
+    internal class BoolArithmetic : IPrimitiveDataFrameColumnArithmetic<bool>
+    {
+        public void Add(PrimitiveDataFrameColumnContainer<bool> left, PrimitiveDataFrameColumnContainer<bool> right)


VS has a magic refactoring that will implement an interface for you. When it does this it uses NotImplementedEXception. You should probably use that.

NotImplementedException --> I am not done here
NotSupportedException --> It is not meaningful to call this (usually when something really should be abstract)
PlatformNotSupportedException --> Works on other platforms, not this one

ie in theory in CoreFX there are zero NotImplementedException, but in practice I see we weren't consistent

NotSupportedException is the right thing here I think. It doesn't make sense to call any arithmetic ops on a bool

src/Microsoft.Data/PrimitiveDataFrameColumnContainer.BinaryOperations.cs

danmoseley

Seems fine to me but someone who understands the semantics yo'ure shooting for here should signo ff.

src/Microsoft.Data/BaseDataFrameColumn.cs

YohDeadfall · 2019-04-18T20:43:13Z

src/Microsoft.Data/BaseDataFrameColumn.cs

+
+        public long NullCount { get; protected set; }
+
+        public string Name;


Public fields is a bad practice. Must be replaced by a property.

src/Microsoft.Data/BaseDataFrameColumn.cs

YohDeadfall · 2019-04-18T20:44:41Z

src/Microsoft.Data/BaseDataFrameColumn.cs

+        public string Name;
+
+        public virtual object this[long rowIndex] { get { throw new NotImplementedException(); } set { throw new NotImplementedException(); } }
+        public virtual object this[long startIndex, int length] { get { throw new NotImplementedException(); } }


Okay, let's change it a bit later.

src/Microsoft.Data/DataFrame.cs

YohDeadfall · 2019-04-18T20:55:55Z

src/Microsoft.Data/DataFrameTable.cs

+        private DataFrameTable(IList<BaseDataFrameColumn> columns)
+        {
+            columns = columns ?? throw new ArgumentNullException(nameof(columns));
+            _columns = columns;


Why are you assigning the value to columns if it's a parameter you're checking?

I don't understand?? I'm assigning to _columns? columns without the "_" is the parameter. Technically I'm not 100% convinced with this code yet. I'm still not certain if I should be making a copy of the columns being passed in, so I'll wait for more usage scenarios before I make the choice

You're assigning columns to columns in line 32 to check for a null value. Just check the value while assigning it to the field:

Suggested change

_columns = columns;

_columns = columns ?? throw new ArgumentNullException(nameof(columns));

src/Microsoft.Data/DataFrameTable.cs

YohDeadfall · 2019-04-18T21:00:41Z

src/Microsoft.Data/DataFrameTable.cs

+        }
+        public void RemoveColumn(int columnIndex)
+        {
+            _columnNameToIndexDictionary.Remove(_columnNames[columnIndex]);


It's a bad practice not to check parameters and delegate their validation.

src/Microsoft.Data/DataFrameTable.cs

eerhardt · 2019-04-19T01:51:38Z

src/Microsoft.Data/Microsoft.Data.DataFrame.csproj

+<Project Sdk="Microsoft.NET.Sdk">
+
+  <PropertyGroup>
+    <TargetFramework>netcoreapp2.2</TargetFramework>


Can this not be netstandard2.0?

eerhardt · 2019-04-19T01:52:58Z

src/Microsoft.Data/Microsoft.Data.DataFrame.csproj

+  </ItemGroup>
+
+  <ItemGroup>
+    <Compile Update="BaseColumn.BinaryOperations.cs">


I don't see this file in your PR. Are you sure all of these are still necessary?

eerhardt · 2019-04-19T01:54:14Z

src/Microsoft.Data/BaseDataFrameColumn.cs

+namespace Microsoft.Data
+{
+    /// <summary>
+    /// The base column type. All APIs should have atleast a stub here first


atleast is two words.

eerhardt

Let's be sure any feedback you didn't address here gets tracked and addressed in a future PR (there are a lot of comments to wade through).

YohDeadfall reviewed Apr 9, 2019

View reviewed changes

eerhardt reviewed Apr 9, 2019

View reviewed changes

src/Microsoft.Data/Microsoft.Data.csproj Outdated Show resolved Hide resolved

eerhardt reviewed Apr 9, 2019

View reviewed changes

src/Microsoft.Data/Dependencies/netcoreapp2.1/Apache.Arrow.deps.json Outdated Show resolved Hide resolved

eerhardt reviewed Apr 9, 2019

View reviewed changes

src/Microsoft.Data/Microsoft.Data.csproj Outdated Show resolved Hide resolved

eerhardt reviewed Apr 9, 2019

View reviewed changes

src/Microsoft.Data/BinaryOperations/DataFrameBinaryOperations.cs Show resolved Hide resolved

eerhardt reviewed Apr 9, 2019

View reviewed changes

mungojam reviewed Apr 9, 2019

View reviewed changes

ahsonkhan reviewed Apr 9, 2019

View reviewed changes

gfoidl reviewed Apr 10, 2019

View reviewed changes

eerhardt reviewed Apr 10, 2019

View reviewed changes

tests/Microsoft.Data.DataFrame.Tests/Microsoft.DataFrames.Tests.csproj Outdated Show resolved Hide resolved

eerhardt reviewed Apr 12, 2019

View reviewed changes

eerhardt reviewed Apr 15, 2019

View reviewed changes

src/Microsoft.Data/PrimitiveDataFrameColumnContainer.cs Outdated Show resolved Hide resolved

eerhardt reviewed Apr 15, 2019

View reviewed changes

src/Microsoft.Data/DataFrameColumnContainerBinaryOperations.cs Outdated Show resolved Hide resolved

eerhardt reviewed Apr 15, 2019

View reviewed changes

src/Microsoft.Data/Microsoft.DataFrames.csproj Outdated Show resolved Hide resolved

eerhardt reviewed Apr 15, 2019

View reviewed changes

src/Microsoft.Data/Microsoft.DataFrames.csproj Outdated Show resolved Hide resolved

eerhardt reviewed Apr 15, 2019

View reviewed changes

src/Microsoft.Data/Microsoft.DataFrames.csproj Outdated Show resolved Hide resolved

eerhardt reviewed Apr 15, 2019

View reviewed changes

tests/Microsoft.Data.DataFrame.Tests/Microsoft.DataFrames.Tests.csproj Outdated Show resolved Hide resolved

eerhardt reviewed Apr 15, 2019

View reviewed changes

src/Microsoft.Data/PrimitiveDataFrameColumn.cs Outdated Show resolved Hide resolved

eerhardt reviewed Apr 15, 2019

View reviewed changes

src/Microsoft.Data/DataFrameColumnArithmeticTemplate.ttinclude Outdated Show resolved Hide resolved

danmoseley reviewed Apr 17, 2019

View reviewed changes

src/Microsoft.Data/DataFrameBuffer.cs Show resolved Hide resolved

danmoseley reviewed Apr 17, 2019

View reviewed changes

src/Microsoft.Data/DataFrameBuffer.cs Outdated Show resolved Hide resolved

danmoseley reviewed Apr 17, 2019

View reviewed changes

src/Microsoft.Data/DataFrameBuffer.cs Outdated Show resolved Hide resolved

danmoseley reviewed Apr 17, 2019

View reviewed changes

src/Microsoft.Data/DataFrameColumnArithmeticTemplate.ttinclude Outdated Show resolved Hide resolved

danmoseley reviewed Apr 17, 2019

View reviewed changes

src/Microsoft.Data/DataFrameTable.cs Outdated Show resolved Hide resolved

danmoseley reviewed Apr 17, 2019

View reviewed changes

src/Microsoft.Data/DataFrameTable.cs Show resolved Hide resolved

danmoseley reviewed Apr 17, 2019

View reviewed changes

src/Microsoft.Data/PrimitiveDataFrameColumnContainer.BinaryOperations.cs Show resolved Hide resolved

danmoseley approved these changes Apr 17, 2019

View reviewed changes

Prashanth Govindarajan added 4 commits April 17, 2019 16:23

Address Dan's feedback

004fb10

Address perf feedback

76d28b9

Namespace change

2963c89

sq

3d358b6

YohDeadfall reviewed Apr 18, 2019

View reviewed changes

More feedback

1c16625

eerhardt reviewed Apr 19, 2019

View reviewed changes

eerhardt approved these changes Apr 19, 2019

View reviewed changes

sq

00ee04d

ericstj merged commit 258a9db into dotnet:master Apr 19, 2019

eerhardt mentioned this pull request May 9, 2019

Question : Exploration through Fsharp + DataFrame types dotnet/machinelearning#3692

Closed

pgovind mentioned this pull request Mar 6, 2021

Optimize binary operations on DataFrame dotnet/machinelearning#5695

Open

		@@ -0,0 +1,126 @@
		// Licensed to the .NET Foundation under one or more agreements.


		public long NullCount { get; protected set; }

		public string Name;

	_columns = columns;
	_columns = columns ?? throw new ArgumentNullException(nameof(columns));

		@@ -0,0 +1 @@
		No newline at end of file

Data frame binary operations #2656

Data frame binary operations #2656

Uh oh!

Conversation

pgovind commented Apr 9, 2019 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

dnfclas commented Apr 9, 2019 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

YohDeadfall commented Apr 9, 2019

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

pgovind Apr 9, 2019 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

YohDeadfall Apr 9, 2019 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

pgovind commented Apr 9, 2019 •

edited

Loading

dnfclas commented Apr 9, 2019 •

edited

Loading

pgovind Apr 9, 2019 •

edited

Loading

YohDeadfall Apr 9, 2019 •

edited

Loading

danmoseley Apr 17, 2019 •

edited

Loading