GH-37770: [MATLAB] Add CSV TableReader and TableWriter MATLAB classes#37773
GH-37770: [MATLAB] Add CSV TableReader and TableWriter MATLAB classes#37773kevingurney merged 22 commits intoapache:mainfrom
TableReader and TableWriter MATLAB classes#37773Conversation
|
|
928f9c9 to
bedaf98
Compare
sgilmore10
left a comment
There was a problem hiding this comment.
This looks good! Thanks for adding the base test class and test utilities. We can re-use these in the future for other file types!
|
It looks like the I just pushed a commit that enables this. |
|
Enabling building of the We could try to fix these build failures by explicitly installing I have already confirmed that not building the Arrow C++ library tests when I'll start working on a separate PR to remove |
|
For reference, I am working on the changes required to remove |
|
For reference - I've opened #37784 for removing |
…ke build system for the MATLAB interface (#37784) ### Rationale for this change This pull request removes `GoogleTest` support from the CMake build system for the MATLAB interface. 1. `GoogleTest` support adds a lot of additional complexity to the CMake build system for the MATLAB interface, and we currently don't have any standalone C++ tests for the MATLAB interface code. 2. In order to use `GoogleTest` in the MATLAB CI workflows, we are currently relying on building the tests for the Arrow C++ libraries in order to "re-use" the `GoogleTest binaries. This adds additional overhead to the MATLAB CI workflows. 3. If we want to test some internal C++ code for the MATLAB interface in the future, we can instead use a MEX function to call the code from a MATLAB test as suggested by @ kou in #37532 (comment). 4. There is [precedent for testing internal C++ code without GoogleTest for the Python bindings](#14117). 5. On a somewhat related note - removing `GoogleTest` support will help unblock #37773 as discussed in #37773 (comment). ### What changes are included in this PR? 1. Removed the `MATLAB_BUILD_TESTS` flag from the CMake build system for the MATLAB interface since there are no longer any C++ tests for the MATLAB interface to build. 2. Updated the `matlab_build.sh` CI workflow script to avoid building the tests for the Arrow C++ libraries and to no longer call `ctest`. 3. Updated the `README.md` for the MATLAB interface to no longer mention building or running C++ tests. 4. Updated the design document for the MATLAB Interface to no longer mention `GoogleTest` since we may end up testing internal C++ code using MEX function calls from MATLAB instead. 5. Removed placeholder C++ test (i.e. `placeholder_test.cc`). ### Are these changes tested? Yes. The MATLAB CI workflow is passing on all platforms. ### Are there any user-facing changes? Yes. There are no longer any C++ tests for the MATLAB interface. The `MATLAB_BUILD_TESTS` flag has been removed from the CMake build system to reflect this change. If a user supplies a value for `MATLAB_BUILD_TESTS` when building the MATLAB interface, the flag will be ignored by CMake. ### Future Directions 1. Add more developer-focused documentation on how to test C++ code via MEX function calls from MATLAB. ### Notes 1. In the future, we can consider testing internal C++ code using MEX function calls from MATLAB tests as suggested by @ kou in #37532 (comment). Currently, we don't have any C++ tests that need to be adapted to use this approach. 2. Thank you @ sgilmore10 for your help with this pull request! * Closes: #37532 Lead-authored-by: Kevin Gurney <[email protected]> Co-authored-by: Sarah Gilmore <[email protected]> Signed-off-by: Kevin Gurney <[email protected]>
a6055fb to
5149465
Compare
|
Update: the code to remove Now that the Arrow C++ tests are no longer being built, enabling At this point, these changes should be ready to merge. |
|
The failures for the |
|
CI failures are related to #37803. |
|
+1 |
2. Change `write` method to take in an `arrow.tabular.Table`, rather than an `arrow.tabular.RecordBatch`. 3. Add basic implementation of `arrow.io.csv.TableReader` class. Co-authored-by: Sarah Gilmore <[email protected]>
Co-authored-by: Sarah Gilmore <[email protected]>
2. Create CSV test superclass. 3. Add tError test class. Co-authored-by: Sarah Gilmore <[email protected]>
…iter filename argument. 2. Add basic error tests for TableReader and TableWriter. Co-authored-by: Sarah Gilmore <[email protected]>
2. Add more error tests. Co-authored-by: Sarah Gilmore <[email protected]>
…rkflow script. Co-authored-by: Sarah Gilmore <[email protected]>
cdaa51a to
aea2f39
Compare
|
#37808 has been merged. I just rebased these changes on top of |
|
The MATLAB CI workflows are passing now. I will merge this PR. |
|
+1 |
|
After merging your PR, Conbench analyzed the 5 benchmarking runs that have been run so far on merge-commit 2b34e37. There were no benchmark performance regressions. 🎉 The full Conbench report has more details. It also includes information about possible false positives for unstable benchmarks that are known to sometimes produce them. |
…he CMake build system for the MATLAB interface (apache#37784) ### Rationale for this change This pull request removes `GoogleTest` support from the CMake build system for the MATLAB interface. 1. `GoogleTest` support adds a lot of additional complexity to the CMake build system for the MATLAB interface, and we currently don't have any standalone C++ tests for the MATLAB interface code. 2. In order to use `GoogleTest` in the MATLAB CI workflows, we are currently relying on building the tests for the Arrow C++ libraries in order to "re-use" the `GoogleTest binaries. This adds additional overhead to the MATLAB CI workflows. 3. If we want to test some internal C++ code for the MATLAB interface in the future, we can instead use a MEX function to call the code from a MATLAB test as suggested by @ kou in apache#37532 (comment). 4. There is [precedent for testing internal C++ code without GoogleTest for the Python bindings](apache#14117). 5. On a somewhat related note - removing `GoogleTest` support will help unblock apache#37773 as discussed in apache#37773 (comment). ### What changes are included in this PR? 1. Removed the `MATLAB_BUILD_TESTS` flag from the CMake build system for the MATLAB interface since there are no longer any C++ tests for the MATLAB interface to build. 2. Updated the `matlab_build.sh` CI workflow script to avoid building the tests for the Arrow C++ libraries and to no longer call `ctest`. 3. Updated the `README.md` for the MATLAB interface to no longer mention building or running C++ tests. 4. Updated the design document for the MATLAB Interface to no longer mention `GoogleTest` since we may end up testing internal C++ code using MEX function calls from MATLAB instead. 5. Removed placeholder C++ test (i.e. `placeholder_test.cc`). ### Are these changes tested? Yes. The MATLAB CI workflow is passing on all platforms. ### Are there any user-facing changes? Yes. There are no longer any C++ tests for the MATLAB interface. The `MATLAB_BUILD_TESTS` flag has been removed from the CMake build system to reflect this change. If a user supplies a value for `MATLAB_BUILD_TESTS` when building the MATLAB interface, the flag will be ignored by CMake. ### Future Directions 1. Add more developer-focused documentation on how to test C++ code via MEX function calls from MATLAB. ### Notes 1. In the future, we can consider testing internal C++ code using MEX function calls from MATLAB tests as suggested by @ kou in apache#37532 (comment). Currently, we don't have any C++ tests that need to be adapted to use this approach. 2. Thank you @ sgilmore10 for your help with this pull request! * Closes: apache#37532 Lead-authored-by: Kevin Gurney <[email protected]> Co-authored-by: Sarah Gilmore <[email protected]> Signed-off-by: Kevin Gurney <[email protected]>
…AB classes (apache#37773) ### Rationale for this change To enable initial CSV I/O support, this PR adds `arrow.io.csv.TableReader` and `arrow.io.csv.TableWriter` MATLAB classes to the MATLAB interface. ### What changes are included in this PR? 1. Added a new `arrow.io.csv.TableReader` class 2. Added a new `arrow.io.csv.TableWriter` class **Example** ```matlab >> matlabTableWrite = array2table(rand(3)) matlabTableWrite = 3×3 table Var1 Var2 Var3 _______ ________ _______ 0.91131 0.091595 0.24594 0.51315 0.27368 0.62119 0.42942 0.88665 0.49501 >> arrowTableWrite = arrow.table(matlabTableWrite) arrowTableWrite = Var1: double Var2: double Var3: double ---- Var1: [ [ 0.9113083542736461, 0.5131490075412158, 0.42942202968065213 ] ] Var2: [ [ 0.09159480217154525, 0.27367730380496647, 0.8866478145458545 ] ] Var3: [ [ 0.2459443412735529, 0.6211893868708748, 0.49500739584280073 ] ] >> writer = arrow.io.csv.TableWriter("example.csv") writer = TableWriter with properties: Filename: "example.csv" >> writer.write(arrowTableWrite) >> reader = arrow.io.csv.TableReader("example.csv") reader = TableReader with properties: Filename: "example.csv" >> arrowTableRead = reader.read() arrowTableRead = Var1: double Var2: double Var3: double ---- Var1: [ [ 0.9113083542736461, 0.5131490075412158, 0.42942202968065213 ] ] Var2: [ [ 0.09159480217154525, 0.27367730380496647, 0.8866478145458545 ] ] Var3: [ [ 0.2459443412735529, 0.6211893868708748, 0.49500739584280073 ] ] >> matlabTableRead = table(arrowTableRead) matlabTableRead = 3×3 table Var1 Var2 Var3 _______ ________ _______ 0.91131 0.091595 0.24594 0.51315 0.27368 0.62119 0.42942 0.88665 0.49501 >> isequal(arrowTableRead, arrowTableWrite) ans = logical 1 >> isequal(matlabTableRead, matlabTableWrite) ans = logical 1 ``` ### Are these changes tested? Yes. 1. Added new CSV I/O tests including `test/arrow/io/csv/tRoundTrip.m` and `test/arrow/io/csv/tError.m`. 2. Both of these test classes inherit from a `CSVTest` superclass. ### Are there any user-facing changes? Yes. 1. Users can now read and write CSV files using `arrow.io.csv.TableReader` and `arrow.io.csv.TableWriter`. ### Future Directions 1. Expose [options](https://github.com/apache/arrow/blob/main/cpp/src/arrow/csv/options.h) for controlling CSV reading and writing in MATLAB. 2. Add more read/write tests for null value handling and other datatypes beyond numeric and string values. 4. Add a `RecordBatchReader` and `RecordBatchWriter` for CSV. 5. Add support for more I/O formats like Parquet, JSON, ORC, Arrow IPC, etc. ### Notes 1. Thank you @ sgilmore10 for your help with this pull request! 2. I chose to add both the `TableReader` and `TableWriter` in one pull request because it simplified testing. My apologies for the slightly lengthy pull request. * Closes: apache#37770 Lead-authored-by: Kevin Gurney <[email protected]> Co-authored-by: Sarah Gilmore <[email protected]> Signed-off-by: Kevin Gurney <[email protected]>
…he CMake build system for the MATLAB interface (apache#37784) ### Rationale for this change This pull request removes `GoogleTest` support from the CMake build system for the MATLAB interface. 1. `GoogleTest` support adds a lot of additional complexity to the CMake build system for the MATLAB interface, and we currently don't have any standalone C++ tests for the MATLAB interface code. 2. In order to use `GoogleTest` in the MATLAB CI workflows, we are currently relying on building the tests for the Arrow C++ libraries in order to "re-use" the `GoogleTest binaries. This adds additional overhead to the MATLAB CI workflows. 3. If we want to test some internal C++ code for the MATLAB interface in the future, we can instead use a MEX function to call the code from a MATLAB test as suggested by @ kou in apache#37532 (comment). 4. There is [precedent for testing internal C++ code without GoogleTest for the Python bindings](apache#14117). 5. On a somewhat related note - removing `GoogleTest` support will help unblock apache#37773 as discussed in apache#37773 (comment). ### What changes are included in this PR? 1. Removed the `MATLAB_BUILD_TESTS` flag from the CMake build system for the MATLAB interface since there are no longer any C++ tests for the MATLAB interface to build. 2. Updated the `matlab_build.sh` CI workflow script to avoid building the tests for the Arrow C++ libraries and to no longer call `ctest`. 3. Updated the `README.md` for the MATLAB interface to no longer mention building or running C++ tests. 4. Updated the design document for the MATLAB Interface to no longer mention `GoogleTest` since we may end up testing internal C++ code using MEX function calls from MATLAB instead. 5. Removed placeholder C++ test (i.e. `placeholder_test.cc`). ### Are these changes tested? Yes. The MATLAB CI workflow is passing on all platforms. ### Are there any user-facing changes? Yes. There are no longer any C++ tests for the MATLAB interface. The `MATLAB_BUILD_TESTS` flag has been removed from the CMake build system to reflect this change. If a user supplies a value for `MATLAB_BUILD_TESTS` when building the MATLAB interface, the flag will be ignored by CMake. ### Future Directions 1. Add more developer-focused documentation on how to test C++ code via MEX function calls from MATLAB. ### Notes 1. In the future, we can consider testing internal C++ code using MEX function calls from MATLAB tests as suggested by @ kou in apache#37532 (comment). Currently, we don't have any C++ tests that need to be adapted to use this approach. 2. Thank you @ sgilmore10 for your help with this pull request! * Closes: apache#37532 Lead-authored-by: Kevin Gurney <[email protected]> Co-authored-by: Sarah Gilmore <[email protected]> Signed-off-by: Kevin Gurney <[email protected]>
…AB classes (apache#37773) ### Rationale for this change To enable initial CSV I/O support, this PR adds `arrow.io.csv.TableReader` and `arrow.io.csv.TableWriter` MATLAB classes to the MATLAB interface. ### What changes are included in this PR? 1. Added a new `arrow.io.csv.TableReader` class 2. Added a new `arrow.io.csv.TableWriter` class **Example** ```matlab >> matlabTableWrite = array2table(rand(3)) matlabTableWrite = 3×3 table Var1 Var2 Var3 _______ ________ _______ 0.91131 0.091595 0.24594 0.51315 0.27368 0.62119 0.42942 0.88665 0.49501 >> arrowTableWrite = arrow.table(matlabTableWrite) arrowTableWrite = Var1: double Var2: double Var3: double ---- Var1: [ [ 0.9113083542736461, 0.5131490075412158, 0.42942202968065213 ] ] Var2: [ [ 0.09159480217154525, 0.27367730380496647, 0.8866478145458545 ] ] Var3: [ [ 0.2459443412735529, 0.6211893868708748, 0.49500739584280073 ] ] >> writer = arrow.io.csv.TableWriter("example.csv") writer = TableWriter with properties: Filename: "example.csv" >> writer.write(arrowTableWrite) >> reader = arrow.io.csv.TableReader("example.csv") reader = TableReader with properties: Filename: "example.csv" >> arrowTableRead = reader.read() arrowTableRead = Var1: double Var2: double Var3: double ---- Var1: [ [ 0.9113083542736461, 0.5131490075412158, 0.42942202968065213 ] ] Var2: [ [ 0.09159480217154525, 0.27367730380496647, 0.8866478145458545 ] ] Var3: [ [ 0.2459443412735529, 0.6211893868708748, 0.49500739584280073 ] ] >> matlabTableRead = table(arrowTableRead) matlabTableRead = 3×3 table Var1 Var2 Var3 _______ ________ _______ 0.91131 0.091595 0.24594 0.51315 0.27368 0.62119 0.42942 0.88665 0.49501 >> isequal(arrowTableRead, arrowTableWrite) ans = logical 1 >> isequal(matlabTableRead, matlabTableWrite) ans = logical 1 ``` ### Are these changes tested? Yes. 1. Added new CSV I/O tests including `test/arrow/io/csv/tRoundTrip.m` and `test/arrow/io/csv/tError.m`. 2. Both of these test classes inherit from a `CSVTest` superclass. ### Are there any user-facing changes? Yes. 1. Users can now read and write CSV files using `arrow.io.csv.TableReader` and `arrow.io.csv.TableWriter`. ### Future Directions 1. Expose [options](https://github.com/apache/arrow/blob/main/cpp/src/arrow/csv/options.h) for controlling CSV reading and writing in MATLAB. 2. Add more read/write tests for null value handling and other datatypes beyond numeric and string values. 4. Add a `RecordBatchReader` and `RecordBatchWriter` for CSV. 5. Add support for more I/O formats like Parquet, JSON, ORC, Arrow IPC, etc. ### Notes 1. Thank you @ sgilmore10 for your help with this pull request! 2. I chose to add both the `TableReader` and `TableWriter` in one pull request because it simplified testing. My apologies for the slightly lengthy pull request. * Closes: apache#37770 Lead-authored-by: Kevin Gurney <[email protected]> Co-authored-by: Sarah Gilmore <[email protected]> Signed-off-by: Kevin Gurney <[email protected]>
Rationale for this change
To enable initial CSV I/O support, this PR adds
arrow.io.csv.TableReaderandarrow.io.csv.TableWriterMATLAB classes to the MATLAB interface.What changes are included in this PR?
arrow.io.csv.TableReaderclassarrow.io.csv.TableWriterclassExample
Are these changes tested?
Yes.
test/arrow/io/csv/tRoundTrip.mandtest/arrow/io/csv/tError.m.CSVTestsuperclass.Are there any user-facing changes?
Yes.
arrow.io.csv.TableReaderandarrow.io.csv.TableWriter.Future Directions
RecordBatchReaderandRecordBatchWriterfor CSV.Notes
TableReaderandTableWriterin one pull request because it simplified testing. My apologies for the slightly lengthy pull request.TableReaderandTableWriterMATLAB classes #37770