Bug Report
- Yes, I reviewed the contribution guidelines.
- Yes, more specifically, I reviewed the guidelines on how to write clear bug reports.
Describe the current, buggy behavior
When using the WP-CLI built-in CSV tools, WP-CLI can output a CSV file that it then can't properly read back in.
When \WP_CLI\Utils\write_csv() writes a CSV file containing multi-line values and \WP_CLI\Iterators\CSV() then reads that file back in, \WP_CLI\Iterators\CSV() fails to parse the values correctly, especially when they also contain commas inside the quoted strings.
I ran into this issue while writing a command to export some custom data, which included post meta that had multi-line values.
Describe how other contributors can replicate this bug
You can create a test CSV file using this bit of code:
$handle = fopen( 'test.csv', 'wb' );
WP_CLI\Utils\write_csv( $handle, array(
    array(
        'Line 1 test, comma in a single line',
        'Line 1 test with no comma at all',
        "Line 1 test, with a comma over\nmultiple lines\nin many places",
        'Line 1 test with no comma at all',
    ),
    array(
        'Line 2 test, comma in a single line',
        'Line 2 test with no comma at all',
        "Line 2 test, with a comma over\nmultiple lines\nin many places",
        'Line 2 test with no comma at all',
    ),
    array(
        'Line 3 test, comma in a single line',
        'Line 3 test with no comma at all',
        "Line 3 test, with a comma over\nmultiple lines\nin many places",
        'Line 3 test with no comma at all',
    ),
    array(
        'Line 4 test, comma in a single line',
        'Line 4 test with no comma at all',
        "Line 4 test, with a comma over\nmultiple lines\nin many places",
        'Line 4 test with no comma at all',
    ),
) );
fclose( $handle );
which creates this test.csv file:
"Line 1 test, comma in a single line","Line 1 test with no comma at all","Line 1 test, with a comma over
multiple lines
in many places","Line 1 test with no comma at all"
"Line 2 test, comma in a single line","Line 2 test with no comma at all","Line 2 test, with a comma over
multiple lines
in many places","Line 2 test with no comma at all"
"Line 3 test, comma in a single line","Line 3 test with no comma at all","Line 3 test, with a comma over
multiple lines
in many places","Line 3 test with no comma at all"
"Line 4 test, comma in a single line","Line 4 test with no comma at all","Line 4 test, with a comma over
multiple lines
in many places","Line 4 test with no comma at all"
When you re-read the same test.csv back, it's broken:
foreach ( new \WP_CLI\Iterators\CSV( 'test.csv' ) as $test ) {
    var_dump( $test );
}
which outputs:
test.php:
array(3) {
'Line 1 test, comma in a single line' =>
string(35) "Line 2 test, comma in a single line"
'Line 1 test with no comma at all' =>
string(32) "Line 2 test with no comma at all"
'Line 1 test, with a comma over
multiple lines
in many places' =>
string(31) "Line 2 test, with a comma over\n"
}
test.php:
array(1) {
'Line 1 test, comma in a single line' =>
string(14) "multiple lines"
}
test.php:
array(2) {
'Line 1 test, comma in a single line' =>
string(15) "in many places""
'Line 1 test with no comma at all' =>
string(32) "Line 2 test with no comma at all"
}
test.php:
array(3) {
'Line 1 test, comma in a single line' =>
string(35) "Line 3 test, comma in a single line"
'Line 1 test with no comma at all' =>
string(32) "Line 3 test with no comma at all"
'Line 1 test, with a comma over
multiple lines
in many places' =>
string(31) "Line 3 test, with a comma over\n"
}
test.php:
array(1) {
'Line 1 test, comma in a single line' =>
string(14) "multiple lines"
}
test.php:
array(2) {
'Line 1 test, comma in a single line' =>
string(15) "in many places""
'Line 1 test with no comma at all' =>
string(32) "Line 3 test with no comma at all"
}
test.php:
array(3) {
'Line 1 test, comma in a single line' =>
string(35) "Line 4 test, comma in a single line"
'Line 1 test with no comma at all' =>
string(32) "Line 4 test with no comma at all"
'Line 1 test, with a comma over
multiple lines
in many places' =>
string(31) "Line 4 test, with a comma over\n"
}
test.php:
array(1) {
'Line 1 test, comma in a single line' =>
string(14) "multiple lines"
}
You can see that instead of four arrays with four items each, you get a larger number of broken arrays.
Describe what you expect as the correct outcome
Reading the same CSV file with PHP's built-in fgetcsv() function shows what I expected the data to look like:
$handle = fopen( 'test.csv', 'rb' );
while ( false !== ( $data = fgetcsv( $handle ) ) ) {
    var_dump( $data );
}
fclose( $handle );
which outputs:
test.php:
array(4) {
[0] =>
string(35) "Line 1 test, comma in a single line"
[1] =>
string(32) "Line 1 test with no comma at all"
[2] =>
string(60) "Line 1 test, with a comma over\nmultiple lines\nin many places"
[3] =>
string(32) "Line 1 test with no comma at all"
}
test.php:
array(4) {
[0] =>
string(35) "Line 2 test, comma in a single line"
[1] =>
string(32) "Line 2 test with no comma at all"
[2] =>
string(60) "Line 2 test, with a comma over\nmultiple lines\nin many places"
[3] =>
string(32) "Line 2 test with no comma at all"
}
test.php:
array(4) {
[0] =>
string(35) "Line 3 test, comma in a single line"
[1] =>
string(32) "Line 3 test with no comma at all"
[2] =>
string(60) "Line 3 test, with a comma over\nmultiple lines\nin many places"
[3] =>
string(32) "Line 3 test with no comma at all"
}
test.php:
array(4) {
[0] =>
string(35) "Line 4 test, comma in a single line"
[1] =>
string(32) "Line 4 test with no comma at all"
[2] =>
string(60) "Line 4 test, with a comma over\nmultiple lines\nin many places"
[3] =>
string(32) "Line 4 test with no comma at all"
}
Let us know what environment you are running this on
OS: Linux 4.19.0-9-amd64 #1 SMP Debian 4.19.118-2 (2020-04-29) x86_64
Shell: /bin/bash
PHP binary: /usr/local/php7.3/bin/php
PHP version: 7.3.19
php.ini used: /usr/local/php7.3/conf/php.ini
WP-CLI root dir: phar://wp-cli.phar/vendor/wp-cli/wp-cli
WP-CLI vendor dir: phar://wp-cli.phar/vendor
WP_CLI phar path: /redacted/but/generic/directory
WP-CLI packages dir:
WP-CLI global config:
WP-CLI project config:
WP-CLI version: 2.4.0
Provide a possible solution
I don't have a specific solution, but I can point to where the issue is happening:
Inside \WP_CLI\Iterators\CSV() we're using fgets(), which, if I'm understanding things correctly, reads one line at a time.
Since the CSV has multi-line values, it's reading each line as a new value rather than a continuation of an existing value.
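To illustrate the difference, here is a minimal standalone sketch (plain PHP, no WP-CLI involved) that writes a single record with a three-line quoted value and counts what fgets() and fgetcsv() each see. The temporary file and variable names are made up for the example, and the delimiter, enclosure, and escape arguments are passed explicitly only to avoid relying on defaults.

```php
<?php
// Illustrative only: one CSV record whose middle field spans three lines.
$path = tempnam( sys_get_temp_dir(), 'csv' );
$out  = fopen( $path, 'wb' );
fputcsv( $out, array( 'a, b', "first\nsecond\nthird", 'c' ), ',', '"', '\\' );
fclose( $out );

// fgets() stops at every newline, including the ones inside the quoted field.
$in         = fopen( $path, 'rb' );
$line_count = 0;
while ( false !== fgets( $in ) ) {
    $line_count++;
}
fclose( $in );

// fgetcsv() keeps reading until the quoted field is closed.
$in           = fopen( $path, 'rb' );
$record_count = 0;
while ( false !== ( $row = fgetcsv( $in, 0, ',', '"', '\\' ) ) ) {
    $record_count++;
}
fclose( $in );
unlink( $path );

printf( "fgets: %d lines, fgetcsv: %d records\n", $line_count, $record_count );
```

The file holds one logical record spread over three physical lines, so a reader that parses each fgets() line on its own ends up splitting the record.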
Due to the way that \WP_CLI\Iterators\CSV() works with rewind() and next(), I don't think the fix is as simple as dropping in fgetcsv().
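As a rough starting point for discussion, here is a sketch of how an iterator could combine rewind() and next() with fgetcsv(). It is not the WP-CLI implementation: the class name Multiline_CSV_Iterator, its properties, and the padding of short rows are all assumptions made for the example, and it does not try to match every behaviour of the existing class.

```php
<?php
/**
 * Hypothetical sketch, not WP-CLI code: a header-keyed CSV iterator that
 * reads whole records with fgetcsv() so quoted multi-line values survive.
 */
class Multiline_CSV_Iterator implements Iterator {
    private $handle;
    private $delimiter;
    private $columns = array();
    private $current = false;
    private $index   = 0;

    public function __construct( $filename, $delimiter = ',' ) {
        $this->handle = fopen( $filename, 'rb' );
        if ( false === $this->handle ) {
            throw new RuntimeException( "Could not open file: {$filename}" );
        }
        $this->delimiter = $delimiter;
    }

    public function __destruct() {
        if ( is_resource( $this->handle ) ) {
            fclose( $this->handle );
        }
    }

    // Load the next data record into $this->current (false at end of file).
    private function load_row() {
        do {
            $row = fgetcsv( $this->handle, 0, $this->delimiter, '"', '\\' );
        } while ( array( null ) === $row ); // fgetcsv() reports a blank line as array( null ).

        if ( ! is_array( $row ) ) {
            $this->current = false;
            return;
        }
        // Assumption: pad or trim rows so they always line up with the header.
        $width         = count( $this->columns );
        $row           = array_pad( array_slice( $row, 0, $width ), $width, '' );
        $this->current = array_combine( $this->columns, $row );
    }

    public function rewind(): void {
        rewind( $this->handle );
        $header        = fgetcsv( $this->handle, 0, $this->delimiter, '"', '\\' );
        $this->columns = is_array( $header ) ? $header : array();
        $this->index   = 0;
        $this->load_row();
    }

    public function next(): void {
        $this->index++;
        $this->load_row();
    }

    public function valid(): bool {
        return false !== $this->current;
    }

    #[\ReturnTypeWillChange]
    public function current() {
        return $this->current;
    }

    #[\ReturnTypeWillChange]
    public function key() {
        return $this->index;
    }
}

// Usage: write a small file with a multi-line value, then read it back.
$path = tempnam( sys_get_temp_dir(), 'csv' );
$out  = fopen( $path, 'wb' );
fputcsv( $out, array( 'id', 'note' ), ',', '"', '\\' );
fputcsv( $out, array( '1', "first, line\nsecond line" ), ',', '"', '\\' );
fputcsv( $out, array( '2', 'plain' ), ',', '"', '\\' );
fclose( $out );

$rows = iterator_to_array( new Multiline_CSV_Iterator( $path ) );
unlink( $path );
```

In this sketch rewind() re-reads the header and primes the first record, and next() only advances by one fgetcsv() call, so the line-by-line state never has to be tracked. Whether that fits the existing class's handling of delimiters and malformed rows would need checking against its current behaviour.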