\WP_CLI\Iterators\CSV breaks with multi-line CSV values #5412

@emrikol

Bug Report

Describe the current, buggy behavior

When using the WP-CLI built-in CSV tools, WP-CLI can output a CSV file that it then can't properly read back in.

When \WP_CLI\Utils\write_csv() writes a CSV file containing multi-line values and \WP_CLI\Iterators\CSV() then reads it back in, \WP_CLI\Iterators\CSV() fails to read the values properly, especially when they also contain commas inside quoted strings.

I ran into this issue while writing a command to export some custom data, which included post meta that had multi-line values.

Describe how other contributors can replicate this bug

You can create a test CSV file using this bit of code:

$handle = fopen( 'test.csv', 'wb' );
WP_CLI\Utils\write_csv( $handle, array(
	array(
		'Line 1 test, comma in a single line',
		'Line 1 test with no comma at all',
		"Line 1 test, with a comma over\nmultiple lines\nin many places",
		'Line 1 test with no comma at all',
	),
	array(
		'Line 2 test, comma in a single line',
		'Line 2 test with no comma at all',
		"Line 2 test, with a comma over\nmultiple lines\nin many places",
		'Line 2 test with no comma at all',
	),
	array(
		'Line 3 test, comma in a single line',
		'Line 3 test with no comma at all',
		"Line 3 test, with a comma over\nmultiple lines\nin many places",
		'Line 3 test with no comma at all',
	),
	array(
		'Line 4 test, comma in a single line',
		'Line 4 test with no comma at all',
		"Line 4 test, with a comma over\nmultiple lines\nin many places",
		'Line 4 test with no comma at all',
	),
) );
fclose( $handle );

which creates this test.csv file:

"Line 1 test, comma in a single line","Line 1 test with no comma at all","Line 1 test, with a comma over
multiple lines
in many places","Line 1 test with no comma at all"
"Line 2 test, comma in a single line","Line 2 test with no comma at all","Line 2 test, with a comma over
multiple lines
in many places","Line 2 test with no comma at all"
"Line 3 test, comma in a single line","Line 3 test with no comma at all","Line 3 test, with a comma over
multiple lines
in many places","Line 3 test with no comma at all"
"Line 4 test, comma in a single line","Line 4 test with no comma at all","Line 4 test, with a comma over
multiple lines
in many places","Line 4 test with no comma at all"

When you read the same test.csv back in, the result is broken:

foreach ( new \WP_CLI\Iterators\CSV( 'test.csv' ) as $test ) {
	var_dump( $test );
}

which outputs:

test.php:
array(3) {
  'Line 1 test, comma in a single line' =>
  string(35) "Line 2 test, comma in a single line"
  'Line 1 test with no comma at all' =>
  string(32) "Line 2 test with no comma at all"
  'Line 1 test, with a comma over
multiple lines
in many places' =>
  string(31) "Line 2 test, with a comma over\n"
}
test.php:
array(1) {
  'Line 1 test, comma in a single line' =>
  string(14) "multiple lines"
}
test.php:
array(2) {
  'Line 1 test, comma in a single line' =>
  string(15) "in many places""
  'Line 1 test with no comma at all' =>
  string(32) "Line 2 test with no comma at all"
}
test.php:
array(3) {
  'Line 1 test, comma in a single line' =>
  string(35) "Line 3 test, comma in a single line"
  'Line 1 test with no comma at all' =>
  string(32) "Line 3 test with no comma at all"
  'Line 1 test, with a comma over
multiple lines
in many places' =>
  string(31) "Line 3 test, with a comma over\n"
}
test.php:
array(1) {
  'Line 1 test, comma in a single line' =>
  string(14) "multiple lines"
}
test.php:
array(2) {
  'Line 1 test, comma in a single line' =>
  string(15) "in many places""
  'Line 1 test with no comma at all' =>
  string(32) "Line 3 test with no comma at all"
}
test.php:
array(3) {
  'Line 1 test, comma in a single line' =>
  string(35) "Line 4 test, comma in a single line"
  'Line 1 test with no comma at all' =>
  string(32) "Line 4 test with no comma at all"
  'Line 1 test, with a comma over
multiple lines
in many places' =>
  string(31) "Line 4 test, with a comma over\n"
}
test.php:
array(1) {
  'Line 1 test, comma in a single line' =>
  string(14) "multiple lines"
}

You can see that instead of one complete four-item array per record, you get a larger number of broken arrays, with each multi-line value split across several of them.

Describe what you expect as the correct outcome

Reading the same CSV file with a different core PHP function, fgetcsv(), shows what I expected the data to look like:

$handle = fopen( 'test.csv', 'rb' );
while ( false !== ( $data = fgetcsv( $handle ) ) ) {
	var_dump( $data );
}
fclose( $handle );

which outputs:

test.php:
array(4) {
  [0] =>
  string(35) "Line 1 test, comma in a single line"
  [1] =>
  string(32) "Line 1 test with no comma at all"
  [2] =>
  string(60) "Line 1 test, with a comma over\nmultiple lines\nin many places"
  [3] =>
  string(32) "Line 1 test with no comma at all"
}
test.php:
array(4) {
  [0] =>
  string(35) "Line 2 test, comma in a single line"
  [1] =>
  string(32) "Line 2 test with no comma at all"
  [2] =>
  string(60) "Line 2 test, with a comma over\nmultiple lines\nin many places"
  [3] =>
  string(32) "Line 2 test with no comma at all"
}
test.php:
array(4) {
  [0] =>
  string(35) "Line 3 test, comma in a single line"
  [1] =>
  string(32) "Line 3 test with no comma at all"
  [2] =>
  string(60) "Line 3 test, with a comma over\nmultiple lines\nin many places"
  [3] =>
  string(32) "Line 3 test with no comma at all"
}
test.php:
array(4) {
  [0] =>
  string(35) "Line 4 test, comma in a single line"
  [1] =>
  string(32) "Line 4 test with no comma at all"
  [2] =>
  string(60) "Line 4 test, with a comma over\nmultiple lines\nin many places"
  [3] =>
  string(32) "Line 4 test with no comma at all"
}

Let us know what environment you are running this on

OS:	Linux 4.19.0-9-amd64 #1 SMP Debian 4.19.118-2 (2020-04-29) x86_64
Shell:	/bin/bash
PHP binary:	/usr/local/php7.3/bin/php
PHP version:	7.3.19
php.ini used:	/usr/local/php7.3/conf/php.ini
WP-CLI root dir:	phar://wp-cli.phar/vendor/wp-cli/wp-cli
WP-CLI vendor dir:	phar://wp-cli.phar/vendor
WP_CLI phar path:	/redacted/but/generic/directory
WP-CLI packages dir:
WP-CLI global config:
WP-CLI project config:
WP-CLI version:	2.4.0

Provide a possible solution

I don't have a specific solution, but I can point to where the issue is happening:

Inside \WP_CLI\Iterators\CSV() we're using fgets() which, if I'm understanding things correctly, reads one line at a time.

Since the CSV has multi-line values, it's reading each line as a new value rather than a continuation of an existing value.
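
To make the difference concrete, here is a minimal sketch that parses the same in-memory CSV twice: once line by line with fgets() plus str_getcsv(), and once record by record with fgetcsv(). The sample data and variable names are made up for illustration, and this is not a copy of what the iterator does internally, only the general pattern described above.

```php
<?php
// A header row plus one record whose third field spans three physical lines.
$csv = "a,b,c\n" . "1,\"x, y\",\"first\nsecond\nthird\"\n";

$handle = fopen( 'php://memory', 'w+b' );
fwrite( $handle, $csv );

// Line-based parsing: fgets() stops at every newline, including the ones
// inside the quoted field, so a single record is split into fragments.
rewind( $handle );
$by_line = array();
while ( false !== ( $line = fgets( $handle ) ) ) {
	$by_line[] = str_getcsv( $line, ',', '"', '\\' );
}

// Record-based parsing: fgetcsv() keeps reading until the quoted field closes.
rewind( $handle );
$by_record = array();
while ( false !== ( $row = fgetcsv( $handle, 0, ',', '"', '\\' ) ) ) {
	$by_record[] = $row;
}
fclose( $handle );

printf( "fgets + str_getcsv: %d rows\n", count( $by_line ) );
printf( "fgetcsv: %d rows\n", count( $by_record ) );
```

The line-based loop produces one row per physical line, while the fgetcsv() loop produces one row per CSV record, which matches the broken and expected outputs shown earlier.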

Due to the way that \WP_CLI\Iterators\CSV() works with rewind() and next() I don't think it's as simple a fix as dropping in fgetcsv().
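
For discussion, here is a rough sketch of how an iterator could keep the rewind()/next() structure while reading whole records with fgetcsv(). The class name Multiline_CSV_Iterator and everything inside it are hypothetical and are not WP-CLI's actual implementation. It assumes the first record holds the column names and that blank lines should be skipped, and it does not try to reproduce any other behavior of the existing class.

```php
<?php
/**
 * Hypothetical sketch, not WP-CLI's actual class: a CSV iterator that reads
 * whole records with fgetcsv() so quoted multi-line values stay intact.
 */
class Multiline_CSV_Iterator implements Iterator {
	private $handle;
	private $delimiter;
	private $columns = array();
	private $current = false;
	private $index   = 0;

	public function __construct( $filename, $delimiter = ',' ) {
		$this->handle = fopen( $filename, 'rb' );
		if ( false === $this->handle ) {
			throw new RuntimeException( "Could not open file: {$filename}" );
		}
		$this->delimiter = $delimiter;
	}

	// Read the next non-blank record, or return false at end of file.
	private function read_record() {
		while ( false !== ( $row = fgetcsv( $this->handle, 0, $this->delimiter, '"', '\\' ) ) ) {
			// fgetcsv() returns array( null ) for a blank line; skip those.
			if ( array( null ) !== $row ) {
				return $row;
			}
		}
		return false;
	}

	public function rewind(): void {
		rewind( $this->handle );
		// Assumption: the first record contains the column names.
		$header        = $this->read_record();
		$this->columns = false === $header ? array() : $header;
		$this->index   = -1;
		$this->next();
	}

	public function next(): void {
		$row           = $this->read_record();
		$this->current = false;
		if ( false !== $row ) {
			// Key each value by its column name; missing values become null.
			$this->current = array();
			foreach ( $this->columns as $i => $name ) {
				$this->current[ $name ] = isset( $row[ $i ] ) ? $row[ $i ] : null;
			}
		}
		$this->index++;
	}

	#[\ReturnTypeWillChange]
	public function current() {
		return $this->current;
	}

	#[\ReturnTypeWillChange]
	public function key() {
		return $this->index;
	}

	public function valid(): bool {
		return is_array( $this->current );
	}
}
```

Used as foreach ( new Multiline_CSV_Iterator( 'test.csv' ) as $row ) with the test.csv above, this should yield one array per record after the first, each keyed by the values of the first record and with the multi-line value in one piece.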
