v.db.select: Add -e flag (escape newlines and backslashes) and -j flag (JSON output) by HuidaeCho · Pull Request #476 · OSGeo/grass

HuidaeCho · 2020-04-02T00:52:15Z

Issue

If there are newline characters in a column, v.db.select will print multiple lines per record, which breaks v.report (#477). I found this issue in the GADM 3.6 dataset.

How to reproduce it in the NC sample dataset

g.copy --o vect=boundary_county,tmp
db.execute sql="update tmp set name='BREAK' || char(10) || 'ME\nPLEASE' where cat=1"
v.db.select -c map=tmp column=name where=cat=1

will print two lines:

BREAK
ME\nPLEASE

What does this PR do?

v.db.select -ce map=tmp column=name where=cat=1

will print

BREAK\nME\\nPLEASE

Thoughts

Maybe, we should make -n flag default?

HuidaeCho · 2020-04-04T16:00:11Z

BREAK
ME\nPLEASE

vs.

BREAK\nME\\nPLEASE

I think because of escaped backslashes and rare (hopefully) cases of newlines in records, it's better to keep the -n flag as is instead of making this behavior default.

wenzeslaus · 2020-04-05T01:18:00Z

Hard to say, but I'm saying yes to -n flag as you suggested because the current format is closest to CSV which (usually) does not escape newlines. Although once you have the possibility of newlines characters in your data you will most likely need to use -n which is your motivation for all this.

Why this is hard to decide or to do is, I think, caused by a more general issue and that is lack of definition for these output formats (of GRASS modules); similarly to the discussion in #325. Perhaps v.report is for viewing rather than subsequent parsing, then escaping new lines by default would make sense, but escaping backslashes would not. If it is for querying, a commonly used format, even if it is just a well-defined CSV, would be needed for smooth subsequent parsing. That's a decision we have already made for complex output of v.what. See how this nicely plays out in this random example of parsing its output with a standard tools (here JSON reading in Ruby and Python):

$ grass ~/grassdata/nc_spm/user1/ \
      --exec v.what zipcodes_wake coordinates=637502.25,221744.25 -j | \
      ruby -ryaml -rjson -e 'puts YAML.dump(JSON.parse(STDIN.read))'
---
Coordinates:
  East: '637502.25'
  North: '221744.25'
Maps:
- Map: zipcodes_wake
  Mapset: PERMANENT
  Type: Area
  Sq_Meters: 130875884.223
  Hectares: 13087.588
  Acres: 32340.135
  Sq_Miles: 50.5315
  Categories:
  - Layer: 1
    Category: 40
$ # similar result also with
$ ... python3 -c "import json, yaml, sys; print(yaml.dump(json.load(sys.stdin)))"

HuidaeCho · 2020-04-05T06:43:36Z

Hard to say, but I'm saying yes to -n flag as you suggested because the current format is closest to CSV which (usually) does not escape newlines. Although once you have the possibility of newlines characters in your data you will most likely need to use -n which is your motivation for all this.

I agree to keep the default behavior.

Why this is hard to decide or to do is, I think, caused by a more general issue and that is lack of definition for these output formats (of GRASS modules); similarly to the discussion in #325. Perhaps v.report is for viewing rather than subsequent parsing, then escaping new lines by default would make sense, but escaping backslashes would not. If it is for querying, a commonly used format, even if it is just a well-defined CSV, would be needed for smooth subsequent parsing.

Escaping only newlines is problematic as shown in the above example (\ + n vs. \n). v.report could unescape newlines and backslashes after generating the report, but, again, I don't really think that many users would expect multiple lines per record from both v.db.select and v.report. I wouldn't have found this issue myself if I didn't decide to run v.report on the GADM data. This command line unescapes escaped characters.

v.report map=tmp option=area | sed 's/\\\\/\x0/g; s/\\n/\n/g; s/\x0/\\/g'

That's a decision we have already made for complex output of v.what. See how this nicely plays out in this random example of parsing its output with a standard tools (here JSON reading in Ruby and Python):

$ grass ~/grassdata/nc_spm/user1/ \
      --exec v.what zipcodes_wake coordinates=637502.25,221744.25 -j | \
      ruby -ryaml -rjson -e 'puts YAML.dump(JSON.parse(STDIN.read))'
---
Coordinates:
  East: '637502.25'
  North: '221744.25'
Maps:
- Map: zipcodes_wake
  Mapset: PERMANENT
  Type: Area
  Sq_Meters: 130875884.223
  Hectares: 13087.588
  Acres: 32340.135
  Sq_Miles: 50.5315
  Categories:
  - Layer: 1
    Category: 40
$ # similar result also with
$ ... python3 -c "import json, yaml, sys; print(yaml.dump(json.load(sys.stdin)))"

IMO, JSON is less human-readable, but it would be great to implement the JSON format in v.db.select for parsing.

neteler · 2020-04-05T06:56:59Z

Note: JSON support is currently being worked on in OSGeo/grass-addons#137 (in Python, though).

HuidaeCho · 2020-04-05T20:25:36Z

For our record, the pipe character (|) is not as bad as newlines (not breaking v.report):

db.execute sql="update tmp set name='BREAK|ME AGAIN' where cat=1"

ninsbl · 2020-04-05T20:31:30Z

Hi Huidae, while you are at the newline issue, would you mind looking at r.info as well? See:
https://www.mail-archive.com/[email protected]/msg59103.html
If metadata contains linebrakes, output of r.info -e can no longer be parsed...

HuidaeCho · 2020-04-05T21:45:19Z

Note: JSON support is currently being worked on in OSGeo/grass-addons#137 (in Python, though).

Not sure how I can create a separate PR that depends on this one, which is not committed yet. I tried to create a new PR, but that contained this commit as well. Anyway, this PR now includes -n (newline and backslash escaping) and -j (JSON output) flags (mutually exclusive). The -j flag always escapes newlines and backslashes as per JSON specs (https://www.json.org/).

g.copy --o vect=boundary_county,tmp
db.execute sql="update tmp set name='BREAK' || char(10) || '"'"'"ME''\nPLEASE' where cat=1"
v.db.select -j map=tmp |
python3 -c 'import sys, json; recs=json.loads(sys.stdin.read()); print(recs[0]["NAME"])'

outputs

BREAK
"ME'\nPLEASE

HuidaeCho · 2020-04-05T22:03:18Z

Hi Huidae, while you are at the newline issue, would you mind looking at r.info as well? See:
https://www.mail-archive.com/[email protected]/msg59103.html
If metadata contains linebrakes, output of r.info -e can no longer be parsed...

Hi Stefan, How can I reproduce it in the NC sample dataset?

ninsbl · 2020-04-05T22:17:56Z

Hi Stefan, How can I reproduce it in the NC sample dataset?

You could do:

# Add two lines to history
r.support elevation history=BREAK
r.support elevation history=ME

# Print extended info in key-value form (no line breaks)
r.info -e elevation
# Print history (with line breaks)
r.info -h elevation

HuidaeCho · 2020-04-05T22:26:34Z

r.support

comments="r.proj input="ned03arcsec" location="northcarolina_latlong" mapset="\helena" output="elev_ned10m" method="cubic" resolution=10BREAKME"

\ is added in the middle and even double quotes are not properly escaped.

echo $comments
r.proj input=ned03arcsec location=northcarolina_latlong mapset=helena output=elev_ned10m method=cubic resolution=10BREAKME

You want to create a new issue ticket?

wenzeslaus · 2020-04-06T20:42:00Z

Not sure how I can create a separate PR that depends on this one, which is not committed yet. I tried to create a new PR, but that contained this commit as well.

Just create the new branch from master, not from the branch for this PR:

git checkout master
git checkout -b new-branch-name

HuidaeCho · 2020-04-06T21:02:15Z

Not sure how I can create a separate PR that depends on this one, which is not committed yet. I tried to create a new PR, but that contained this commit as well.

Just create the new branch from master, not from the branch for this PR:
git checkout master
git checkout -b new-branch-name

Wouldn't that create conflicts?
PR 1: this
PR 2: that (without this from PR1)

Can we merge both PRs with no conflicts? or PR 1 first, then PR 2?

wenzeslaus

I'm sorry to request more after suggesting the JSON, but since the JSON is generated manually, it would be good to at least have a test which checks that the generated text is a valid JSON by loading it as JSON into Python.

vector/v.db.select/main.c

HuidaeCho · 2020-04-07T19:53:50Z

I'm sorry to request more after suggesting the JSON, but since the JSON is generated manually, it would be good to at least have a test which checks that the generated text is a valid JSON by loading it as JSON into Python.

I think you're talking about testsuite/test_v_db_select.py? Please check b69ffc2

wenzeslaus · 2020-04-21T14:26:28Z

I'm sorry to request more after suggesting the JSON, but since the JSON is generated manually, it would be good to at least have a test which checks that the generated text is a valid JSON by loading it as JSON into Python.

I think you're talking about testsuite/test_v_db_select.py? Please check b69ffc2

What is there is great. Additional test which would be worth it (and what I meant originally) is simply loading it into Python using json package (json.load(...)) something like test_json_valid(). It does not depend on the data and json package is part of Python standard library, so the cost-gain ratio is high although it obviously tests only the validity.

wenzeslaus

I like what you did with the rest of the code. Thanks.

HuidaeCho · 2020-04-21T14:46:52Z

I'm sorry to request more after suggesting the JSON, but since the JSON is generated manually, it would be good to at least have a test which checks that the generated text is a valid JSON by loading it as JSON into Python.

I think you're talking about testsuite/test_v_db_select.py? Please check b69ffc2

What is there is great. Additional test which would be worth it (and what I meant originally) is simply loading it into Python using json package (json.load(...)) something like test_json_valid(). It does not depend on the data and json package is part of Python standard library, so the cost-gain ratio is high although it obviously tests only the validity.

Yes, I thought about that. Just loading the output to validate the JSON format only, but the same test script does depend on the same data set. That's why I decided to check the validity of the data as well. Let me do something about this.

HuidaeCho · 2020-04-21T14:55:33Z

I'm sorry to request more after suggesting the JSON, but since the JSON is generated manually, it would be good to at least have a test which checks that the generated text is a valid JSON by loading it as JSON into Python.

I think you're talking about testsuite/test_v_db_select.py? Please check b69ffc2

What is there is great. Additional test which would be worth it (and what I meant originally) is simply loading it into Python using json package (json.load(...)) something like test_json_valid(). It does not depend on the data and json package is part of Python standard library, so the cost-gain ratio is high although it obviously tests only the validity.

Yes, I thought about that. Just loading the output to validate the JSON format only, but the same test script does depend on the same data set. That's why I decided to check the validity of the data as well. Let me do something about this.

Done in f493ac1

* v.report: Fix newline issue * v.db.select -n => -e (#476)

v.db.select: Add -n flag to avoid multiple lines per record

3234e37

HuidaeCho added the enhancement New feature or request label Apr 2, 2020

HuidaeCho mentioned this pull request Apr 2, 2020

v.report: Fix newline issue #477

Merged

HuidaeCho changed the title ~~v.db.select: Add -n flag to avoid multiple lines per record~~ WIP: v.db.select: Add -n flag to avoid multiple lines per record Apr 4, 2020

Escape backslash to avoid confusion

1458ffc

HuidaeCho changed the title ~~WIP: v.db.select: Add -n flag to avoid multiple lines per record~~ v.db.select: Add -n flag to avoid multiple lines per record Apr 4, 2020

v.db.select: Implement the -j flag for JSON output

ef89db5

HuidaeCho changed the title ~~v.db.select: Add -n flag to avoid multiple lines per record~~ v.db.select: Add -n flag to avoid multiple lines per record and -j flag to output JSON Apr 5, 2020

HuidaeCho changed the title ~~v.db.select: Add -n flag to avoid multiple lines per record and -j flag to output JSON~~ v.db.select: Add -n flag (escape newlines) and -j flag (JSON output) Apr 5, 2020

Escape double quotes

626a566

Copyright block

e7a7229

ninsbl mentioned this pull request Apr 6, 2020

[Bug] r.info -e output line breaks not properly managed #491

Open

wenzeslaus requested changes Apr 7, 2020

View reviewed changes

vector/v.db.select/main.c Outdated Show resolved Hide resolved

vector/v.db.select/main.c Outdated Show resolved Hide resolved

vector/v.db.select/main.c Outdated Show resolved Hide resolved

vector/v.db.select/main.c Outdated Show resolved Hide resolved

HuidaeCho added 2 commits April 7, 2020 15:49

Minor refactoring (options and flags)

8672c93

JSON test

b69ffc2

HuidaeCho changed the title ~~v.db.select: Add -n flag (escape newlines) and -j flag (JSON output)~~ v.db.select: Add -e flag (escape newlines and backslashes) and -j flag (JSON output) Apr 7, 2020

HuidaeCho added a commit to HuidaeCho/grass that referenced this pull request Apr 7, 2020

v.db.select -n => -e (OSGeo#476)

7f68eb4

HuidaeCho requested a review from wenzeslaus April 20, 2020 01:56

Use a range for copyright years

58a9c25

wenzeslaus mentioned this pull request Apr 21, 2020

[Bug] Checks reports fail in PR for unrelated problem #533

Closed

wenzeslaus approved these changes Apr 21, 2020

View reviewed changes

Validate JSON output by loading it first

f493ac1

HuidaeCho merged commit 74dc1e4 into OSGeo:master Apr 21, 2020

HuidaeCho deleted the v_db_select_newline branch April 21, 2020 14:56

HuidaeCho added a commit that referenced this pull request Apr 21, 2020

v.report: Fix newline issue (#477)

b74dfd2

* v.report: Fix newline issue * v.db.select -n => -e (#476)

neteler added this to the 8.0.0 milestone Dec 9, 2021

cwhite911 mentioned this pull request Jun 5, 2023

Add JSON and YAML C library dependency #3020

Closed

46 tasks

Uh oh!

Comments

Conversation

HuidaeCho commented Apr 2, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Issue

How to reproduce it in the NC sample dataset

What does this PR do?

Thoughts

Uh oh!

HuidaeCho commented Apr 4, 2020

Uh oh!

wenzeslaus commented Apr 5, 2020

Uh oh!

HuidaeCho commented Apr 5, 2020

Uh oh!

neteler commented Apr 5, 2020

Uh oh!

HuidaeCho commented Apr 5, 2020

Uh oh!

ninsbl commented Apr 5, 2020

Uh oh!

HuidaeCho commented Apr 5, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

HuidaeCho commented Apr 5, 2020

Uh oh!

ninsbl commented Apr 5, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

HuidaeCho commented Apr 5, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

wenzeslaus commented Apr 6, 2020

Uh oh!

HuidaeCho commented Apr 6, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

wenzeslaus left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

HuidaeCho commented Apr 7, 2020

Uh oh!

wenzeslaus commented Apr 21, 2020

Uh oh!

wenzeslaus left a comment

Choose a reason for hiding this comment

Uh oh!

HuidaeCho commented Apr 21, 2020

Uh oh!

HuidaeCho commented Apr 21, 2020

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

HuidaeCho commented Apr 2, 2020 •

edited

Loading

HuidaeCho commented Apr 5, 2020 •

edited

Loading

ninsbl commented Apr 5, 2020 •

edited

Loading

HuidaeCho commented Apr 5, 2020 •

edited

Loading

HuidaeCho commented Apr 6, 2020 •

edited

Loading