Skip to content

Comments

v.db.select: Add -e flag (escape newlines and backslashes) and -j flag (JSON output)#476

Merged
HuidaeCho merged 9 commits intoOSGeo:masterfrom
HuidaeCho:v_db_select_newline
Apr 21, 2020
Merged

v.db.select: Add -e flag (escape newlines and backslashes) and -j flag (JSON output)#476
HuidaeCho merged 9 commits intoOSGeo:masterfrom
HuidaeCho:v_db_select_newline

Conversation

@HuidaeCho
Copy link
Member

@HuidaeCho HuidaeCho commented Apr 2, 2020

Issue

If there are newline characters in a column, v.db.select will print multiple lines per record, which breaks v.report (#477). I found this issue in the GADM 3.6 dataset.

How to reproduce it in the NC sample dataset

g.copy --o vect=boundary_county,tmp
db.execute sql="update tmp set name='BREAK' || char(10) || 'ME\nPLEASE' where cat=1"
v.db.select -c map=tmp column=name where=cat=1

will print two lines:

BREAK
ME\nPLEASE

What does this PR do?

v.db.select -ce map=tmp column=name where=cat=1

will print

BREAK\nME\\nPLEASE

Thoughts

Maybe, we should make -n flag default?

@HuidaeCho HuidaeCho added the enhancement New feature or request label Apr 2, 2020
@HuidaeCho HuidaeCho changed the title v.db.select: Add -n flag to avoid multiple lines per record WIP: v.db.select: Add -n flag to avoid multiple lines per record Apr 4, 2020
@HuidaeCho HuidaeCho changed the title WIP: v.db.select: Add -n flag to avoid multiple lines per record v.db.select: Add -n flag to avoid multiple lines per record Apr 4, 2020
@HuidaeCho
Copy link
Member Author

BREAK
ME\nPLEASE

vs.

BREAK\nME\\nPLEASE

I think because of escaped backslashes and rare (hopefully) cases of newlines in records, it's better to keep the -n flag as is instead of making this behavior default.

@wenzeslaus
Copy link
Member

Hard to say, but I'm saying yes to -n flag as you suggested because the current format is closest to CSV which (usually) does not escape newlines. Although once you have the possibility of newlines characters in your data you will most likely need to use -n which is your motivation for all this.

Why this is hard to decide or to do is, I think, caused by a more general issue and that is lack of definition for these output formats (of GRASS modules); similarly to the discussion in #325. Perhaps v.report is for viewing rather than subsequent parsing, then escaping new lines by default would make sense, but escaping backslashes would not. If it is for querying, a commonly used format, even if it is just a well-defined CSV, would be needed for smooth subsequent parsing. That's a decision we have already made for complex output of v.what. See how this nicely plays out in this random example of parsing its output with a standard tools (here JSON reading in Ruby and Python):

$ grass ~/grassdata/nc_spm/user1/ \
      --exec v.what zipcodes_wake coordinates=637502.25,221744.25 -j | \
      ruby -ryaml -rjson -e 'puts YAML.dump(JSON.parse(STDIN.read))'
---
Coordinates:
  East: '637502.25'
  North: '221744.25'
Maps:
- Map: zipcodes_wake
  Mapset: PERMANENT
  Type: Area
  Sq_Meters: 130875884.223
  Hectares: 13087.588
  Acres: 32340.135
  Sq_Miles: 50.5315
  Categories:
  - Layer: 1
    Category: 40
$ # similar result also with
$ ... python3 -c "import json, yaml, sys; print(yaml.dump(json.load(sys.stdin)))"

@HuidaeCho
Copy link
Member Author

Hard to say, but I'm saying yes to -n flag as you suggested because the current format is closest to CSV which (usually) does not escape newlines. Although once you have the possibility of newlines characters in your data you will most likely need to use -n which is your motivation for all this.

I agree to keep the default behavior.

Why this is hard to decide or to do is, I think, caused by a more general issue and that is lack of definition for these output formats (of GRASS modules); similarly to the discussion in #325. Perhaps v.report is for viewing rather than subsequent parsing, then escaping new lines by default would make sense, but escaping backslashes would not. If it is for querying, a commonly used format, even if it is just a well-defined CSV, would be needed for smooth subsequent parsing.

Escaping only newlines is problematic as shown in the above example (\ + n vs. \n). v.report could unescape newlines and backslashes after generating the report, but, again, I don't really think that many users would expect multiple lines per record from both v.db.select and v.report. I wouldn't have found this issue myself if I didn't decide to run v.report on the GADM data. This command line unescapes escaped characters.

v.report map=tmp option=area | sed 's/\\\\/\x0/g; s/\\n/\n/g; s/\x0/\\/g'

That's a decision we have already made for complex output of v.what. See how this nicely plays out in this random example of parsing its output with a standard tools (here JSON reading in Ruby and Python):

$ grass ~/grassdata/nc_spm/user1/ \
      --exec v.what zipcodes_wake coordinates=637502.25,221744.25 -j | \
      ruby -ryaml -rjson -e 'puts YAML.dump(JSON.parse(STDIN.read))'
---
Coordinates:
  East: '637502.25'
  North: '221744.25'
Maps:
- Map: zipcodes_wake
  Mapset: PERMANENT
  Type: Area
  Sq_Meters: 130875884.223
  Hectares: 13087.588
  Acres: 32340.135
  Sq_Miles: 50.5315
  Categories:
  - Layer: 1
    Category: 40
$ # similar result also with
$ ... python3 -c "import json, yaml, sys; print(yaml.dump(json.load(sys.stdin)))"

IMO, JSON is less human-readable, but it would be great to implement the JSON format in v.db.select for parsing.

@neteler
Copy link
Member

neteler commented Apr 5, 2020

Note: JSON support is currently being worked on in OSGeo/grass-addons#137 (in Python, though).

@HuidaeCho
Copy link
Member Author

For our record, the pipe character (|) is not as bad as newlines (not breaking v.report):

db.execute sql="update tmp set name='BREAK|ME AGAIN' where cat=1"

@ninsbl
Copy link
Member

ninsbl commented Apr 5, 2020

Hi Huidae, while you are at the newline issue, would you mind looking at r.info as well? See:
https://www.mail-archive.com/[email protected]/msg59103.html
If metadata contains linebrakes, output of r.info -e can no longer be parsed...

@HuidaeCho
Copy link
Member Author

HuidaeCho commented Apr 5, 2020

Note: JSON support is currently being worked on in OSGeo/grass-addons#137 (in Python, though).

Not sure how I can create a separate PR that depends on this one, which is not committed yet. I tried to create a new PR, but that contained this commit as well. Anyway, this PR now includes -n (newline and backslash escaping) and -j (JSON output) flags (mutually exclusive). The -j flag always escapes newlines and backslashes as per JSON specs (https://www.json.org/).

g.copy --o vect=boundary_county,tmp
db.execute sql="update tmp set name='BREAK' || char(10) || '"'"'"ME''\nPLEASE' where cat=1"
v.db.select -j map=tmp |
python3 -c 'import sys, json; recs=json.loads(sys.stdin.read()); print(recs[0]["NAME"])'

outputs

BREAK
"ME'\nPLEASE

@HuidaeCho HuidaeCho changed the title v.db.select: Add -n flag to avoid multiple lines per record v.db.select: Add -n flag to avoid multiple lines per record and -j flag to output JSON Apr 5, 2020
@HuidaeCho HuidaeCho changed the title v.db.select: Add -n flag to avoid multiple lines per record and -j flag to output JSON v.db.select: Add -n flag (escape newlines) and -j flag (JSON output) Apr 5, 2020
@HuidaeCho
Copy link
Member Author

Hi Huidae, while you are at the newline issue, would you mind looking at r.info as well? See:
https://www.mail-archive.com/[email protected]/msg59103.html
If metadata contains linebrakes, output of r.info -e can no longer be parsed...

Hi Stefan, How can I reproduce it in the NC sample dataset?

@ninsbl
Copy link
Member

ninsbl commented Apr 5, 2020

Hi Stefan, How can I reproduce it in the NC sample dataset?

You could do:

# Add two lines to history
r.support elevation history=BREAK
r.support elevation history=ME

# Print extended info in key-value form (no line breaks)
r.info -e elevation
# Print history (with line breaks)
r.info -h elevation

@HuidaeCho
Copy link
Member Author

HuidaeCho commented Apr 5, 2020

r.support

comments="r.proj input="ned03arcsec" location="northcarolina_latlong" mapset="\helena" output="elev_ned10m" method="cubic" resolution=10BREAKME"

\ is added in the middle and even double quotes are not properly escaped.

echo $comments
r.proj input=ned03arcsec location=northcarolina_latlong mapset=helena output=elev_ned10m method=cubic resolution=10BREAKME

You want to create a new issue ticket?

@wenzeslaus
Copy link
Member

Not sure how I can create a separate PR that depends on this one, which is not committed yet. I tried to create a new PR, but that contained this commit as well.

Just create the new branch from master, not from the branch for this PR:

git checkout master
git checkout -b new-branch-name

@HuidaeCho
Copy link
Member Author

HuidaeCho commented Apr 6, 2020

Not sure how I can create a separate PR that depends on this one, which is not committed yet. I tried to create a new PR, but that contained this commit as well.

Just create the new branch from master, not from the branch for this PR:

git checkout master
git checkout -b new-branch-name

Wouldn't that create conflicts?
PR 1: this
PR 2: that (without this from PR1)

Can we merge both PRs with no conflicts? or PR 1 first, then PR 2?

Copy link
Member

@wenzeslaus wenzeslaus left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm sorry to request more after suggesting the JSON, but since the JSON is generated manually, it would be good to at least have a test which checks that the generated text is a valid JSON by loading it as JSON into Python.

@HuidaeCho HuidaeCho changed the title v.db.select: Add -n flag (escape newlines) and -j flag (JSON output) v.db.select: Add -e flag (escape newlines and backslashes) and -j flag (JSON output) Apr 7, 2020
@HuidaeCho
Copy link
Member Author

I'm sorry to request more after suggesting the JSON, but since the JSON is generated manually, it would be good to at least have a test which checks that the generated text is a valid JSON by loading it as JSON into Python.

I think you're talking about testsuite/test_v_db_select.py? Please check b69ffc2

HuidaeCho added a commit to HuidaeCho/grass that referenced this pull request Apr 7, 2020
@HuidaeCho HuidaeCho requested a review from wenzeslaus April 20, 2020 01:56
@wenzeslaus
Copy link
Member

I'm sorry to request more after suggesting the JSON, but since the JSON is generated manually, it would be good to at least have a test which checks that the generated text is a valid JSON by loading it as JSON into Python.

I think you're talking about testsuite/test_v_db_select.py? Please check b69ffc2

What is there is great. Additional test which would be worth it (and what I meant originally) is simply loading it into Python using json package (json.load(...)) something like test_json_valid(). It does not depend on the data and json package is part of Python standard library, so the cost-gain ratio is high although it obviously tests only the validity.

Copy link
Member

@wenzeslaus wenzeslaus left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like what you did with the rest of the code. Thanks.

@HuidaeCho
Copy link
Member Author

I'm sorry to request more after suggesting the JSON, but since the JSON is generated manually, it would be good to at least have a test which checks that the generated text is a valid JSON by loading it as JSON into Python.

I think you're talking about testsuite/test_v_db_select.py? Please check b69ffc2

What is there is great. Additional test which would be worth it (and what I meant originally) is simply loading it into Python using json package (json.load(...)) something like test_json_valid(). It does not depend on the data and json package is part of Python standard library, so the cost-gain ratio is high although it obviously tests only the validity.

Yes, I thought about that. Just loading the output to validate the JSON format only, but the same test script does depend on the same data set. That's why I decided to check the validity of the data as well. Let me do something about this.

@HuidaeCho
Copy link
Member Author

I'm sorry to request more after suggesting the JSON, but since the JSON is generated manually, it would be good to at least have a test which checks that the generated text is a valid JSON by loading it as JSON into Python.

I think you're talking about testsuite/test_v_db_select.py? Please check b69ffc2

What is there is great. Additional test which would be worth it (and what I meant originally) is simply loading it into Python using json package (json.load(...)) something like test_json_valid(). It does not depend on the data and json package is part of Python standard library, so the cost-gain ratio is high although it obviously tests only the validity.

Yes, I thought about that. Just loading the output to validate the JSON format only, but the same test script does depend on the same data set. That's why I decided to check the validity of the data as well. Let me do something about this.

Done in f493ac1

@HuidaeCho HuidaeCho merged commit 74dc1e4 into OSGeo:master Apr 21, 2020
@HuidaeCho HuidaeCho deleted the v_db_select_newline branch April 21, 2020 14:56
HuidaeCho added a commit that referenced this pull request Apr 21, 2020
* v.report: Fix newline issue

* v.db.select -n => -e (#476)
@neteler neteler added this to the 8.0.0 milestone Dec 9, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

4 participants