A linter to continuously check code quality in CI #4278

MichaelChirico · 2020-03-04T04:45:07Z

TODO:

renkun-ken · 2020-03-04T05:13:30Z

In languageserver, we use xmlparsedata to convert the parse data into an XML tree and analyze the code from there. Maybe it'll help here somehow?

codecov · 2020-03-05T02:24:51Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (3bd4fd1) 97.46% compared to head (f8f8cbe) 97.46%.

❗ Current head f8f8cbe differs from pull request most recent head 96c8f9a. Consider uploading reports for the commit 96c8f9a to get more accurate results

Additional details and impacted files

@@           Coverage Diff           @@
##           master    #4278   +/-   ##
=======================================
  Coverage   97.46%   97.46%           
=======================================
  Files          80       80           
  Lines       14822    14822           
=======================================
  Hits        14447    14447           
  Misses        375      375


jangorecki · 2020-03-05T02:46:32Z

inst/tests/tests.Rraw

  brackify = data.table:::brackify
  chmatchdup = data.table:::chmatchdup
  compactprint = data.table:::compactprint
+  CsubsetDT = data.table:::CsubsetDT


and this is caused by what?

https://github.com/Rdatatable/data.table/blob/master/.dev/CRAN_Release.cmd#L111-L112

I'm not quite sure what the point of it is, but if we're blocking it in the source we may as well block it in tests:

https://github.com/Rdatatable/data.table/pull/4278/files#diff-595e663fedb004ffced209a139805080L6411

Makes sense.
fyi @mattdowle

BTW, IIUC this test is about making sure we're using the C API correctly. If we weren't registering our routines "properly", we'd have to refer to them by string and pass PACKAGE= in the .Call call. I think the way we're doing it, using dyn.load, adds them to the symbol table, so we can refer to them by symbol, which is preferred. So the quotation check is really about registering routines.
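To make the quotation check concrete, here is a minimal sketch of how a quoted first argument to .Call could be detected from R's parse data. It uses only base R (utils::getParseData) rather than the XML approach discussed later in this thread, and the helper name find_quoted_Call is hypothetical; it is not the check used in CRAN_Release.cmd or in this PR.

```r
# Hypothetical helper (not from the PR): return the line numbers on which
# .Call() is invoked with a string literal as its first argument.
find_quoted_Call = function(code) {
  pd = utils::getParseData(parse(text = code, keep.source = TRUE))
  # Keep only terminal tokens and put them in source order
  tok = pd[pd$terminal, ]
  tok = tok[order(tok$line1, tok$col1), ]
  n = nrow(tok)
  if (n < 3L) return(integer())
  idx = seq_len(n - 2L)
  # Pattern: the function-call symbol .Call, then '(', then a string constant
  hit = tok$token[idx] == "SYMBOL_FUNCTION_CALL" & tok$text[idx] == ".Call" &
    tok$token[idx + 1L] == "'('" &
    tok$token[idx + 2L] == "STR_CONST"
  unique(tok$line1[idx[hit]])
}

code = c(
  'x = .Call("CsubsetDT", DT, 1L, 1L)', # quoted: should be flagged
  'y = .Call(CsubsetDT, DT, 1L, 1L)'    # by symbol: should pass
)
find_quoted_Call(code)
```

The code is only parsed, never evaluated, so the routines it mentions do not need to exist.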

jangorecki · 2020-03-05T02:55:22Z

inst/tests/tests.Rraw

 test(458, DT[,sum(v),by=list(a%%2L)], data.table(a=c(1L,0L),V1=c(26L,13L)))
-test(459, DT[, list(sum(v)), list(ifelse(a == 2, NA, 1L))], data.table(ifelse=c(1L,NA_integer_),V1=c(26L,13L)))
-test(460, DT[, list(sum(v)), list(ifelse(a == 2, 1, NA))], data.table(ifelse=c(NA_real_,1),V1=c(26L,13L)))
+test(459, DT[, list(sum(v)), list(fifelse(a == 2L, NA_integer_, 1L))], data.table(fifelse=c(1L,NA_integer_),V1=c(26L,13L)))


I think we should leave ifelse and not replace it with fifelse, better to test against base R than our functions.

we do have a test (2085.33) matching fifelse to ifelse, I figured that is good enough. We could also add a few more tests of consistency of fifelse & ifelse...

Testing fifelse is a different thing. Here we test grouping by same order, not really fifelse. It is fine to use fifelse here, but we should not replace every ifelse with fifelse. I would remove the ifelse check from the linter.

MichaelChirico · 2020-10-26T03:29:27Z

Really liking @renkun-ken's suggestion now that I'm quite comfortable working with XML versions of the parse trees from some work with lintr (which has a much heavier dependency load). We already pull xml2 in CI, so xmlparsedata would be very lightweight.

renkun-ken · 2020-10-26T03:32:46Z

@MichaelChirico Yes, exactly. The XML versions of the parse trees are much easier to work with.

MichaelChirico · 2020-10-26T05:02:19Z

@renkun-ken added you as a reviewer since you're already familiar with xmlparsedata.

@jangorecki where do you think the right place is to insert this to the CI pipeline?

Output of the current set of linters as of now:

-----------------
has_int_as_numeric found some issues
R/between.R:84:      0, c(FALSE, TRUE), 0L, "all", ops, verbose) # fix for #1819, turn on verbose messages

-----------------
has_int_as_numeric found some issues
R/data.table.R: 128:"[.data.table" = function (x, i, j, by, keyby, with=TRUE, nomatch=getOption("datatable.nomatch", NA), mult="all", roll=FALSE, rollends=if (roll=="nearest") c(TRUE,TRUE) else if (roll>=0) c(FALSE,TRUE) else c(TRUE,FALSE), which=FALSE, .SDcols, verbose=getOption("datatable.verbose"), allow.cartesian=getOption("datatable.allow.cartesian"), drop=NULL, on=NULL)
R/data.table.R:2920:    if (length(stub[[1L]]) != 1) return(NULL) # nocov Whatever it is, definitely not one of the valid operators

-----------------
has_int_as_numeric found some issues
R/foverlaps.R: 92:      2 ^ (bits + (getNumericRounding() * 8L))

-----------------
has_int_as_numeric found some issues
R/fread.R:269:  warnings2errors = getOption("warn") >= 2

-----------------
has_int_as_numeric found some issues
R/fwrite.R:  9:           buffMB=8, nThread=getDTthreads(verbose),

-----------------
has_int_as_numeric found some issues
R/IDateTime.R:190:  secs = 86400 * (unclass(x) %% 1)
R/IDateTime.R:190:  secs = 86400 * (unclass(x) %% 1)
R/IDateTime.R:247:                  hours = as.integer(round(unclass(x)/3600)*3600),
R/IDateTime.R:247:                  hours = as.integer(round(unclass(x)/3600)*3600),
R/IDateTime.R:248:                  minutes = as.integer(round(unclass(x)/60)*60)), 
R/IDateTime.R:248:                  minutes = as.integer(round(unclass(x)/60)*60)), 
R/IDateTime.R:255:                  hours = as.integer(unclass(x)%/%3600*3600),
R/IDateTime.R:255:                  hours = as.integer(unclass(x)%/%3600*3600),
R/IDateTime.R:256:                  minutes = as.integer(unclass(x)%/%60*60)), 
R/IDateTime.R:256:                  minutes = as.integer(unclass(x)%/%60*60)), 
R/IDateTime.R:289:as.POSIXct.IDate = function(x, tz = "UTC", time = 0, ...) {

-----------------
has_int_as_numeric found some issues
R/onLoad.R:117:  DF[2L, "b"] = 7  # changed b but not a

-----------------
has_int_as_numeric found some issues
R/openmp-utils.R: 4:    if (length(percent)!=1) stop("percent= is provided but is length ", length(percent))

-----------------
has_int_as_numeric found some issues
R/print.data.table.R:210:  rownum_width = if (row.names) as.integer(ceiling(log10(nrow(x)))+2) else 0L

-----------------
has_int_as_numeric found some issues
R/setkey.R:239:    orderArg = if (decreasing) -1 else 1
R/setkey.R:239:    orderArg = if (decreasing) -1 else 1

-----------------
has_int_as_numeric found some issues
R/setops.R:186:    if (between(tolerance, 0, sqrt(.Machine$double.eps), incbounds=FALSE)) {
R/setops.R:192:    tolerance.msg = if (identical(tolerance, 0)) ", be aware you are using `tolerance=0` which may result into visually equal data" else ""
R/setops.R:195:      if (any(vapply_1c(target,typeof)=="double") && !identical(tolerance, 0)) {
R/setops.R:202:          tolerance = 0
R/setops.R:212:    if (any(vapply_1b(target,is.factor)) && !identical(tolerance, 0)) {
R/setops.R:216:      tolerance = 0
R/setops.R:220:    if (!identical(tolerance, 0)) {
R/setops.R:222:        tolerance = 0
R/setops.R:233:    ans = if (identical(tolerance, 0)) target[current, nomatch=NA, which=TRUE, on=jn.on] else {
R/setops.R:243:    ans = if (identical(tolerance, 0)) current[target, nomatch=NA, which=TRUE, on=jn.on] else {

-----------------
has_int_as_numeric found some issues
R/tables.R: 4:tables = function(mb=TRUE, order.col="NAME", width=80,
R/tables.R:21:      MB = if (mb) round(as.numeric(object.size(DT))/1024^2), # object.size() is slow hence optional; TODO revisit
R/tables.R:21:      MB = if (mb) round(as.numeric(object.size(DT))/1024^2), # object.size() is slow hence optional; TODO revisit

-----------------
has_int_as_numeric found some issues
R/test.data.table.R:176:  cat("10 longest running tests took ", as.integer(tt<-DT[, sum(time)]), "s (", as.integer(100*tt/(ss<-timings[,sum(time)])), "% of ", as.integer(ss), "s)\n", sep="")
R/test.data.table.R:236:  c("PS_rss"=round(ans / 1024, 1L))

-----------------
has_quoted_Call found some issues
inst/tests/tests.Rraw: 6451:test(1459.01, .Call("CsubsetDT", DT, which(DT$a > 2), seq_along(DT)), setDT(as.data.frame(DT)[3, , drop=FALSE]))
inst/tests/tests.Rraw: 6452:test(1459.02, .Call("CsubsetDT", DT, which(DT$b > 2), seq_along(DT)), setDT(as.data.frame(DT)[3, , drop=FALSE]))
inst/tests/tests.Rraw: 6453:test(1459.03, .Call("CsubsetDT", DT, which(Re(DT$c) > 2), seq_along(DT)), setDT(as.data.frame(DT)[3, , drop=FALSE]))
inst/tests/tests.Rraw: 6454:test(1459.04, .Call("CsubsetDT", DT, which(DT$d > 2), seq_along(DT)), setDT(as.data.frame(DT)[3:4, , drop=FALSE]))
inst/tests/tests.Rraw: 6455:test(1459.05, .Call("CsubsetDT", DT, which(DT$f), seq_along(DT)), setDT(as.data.frame(DT)[3, , drop=FALSE]))
inst/tests/tests.Rraw: 6456:test(1459.06, .Call("CsubsetDT", DT, which(DT$g == "c"), seq_along(DT)), setDT(as.data.frame(DT)[3, , drop=FALSE]))
inst/tests/tests.Rraw: 6457:test(1459.07, .Call("CsubsetDT", DT, which(DT$a > 2 | is.na(DT$a)), seq_along(DT)), setDT(as.data.frame(DT)[3:4,]))
inst/tests/tests.Rraw: 6458:test(1459.08, .Call("CsubsetDT", DT, which(DT$b > 2 | is.na(DT$b)), seq_along(DT)), setDT(as.data.frame(DT)[3:4,]))
inst/tests/tests.Rraw: 6459:test(1459.09, .Call("CsubsetDT", DT, which(Re(DT$c) > 2 | is.na(DT$c)), seq_along(DT)), setDT(as.data.frame(DT)[3:4,]))
inst/tests/tests.Rraw: 6460:test(1459.10, .Call("CsubsetDT", DT, which(DT$f | is.na(DT$f)), seq_along(DT)), setDT(as.data.frame(DT)[3:4,]))
inst/tests/tests.Rraw: 6461:test(1459.11, .Call("CsubsetDT", DT, which(DT$g == "c" | is.na(DT$g)), seq_along(DT)), setDT(as.data.frame(DT)[3:4,]))
inst/tests/tests.Rraw: 6462:test(1459.12, .Call("CsubsetDT", DT, 5L, seq_along(DT)), setDT(as.data.frame(DT)[5,]))

-----------------
has_plain_T_F found some issues
inst/tests/tests.Rraw:16269:test(2100.14, fifelse(c(T,F,NA),c(1,1,1),c(2,2,2),NA), c(1,2,NA))
inst/tests/tests.Rraw:16269:test(2100.14, fifelse(c(T,F,NA),c(1,1,1),c(2,2,2),NA), c(1,2,NA))

-----------------
has_ifelse found some issues
inst/tests/tests.Rraw:  808:test(282, DT[, list(bal[ifelse(pool==1,1,0)], bal[1]), by=pool], data.table(pool=1:2, V1=c(10,NA), V2=c(10,30)))
inst/tests/tests.Rraw: 1365:test(459, DT[, list(sum(v)), list(ifelse(a == 2, NA, 1L))], data.table(ifelse=c(1L,NA_integer_),V1=c(26L,13L)))
inst/tests/tests.Rraw: 1366:test(460, DT[, list(sum(v)), list(ifelse(a == 2, 1, NA))], data.table(ifelse=c(NA_real_,1),V1=c(26L,13L)))
inst/tests/tests.Rraw: 7270:test(1529.10, between(x, 0.25, NA,   NAbounds=NA),     ifelse(x<=0.25, FALSE, NA))
inst/tests/tests.Rraw: 7271:test(1529.11, between(x, NA, 0.75,   NAbounds=NA),     ifelse(x>=0.75, FALSE, NA))
inst/tests/tests.Rraw: 7274:test(1529.14, between(x, x[3], NA, incbounds=FALSE, NAbounds=NA), ifelse(x<=x[3], FALSE, NA))
inst/tests/tests.Rraw: 7275:test(1529.15, between(x, x[3], NA, incbounds=TRUE, NAbounds=NA),  ifelse(x<x[3], FALSE, NA))
inst/tests/tests.Rraw: 7276:test(1529.16, between(x, NA, x[9], incbounds=FALSE, NAbounds=NA), ifelse(x>=x[9], FALSE, NA))
inst/tests/tests.Rraw: 7277:test(1529.17, between(x, NA, x[9], incbounds=TRUE, NAbounds=NA),  ifelse(x>x[9], FALSE, NA))
inst/tests/tests.Rraw:16105:test(2085.33, ifelse(c(a=TRUE,b=FALSE), c(1,2), c(11,12)), c(a=1, b=12)) # just to detect breaking change in base R

-----------------
has_system.time found some issues
inst/tests/tests.Rraw: 2299:test(819, system.time(X[Y,allow.cartesian=TRUE])["user.self"] < 10)   # this system.time usage ok in this case
inst/tests/tests.Rraw: 2300:test(820, system.time(X[Y,mult="first"])["user.self"] < 10)           # this system.time usage ok in this case

MichaelChirico · 2020-10-26T05:04:28Z

For something like inst/tests/tests.Rraw: 2299, which has the comment # this system.time usage ok in this case, it would be pretty easy to add a tag like nolint in the comment and drop any matched lint lines with the tag. Sounds good?

renkun-ken · 2020-10-26T05:28:15Z

I like this approach of linting the package with customized rules. I tried it and it works quite well.

> For something like inst/tests/tests.Rraw: 2299, which has the comment # this system.time usage ok in this case, it would be pretty easy to add a tag like nolint in the comment and drop any matched lint lines with the tag. Sounds good?

Yes, this might be the simplest way to allow nolint without changing the linters and xpaths.
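The tag-and-drop idea can be sketched in a few lines of base R. This assumes the linter reports hits as line numbers into the source file; the helper name drop_nolint and the exact tag syntax are hypothetical, since the thread only proposes "a tag like nolint".

```r
# Hypothetical helper (not from the PR): given the line numbers a linter
# flagged and the source lines of the file, drop hits on lines that carry
# a "nolint" tag after a comment sign.
drop_nolint = function(lint_lines, src) {
  # Naive match: a '#' followed later on the same line by the word nolint.
  # A '#' inside a string literal would also match; a real implementation
  # would look at COMMENT tokens from the parse data instead.
  tagged = grep("#.*\\bnolint\\b", src, perl = TRUE)
  setdiff(lint_lines, tagged)
}

src = c(
  'test(819, system.time(X[Y])["user.self"] < 10) # nolint. system.time usage ok here',
  'system.time(foo())'
)
drop_nolint(c(1L, 2L), src)
```

Filtering after the fact keeps the linters and their XPath expressions unchanged, which is the appeal noted above.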

jangorecki · 2020-10-26T07:20:05Z

nolint sounds OK, but in the case of a line starting with a comment sign, could we skip it somehow, rather than having to nolint every line like this?
Best to have a single new job. I can plug that into CI, I just need a command to run it. Any extra script could be located in the .ci dir if needed.

MichaelChirico · 2020-10-26T17:13:10Z

@jangorecki not sure I follow

> in case of a line starting with comment sign

In the XML parse tree, comments are given their own nodes, so they wouldn't be caught by any linter unless we targeted comments specifically.

See

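writeLines("
# a comment
3+3
", tmp <- tempfile())
# keep.source=TRUE so the parse data is retained even in non-interactive sessions
writeLines(as.character(xml2::read_xml(
  xmlparsedata::xml_parse_data(parse(tmp, keep.source = TRUE))
)))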

gives the XML tree

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<exprlist>
  <COMMENT line1="2" col1="1" line2="2" col2="11" start="25" end="35"># a comment</COMMENT>
  <expr line1="3" col1="1" line2="3" col2="3" start="37" end="39">
    <expr line1="3" col1="1" line2="3" col2="1" start="37" end="37">
      <NUM_CONST line1="3" col1="1" line2="3" col2="1" start="37" end="37">3</NUM_CONST>
    </expr>
    <OP-PLUS line1="3" col1="2" line2="3" col2="2" start="38" end="38">+</OP-PLUS>
    <expr line1="3" col1="3" line2="3" col2="3" start="39" end="39">
      <NUM_CONST line1="3" col1="3" line2="3" col2="3" start="39" end="39">3</NUM_CONST>
    </expr>
  </expr>
</exprlist>
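Building on the tree above, a linter in this style could be sketched as an XPath query plus a filter. The example below looks for numeric constants written without the L suffix, in the spirit of the has_int_as_numeric output earlier in the thread. It is an illustration only: the function name find_bare_ints is hypothetical, the PR's actual XPath expressions are not shown here, and it assumes xml2 and xmlparsedata are installed.

```r
# Hypothetical sketch (not the PR's linter): report NUM_CONST nodes whose
# text is a bare integer such as 3, as opposed to 3L or 2.5.
find_bare_ints = function(code) {
  xml = xml2::read_xml(
    xmlparsedata::xml_parse_data(parse(text = code, keep.source = TRUE))
  )
  # Comments are COMMENT nodes, so numbers inside them are never matched
  nums = xml2::xml_find_all(xml, "//NUM_CONST")
  txt = xml2::xml_text(nums)
  # XPath 1.0 has no regex support, so filter on the R side
  is_bare_int = grepl("^[0-9]+$", txt)
  data.frame(
    line = as.integer(xml2::xml_attr(nums[is_bare_int], "line1")),
    text = txt[is_bare_int],
    stringsAsFactors = FALSE
  )
}

code = c("x = 3 + 3L", "# 7 in a comment", "y = 2.5")
if (requireNamespace("xml2", quietly = TRUE) &&
    requireNamespace("xmlparsedata", quietly = TRUE)) {
  print(find_bare_ints(code))
}
```

The line1 attribute gives the location to report, in the same file:line form as the linter output shown earlier.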

jangorecki · 2023-12-14T14:23:22Z

Just don't try to fix what the linter is saying, as we will end up having many new conflicts.

MichaelChirico · 2023-12-14T15:53:15Z

> Just don't try to fix what the linter is saying, as we will end up having many new conflicts.

Ah, too late. FWIW I kept it "minimal" and focused on low-volume/trivial fixes. How to proceed? (1) Split off all non-CI changes to separate PR(s) (2) Split off all R/* changes (3) Some more intermediate option (4) Merge as-is (after review)

jangorecki · 2023-12-14T17:25:56Z

Let's clear out PR queue in 1.15.99 and come back to this after

MichaelChirico · 2024-08-29T05:36:16Z

Closing this draft, we can add incremental PRs easily going forward. Track future work in #4190

MichaelChirico added the WIP label Mar 4, 2020

MichaelChirico force-pushed the ci-linter branch from 7f420f9 to b5ea92c Compare March 4, 2020 06:03

jangorecki reviewed Mar 5, 2020


MichaelChirico closed this Oct 26, 2020

MichaelChirico force-pushed the ci-linter branch from 8a2a333 to 63632e6 Compare October 26, 2020 04:55

overhauled linter

d145532

MichaelChirico reopened this Oct 26, 2020

MichaelChirico requested a review from renkun-ken October 26, 2020 04:58

MichaelChirico removed the WIP label Oct 26, 2020

MichaelChirico added the WIP label Oct 26, 2020

MichaelChirico mentioned this pull request Nov 9, 2020

Write a linter for R-side things in CRAN_release #4190

Open

MichaelChirico mentioned this pull request Dec 10, 2020

Add implicit_integer_linter to our linters? r-lib/lintr#699

Closed

MichaelChirico mentioned this pull request May 6, 2021

improve error message #4937

Merged

This was referenced May 20, 2021

Use anyDuplicated as appropriate #5015

Merged

Flip any(!x) to !all(x) and all(!x) to !any(x) #5017

Merged

MichaelChirico mentioned this pull request Jul 8, 2021

fully migrate to templated messages #5068

Merged

Merge branch 'master' into ci-linter

2a5e8a7

MichaelChirico requested a review from mattdowle as a code owner December 14, 2023 13:07

Initial commit of {lintr} approach

f8f8cbe

MichaelChirico added 5 commits December 14, 2023 15:48

first pass at personalization

af21337

first custom linter

d38bf19

delint vignettes

e80bda7

delint tests

d0178d0

delint R sources

96c8f9a

MichaelChirico requested a review from tdhock as a code owner December 14, 2023 15:49

MichaelChirico mentioned this pull request Dec 14, 2023

add GOVERNANCE.md document #5772

Merged

MichaelChirico marked this pull request as draft December 15, 2023 02:49

MichaelChirico mentioned this pull request Jan 14, 2024

Add a GHA for linting code #5908

Merged

MichaelChirico removed the WIP label Feb 19, 2024

MichaelChirico closed this Aug 29, 2024

MichaelChirico deleted the ci-linter branch August 29, 2024 05:36
