Skip to content

Conversation

@MichaelChirico
Copy link
Member

@MichaelChirico MichaelChirico commented Mar 4, 2020

Closes #4190

TODO:

  • linter for gsub/grep/etc calls that can use fixed=TRUE
  • linter for paste calls using sep=""
  • Style Guide Lint: Use = for assignment
  • Style Guide Lint: 2-space indentation (no tabs anywhere)
  • Style Guide Lint: No trailing whitespace
  • Style Guide Lint: No space around = in function calls
  • Style Guide Lint: Spacing around if: if (condition) {
  • New lint: don't quote keywords unless required (see e.g. internal cleanup -- new helper clip_msec #4924)
  • New lint: Proper use of && / || where possible
  • New lint: Block paste(x, collapse=',') in favor of brackify(x)
  • New lints: any(is.na())/any(is.duplicated())
  • New lint: Negate after aggregation: any(!x) -> !all(x), all(!x) -> !any(x)
  • New lint: stop()/warning()/message() --> stopf()/warningf()/messagef()
  • New lint: test(SYMBOL, ...) or test(SYMBOL + ....) linted (see Changes to ensure test(number instead of test(name #6041 for lint logic)
  • Incorporate # nolint logic
  • Include greps of src (still as regex, from R)
  • New src regex: ++i instead of i++ in all loops
  • Apply lints to existing code
  • Add to CI

@renkun-ken
Copy link
Member

renkun-ken commented Mar 4, 2020

In languageserver, we use xmlparsedata to convert the parsed data to XML tree to analyze the code. Maybe it'll help here somehow?

@codecov
Copy link

codecov bot commented Mar 5, 2020

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (3bd4fd1) 97.46% compared to head (f8f8cbe) 97.46%.

❗ Current head f8f8cbe differs from pull request most recent head 96c8f9a. Consider uploading reports for the commit 96c8f9a to get more accurate results

Additional details and impacted files
@@           Coverage Diff           @@
##           master    #4278   +/-   ##
=======================================
  Coverage   97.46%   97.46%           
=======================================
  Files          80       80           
  Lines       14822    14822           
=======================================
  Hits        14447    14447           
  Misses        375      375           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

brackify = data.table:::brackify
chmatchdup = data.table:::chmatchdup
compactprint = data.table:::compactprint
CsubsetDT = data.table:::CsubsetDT
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and this is caused by what?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

make sense.
fyi @mattdowle

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW IIUC this test is related to making sure we're using the C API correctly. If we're not registering our routines "properly" we'd have to refer to them as a string, and use package in the .Call call, I think the way we're doing it using dyn.load adds it to the symbol table so we can refer by symbol which is preferred. So the quotation check is about registering routines.

test(458, DT[,sum(v),by=list(a%%2L)], data.table(a=c(1L,0L),V1=c(26L,13L)))
test(459, DT[, list(sum(v)), list(ifelse(a == 2, NA, 1L))], data.table(ifelse=c(1L,NA_integer_),V1=c(26L,13L)))
test(460, DT[, list(sum(v)), list(ifelse(a == 2, 1, NA))], data.table(ifelse=c(NA_real_,1),V1=c(26L,13L)))
test(459, DT[, list(sum(v)), list(fifelse(a == 2L, NA_integer_, 1L))], data.table(fifelse=c(1L,NA_integer_),V1=c(26L,13L)))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should leave ifelse and not replace it with fifelse, better to test against base R than our functions.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we do have a test (2085.33) matching fifelse to ifelse, I figured that is good enough. We could also add a few more tests of consistency of fifelse & ifelse...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Testing fifelse is a different thing. We do test grouping by same order, not fifelse really. I is fine to use fifelse here, but we should not remove every ifelse in favour of fifelse. I would remove ifelse checks from linter.

@MichaelChirico
Copy link
Member Author

Really liking @renkun-ken's suggestion now that I'm quite comfortable working with XML versions of the parse trees from some work with lintr (which has a lot heavier dependency load). We already pull xml2 in CI so xmlparsedata would be very lightweight.

@renkun-ken
Copy link
Member

@MichaelChirico Yes, exactly. The XML versions of the parse trees are much easier to work with.

@MichaelChirico
Copy link
Member Author

@renkun-ken added you as a reviewer since you're already familiar with xmlparsedata.

@jangorecki where do you think the right place is to insert this to the CI pipeline?

Output of the current set of linters as of now:

-----------------
has_int_as_numeric found some issues
R/between.R:84:      0, c(FALSE, TRUE), 0L, "all", ops, verbose) # fix for #1819, turn on verbose messages

-----------------
has_int_as_numeric found some issues
R/data.table.R: 128:"[.data.table" = function (x, i, j, by, keyby, with=TRUE, nomatch=getOption("datatable.nomatch", NA), mult="all", roll=FALSE, rollends=if (roll=="nearest") c(TRUE,TRUE) else if (roll>=0) c(FALSE,TRUE) else c(TRUE,FALSE), which=FALSE, .SDcols, verbose=getOption("datatable.verbose"), allow.cartesian=getOption("datatable.allow.cartesian"), drop=NULL, on=NULL)
R/data.table.R:2920:    if (length(stub[[1L]]) != 1) return(NULL) # nocov Whatever it is, definitely not one of the valid operators

-----------------
has_int_as_numeric found some issues
R/foverlaps.R: 92:      2 ^ (bits + (getNumericRounding() * 8L))

-----------------
has_int_as_numeric found some issues
R/fread.R:269:  warnings2errors = getOption("warn") >= 2

-----------------
has_int_as_numeric found some issues
R/fwrite.R:  9:           buffMB=8, nThread=getDTthreads(verbose),

-----------------
has_int_as_numeric found some issues
R/IDateTime.R:190:  secs = 86400 * (unclass(x) %% 1)
R/IDateTime.R:190:  secs = 86400 * (unclass(x) %% 1)
R/IDateTime.R:247:                  hours = as.integer(round(unclass(x)/3600)*3600),
R/IDateTime.R:247:                  hours = as.integer(round(unclass(x)/3600)*3600),
R/IDateTime.R:248:                  minutes = as.integer(round(unclass(x)/60)*60)), 
R/IDateTime.R:248:                  minutes = as.integer(round(unclass(x)/60)*60)), 
R/IDateTime.R:255:                  hours = as.integer(unclass(x)%/%3600*3600),
R/IDateTime.R:255:                  hours = as.integer(unclass(x)%/%3600*3600),
R/IDateTime.R:256:                  minutes = as.integer(unclass(x)%/%60*60)), 
R/IDateTime.R:256:                  minutes = as.integer(unclass(x)%/%60*60)), 
R/IDateTime.R:289:as.POSIXct.IDate = function(x, tz = "UTC", time = 0, ...) {

-----------------
has_int_as_numeric found some issues
R/onLoad.R:117:  DF[2L, "b"] = 7  # changed b but not a

-----------------
has_int_as_numeric found some issues
R/openmp-utils.R: 4:    if (length(percent)!=1) stop("percent= is provided but is length ", length(percent))

-----------------
has_int_as_numeric found some issues
R/print.data.table.R:210:  rownum_width = if (row.names) as.integer(ceiling(log10(nrow(x)))+2) else 0L

-----------------
has_int_as_numeric found some issues
R/setkey.R:239:    orderArg = if (decreasing) -1 else 1
R/setkey.R:239:    orderArg = if (decreasing) -1 else 1

-----------------
has_int_as_numeric found some issues
R/setops.R:186:    if (between(tolerance, 0, sqrt(.Machine$double.eps), incbounds=FALSE)) {
R/setops.R:192:    tolerance.msg = if (identical(tolerance, 0)) ", be aware you are using `tolerance=0` which may result into visually equal data" else ""
R/setops.R:195:      if (any(vapply_1c(target,typeof)=="double") && !identical(tolerance, 0)) {
R/setops.R:202:          tolerance = 0
R/setops.R:212:    if (any(vapply_1b(target,is.factor)) && !identical(tolerance, 0)) {
R/setops.R:216:      tolerance = 0
R/setops.R:220:    if (!identical(tolerance, 0)) {
R/setops.R:222:        tolerance = 0
R/setops.R:233:    ans = if (identical(tolerance, 0)) target[current, nomatch=NA, which=TRUE, on=jn.on] else {
R/setops.R:243:    ans = if (identical(tolerance, 0)) current[target, nomatch=NA, which=TRUE, on=jn.on] else {

-----------------
has_int_as_numeric found some issues
R/tables.R: 4:tables = function(mb=TRUE, order.col="NAME", width=80,
R/tables.R:21:      MB = if (mb) round(as.numeric(object.size(DT))/1024^2), # object.size() is slow hence optional; TODO revisit
R/tables.R:21:      MB = if (mb) round(as.numeric(object.size(DT))/1024^2), # object.size() is slow hence optional; TODO revisit

-----------------
has_int_as_numeric found some issues
R/test.data.table.R:176:  cat("10 longest running tests took ", as.integer(tt<-DT[, sum(time)]), "s (", as.integer(100*tt/(ss<-timings[,sum(time)])), "% of ", as.integer(ss), "s)\n", sep="")
R/test.data.table.R:236:  c("PS_rss"=round(ans / 1024, 1L))

-----------------
has_quoted_Call found some issues
inst/tests/tests.Rraw: 6451:test(1459.01, .Call("CsubsetDT", DT, which(DT$a > 2), seq_along(DT)), setDT(as.data.frame(DT)[3, , drop=FALSE]))
inst/tests/tests.Rraw: 6452:test(1459.02, .Call("CsubsetDT", DT, which(DT$b > 2), seq_along(DT)), setDT(as.data.frame(DT)[3, , drop=FALSE]))
inst/tests/tests.Rraw: 6453:test(1459.03, .Call("CsubsetDT", DT, which(Re(DT$c) > 2), seq_along(DT)), setDT(as.data.frame(DT)[3, , drop=FALSE]))
inst/tests/tests.Rraw: 6454:test(1459.04, .Call("CsubsetDT", DT, which(DT$d > 2), seq_along(DT)), setDT(as.data.frame(DT)[3:4, , drop=FALSE]))
inst/tests/tests.Rraw: 6455:test(1459.05, .Call("CsubsetDT", DT, which(DT$f), seq_along(DT)), setDT(as.data.frame(DT)[3, , drop=FALSE]))
inst/tests/tests.Rraw: 6456:test(1459.06, .Call("CsubsetDT", DT, which(DT$g == "c"), seq_along(DT)), setDT(as.data.frame(DT)[3, , drop=FALSE]))
inst/tests/tests.Rraw: 6457:test(1459.07, .Call("CsubsetDT", DT, which(DT$a > 2 | is.na(DT$a)), seq_along(DT)), setDT(as.data.frame(DT)[3:4,]))
inst/tests/tests.Rraw: 6458:test(1459.08, .Call("CsubsetDT", DT, which(DT$b > 2 | is.na(DT$b)), seq_along(DT)), setDT(as.data.frame(DT)[3:4,]))
inst/tests/tests.Rraw: 6459:test(1459.09, .Call("CsubsetDT", DT, which(Re(DT$c) > 2 | is.na(DT$c)), seq_along(DT)), setDT(as.data.frame(DT)[3:4,]))
inst/tests/tests.Rraw: 6460:test(1459.10, .Call("CsubsetDT", DT, which(DT$f | is.na(DT$f)), seq_along(DT)), setDT(as.data.frame(DT)[3:4,]))
inst/tests/tests.Rraw: 6461:test(1459.11, .Call("CsubsetDT", DT, which(DT$g == "c" | is.na(DT$g)), seq_along(DT)), setDT(as.data.frame(DT)[3:4,]))
inst/tests/tests.Rraw: 6462:test(1459.12, .Call("CsubsetDT", DT, 5L, seq_along(DT)), setDT(as.data.frame(DT)[5,]))

-----------------
has_plain_T_F found some issues
inst/tests/tests.Rraw:16269:test(2100.14, fifelse(c(T,F,NA),c(1,1,1),c(2,2,2),NA), c(1,2,NA))
inst/tests/tests.Rraw:16269:test(2100.14, fifelse(c(T,F,NA),c(1,1,1),c(2,2,2),NA), c(1,2,NA))

-----------------
has_ifelse found some issues
inst/tests/tests.Rraw:  808:test(282, DT[, list(bal[ifelse(pool==1,1,0)], bal[1]), by=pool], data.table(pool=1:2, V1=c(10,NA), V2=c(10,30)))
inst/tests/tests.Rraw: 1365:test(459, DT[, list(sum(v)), list(ifelse(a == 2, NA, 1L))], data.table(ifelse=c(1L,NA_integer_),V1=c(26L,13L)))
inst/tests/tests.Rraw: 1366:test(460, DT[, list(sum(v)), list(ifelse(a == 2, 1, NA))], data.table(ifelse=c(NA_real_,1),V1=c(26L,13L)))
inst/tests/tests.Rraw: 7270:test(1529.10, between(x, 0.25, NA,   NAbounds=NA),     ifelse(x<=0.25, FALSE, NA))
inst/tests/tests.Rraw: 7271:test(1529.11, between(x, NA, 0.75,   NAbounds=NA),     ifelse(x>=0.75, FALSE, NA))
inst/tests/tests.Rraw: 7274:test(1529.14, between(x, x[3], NA, incbounds=FALSE, NAbounds=NA), ifelse(x<=x[3], FALSE, NA))
inst/tests/tests.Rraw: 7275:test(1529.15, between(x, x[3], NA, incbounds=TRUE, NAbounds=NA),  ifelse(x<x[3], FALSE, NA))
inst/tests/tests.Rraw: 7276:test(1529.16, between(x, NA, x[9], incbounds=FALSE, NAbounds=NA), ifelse(x>=x[9], FALSE, NA))
inst/tests/tests.Rraw: 7277:test(1529.17, between(x, NA, x[9], incbounds=TRUE, NAbounds=NA),  ifelse(x>x[9], FALSE, NA))
inst/tests/tests.Rraw:16105:test(2085.33, ifelse(c(a=TRUE,b=FALSE), c(1,2), c(11,12)), c(a=1, b=12)) # just to detect breaking change in base R

-----------------
has_system.time found some issues
inst/tests/tests.Rraw: 2299:test(819, system.time(X[Y,allow.cartesian=TRUE])["user.self"] < 10)   # this system.time usage ok in this case
inst/tests/tests.Rraw: 2300:test(820, system.time(X[Y,mult="first"])["user.self"] < 10)           # this system.time usage ok in this case

@MichaelChirico
Copy link
Member Author

For something like inst/tests/tests.Rraw: 2299 which has comment # this system.time usage ok in this case it would be pretty easy to add a tag like nolint in the comment and drop any matched lint lines w the tag, sounds good?

@renkun-ken
Copy link
Member

I like this approach to lint package with customized rules and I tried it and it works quite well.

For something like inst/tests/tests.Rraw: 2299 which has comment # this system.time usage ok in this case it would be pretty easy to add a tag like nolint in the comment and drop any matched lint lines w the tag, sounds good?

Yes, this might be the simplest way to allow nolint without changing the linters and xpaths.

@jangorecki
Copy link
Member

jangorecki commented Oct 26, 2020

nolint sounds ok, but in case of a line starting with comment sign we could make skip that somehow and nolint every line like this?
Best to have a single new job, I can plug that in to CI, just need a command to run that. Some extra script could be located in .ci dir if needed.

@MichaelChirico
Copy link
Member Author

@jangorecki not sure I follow

in case of a line starting with comment sign

in XML parse tree, comments are given their own nodes, so they wouldn't be caught by any linter unless we targeted comments specificaly.

See

writeLines("
# a comment
3+3
", tmp <- tempfile())
writeLines(as.character(xml2::read_xml(
  xmlparsedata::xml_parse_data(parse(tmp))
)))

gives XML tree

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<exprlist>
  <COMMENT line1="2" col1="1" line2="2" col2="11" start="25" end="35"># a comment</COMMENT>
  <expr line1="3" col1="1" line2="3" col2="3" start="37" end="39">
    <expr line1="3" col1="1" line2="3" col2="1" start="37" end="37">
      <NUM_CONST line1="3" col1="1" line2="3" col2="1" start="37" end="37">3</NUM_CONST>
    </expr>
    <OP-PLUS line1="3" col1="2" line2="3" col2="2" start="38" end="38">+</OP-PLUS>
    <expr line1="3" col1="3" line2="3" col2="3" start="39" end="39">
      <NUM_CONST line1="3" col1="3" line2="3" col2="3" start="39" end="39">3</NUM_CONST>
    </expr>
  </expr>
</exprlist>

@jangorecki
Copy link
Member

Just don't try to fix what linter is saying as we will ended up having many new conflicts

@MichaelChirico
Copy link
Member Author

Just don't try to fix what linter is saying as we will ended up having many new conflicts

Ah, too late. FWIW I kept it "minimal" and focused on low-volume/trivial fixes. How to proceed? (1) Split off all non-CI changes to separate PR(s) (2) Split off all R/* changes (3) Some more intermediate option (4) Merge as-is (after review)

@jangorecki
Copy link
Member

Let's clear out PR queue in 1.15.99 and come back to this after

@MichaelChirico MichaelChirico marked this pull request as draft December 15, 2023 02:49
@MichaelChirico
Copy link
Member Author

Closing this draft, we can add incremental PRs easily going forward. Track future work in #4190

@MichaelChirico MichaelChirico deleted the ci-linter branch August 29, 2024 05:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Write a linter for R-side things in CRAN_release

3 participants