Skip to content

Conversation

@PGijsbers
Copy link
Collaborator

@PGijsbers PGijsbers commented Feb 22, 2019

edit: marked WIP as I need to ensure unit tests don't fail first.

The develop branch is not PEP8 compliant (see #624). This fixes that for the core module*, ignoring E402 as per the script.
Examples and unit tests are excluded from this PR**, for the following reasons:

  • I have to set up documentation tools to see how and if documentation would be impacted by PEP8 changes.
  • I hope that we can have a discussion regarding the 80-character line limit.
  • Keeping the PR close to HEAD. For two reasons above, and a lot of test files also violating PEP8, including tests and documentation can incur serious delays. I would like to avoid another months-long PR if reasonable.

In/around code that needed to be changed for PEP8 reasons, I sparingly also refactored and fixed a bug. Most notably:

  • openml.tasks.split.OpenMLSplit.__eq__ featured a bug where they were only not equal if all elements differed, rather than any (here)
  • openml.flows.functions._get_fn_arguments_with_defaults more pythonic and removed Py2 support in the process

When adapting the PEP8 check, has there been any discussion on the 80-character limit?
I personally think that a 100 or 120 character limit is typically better.
The 80 character limit limits reasonably verbose variable names, especially at higher indent levels.
Examples in the code that I think would be fine to be on one line:

  • 1 verbose names cause exceeding the limit by 3 characters.
  • 2 a simple operation, again too verbose because of the string.
  • 3 documentation line. Possibly particularly problematic (how would documentation interact with #noqa flake8 exceptions?).
  • 4 5 multi-level indexing with variables as indices quickly creates long lines that are still easy to understand.

If developers try to avoid excessively verbose names or chaining functions, I think these are fine exceptions. Modern screens can easily show two 100 or 120 character lines side-by-side. Limiting to 80 characters leads to scenarios that (in my opinion) hurt the legibility more like splitting lines with square brackets as seen in 4.

* There is one exception utils#100 violating the character limit (85>80). This again has to do with that I do not know how documentation will be affected by breaking up the change.

** this is the exception that might lead to visible changes. After I realized my work might impact the final rendering of documentation, I stopped. I could revert this change if desired.

@PGijsbers PGijsbers changed the title Fix624 pep8 [WIP] Fix624 pep8 Feb 22, 2019
@mfeurer
Copy link
Collaborator

mfeurer commented Feb 22, 2019

I'd be happy to merge that, and I also start too see why more than 80 characters per line might be a good idea. How about 100 and we see if this goes well?

@janvanrijn any objections on solving all PEP8 issues at once?

@PGijsbers I think it would be most compelling if you could also change the flake8 checking script so that it checks the whole library at once.

@PGijsbers
Copy link
Collaborator Author

Invoking flake8 on all code instead of just diffs seems easy enough. Considering it's a quick operation anyway even for the whole repo, I don't know/think that it's bad practice (especially since you don't need to sort through diffs).
Changing the max line length is also easy if we can agree on it (100 as a start is fine for me).

@codecov-io
Copy link

codecov-io commented Feb 22, 2019

Codecov Report

Merging #625 into develop will increase coverage by 0.19%.
The diff coverage is 84%.

Impacted file tree graph

@@             Coverage Diff             @@
##           develop     #625      +/-   ##
===========================================
+ Coverage    89.77%   89.96%   +0.19%     
===========================================
  Files           32       32              
  Lines         3207     3208       +1     
===========================================
+ Hits          2879     2886       +7     
+ Misses         328      322       -6
Impacted Files Coverage Δ
openml/tasks/__init__.py 100% <ø> (ø) ⬆️
openml/flows/__init__.py 100% <ø> (ø) ⬆️
openml/study/__init__.py 100% <ø> (ø) ⬆️
openml/evaluations/evaluation.py 100% <ø> (ø) ⬆️
openml/exceptions.py 100% <ø> (ø) ⬆️
openml/tasks/functions.py 88.38% <ø> (ø) ⬆️
openml/study/functions.py 95.83% <ø> (ø) ⬆️
openml/testing.py 93.82% <100%> (-0.15%) ⬇️
openml/flows/functions.py 93.25% <100%> (ø) ⬆️
openml/__init__.py 100% <100%> (ø) ⬆️
... and 13 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update a2a4ade...792adb2. Read the comment docs.

@PGijsbers
Copy link
Collaborator Author

PGijsbers commented Feb 22, 2019

Not sure why the previous tests failed - the changed code only had one minor refactor but might have introduced off-by-one errors (I don't see it)? Even so, the actual numbers are also different in pr and push.

For the particular test it should be that in both cases the most significantly changed line is not even hit. The first call LIMIT is not None evaluates to False, while in the second call it evaluates to True but BATCH_SIZE_ORIG > LIMIT evaluates to False.

@janvanrijn
Copy link
Member

This is really cool stuff @PGijsbers , thanks for taking care of this.

I'd be happy to merge that, and I also start too see why more than 80 characters per line might be a good idea. How about 100 and we see if this goes well?

I usually do 120, this seems the default for PyCharm. However keep in mind that for consistency with sklearn 80 is not a very bad idea. (Fine with both)

@PGijsbers
Copy link
Collaborator Author

Flake seems to be failing on warnings, do we want this behavior?
This brings up another issue I noticed (but did not think was very important right now), in that there are two different incompatible warnings: W503 and W504 which forbid a linebreak before and after an operator, respectively (see this).
Do we want to enforce one or the other?
I personally feel that we should not enforce neither, but only set a preference. In my opinion, the preference should be for a line break after the operator, unless it gives a bad visual indent (as is sometimes the case in if-statements for example).

@PGijsbers
Copy link
Collaborator Author

Then I propose the following changes w.r.t. line length:

  • I will set the max length to 100.
  • For lines longer than 100 characters but shorter than 120 characters, refactoring is encouraged but disabling them with flake with # noqa: E501 is okay too.
  • We evaluate at some point if those extra characters are generally warranted or not.

Can either of you two take a look at the failed test? I don't understand how it fails with new code but not old code. As far as I can tell, in neither scenario should BATCH_SIZE_ORIG be modified in the first place, so the lines should not affect the unit test. Moreover it seems that every time the test gets executed, different amount of (increasing) datasets are found. My theory is that someone is simply uploading datasets, and my change got rejected by pure chance? Could this be possible?

@mfeurer
Copy link
Collaborator

mfeurer commented Feb 22, 2019

I think that this PR should not fail on warnings to keep it manageable.

I think that we should prefer to have operators at the beginning of a line (following math notation).

I'm fine with 100 and 120 character setup.

And @janvanrijn agrees on your assessment of the error in #626 .

@PGijsbers
Copy link
Collaborator Author

I checked the failed tests and it seems to stem from issue #626 and flake8 checking the whole repo (and me only editing the ./openml directory). Looks good to merge to me.

@mfeurer mfeurer merged commit 45fe2a1 into develop Feb 23, 2019
@joaquinvanschoren
Copy link
Contributor

joaquinvanschoren commented Feb 25, 2019 via email

@mfeurer mfeurer deleted the fix624_pep8 branch February 25, 2019 08:05
@PGijsbers PGijsbers mentioned this pull request Feb 25, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants