Skip to content

Initial Python formatting support#3349

Closed
patrick91 wants to merge 80 commits intoprettier:masterfrom
patrick91:feature/python
Closed

Initial Python formatting support#3349
patrick91 wants to merge 80 commits intoprettier:masterfrom
patrick91:feature/python

Conversation

@patrick91
Copy link
Copy Markdown
Member

@patrick91 patrick91 commented Nov 29, 2017

Hi everyone! I've started hacking around prettier to add support for Python files since I wasn't happy with autopep8 and yapf (and I'm using prettier at work).

I'm working on this in my free time so I can't say when it'll be ready, there's a lot of things to
do and to figure out, I'll update this issue while working on it :)

TODO:

  • Add comments
  • Add type annotations
  • Error when ASTExport is not installed
  • ExtSlice
  • Nested if (currently broken)
  • Nested with on python 2
  • Handle Strings (and f strings)
  • Decorators
  • ...

#3354

@vjeux
Copy link
Copy Markdown
Contributor

vjeux commented Nov 29, 2017

This is awesome! I wouldn’t be opposed to merging it in the prettier repo so that other people can help with the effort, if you are interested.

Down the road, it would be nice to compile python to js via emscriptem so that we don’t need the binary dependency and we can show it in the playground. Here’s a build of python 2.7 as an example: https://github.com/replit/empythoned

@suchipi
Copy link
Copy Markdown
Member

suchipi commented Nov 29, 2017

If there's a python parser written in JS that we can leverage, that'd be our first choice (will work in the browser and etc), but if that's not available, we might want to find a way to vendor ASTExport with prettier. We vendor all of our other parsers so that we don't have to worry about the AST format differing when the user has a different parser version installed.

@patrick91
Copy link
Copy Markdown
Member Author

@vjeux oh, I'll have a look at empython! thanks.
I'm interested in merging this into the prettier repo, but maybe it is a bit early? :)

@suchipi vendorizing ASTExport shouldn't be a problem, it is a simple script :)
I'm not entirely opposed on using a JS based parser, but I wasn't able to find one that's up to date
and also it means that we'd need to maintain it as well to keep up to date with newer python version (for example python 3.7 is introducing f-strings)

@duailibe
Copy link
Copy Markdown
Collaborator

I'm interested in merging this into the prettier repo, but maybe it is a bit early? :)

The HTML support in the repo is super experimental as well :-)

@suchipi
Copy link
Copy Markdown
Member

suchipi commented Nov 29, 2017

We had a couple of issues where people used the experimental HTML support and were surprised it didn't work, though. We'd want to be careful to ensure that doesn't happen again.

@duailibe
Copy link
Copy Markdown
Collaborator

Yeah, you're right. I thought that was fixed with getSupportInfo()

@ikatyang
Copy link
Copy Markdown
Member

Experimental parser should not be added in cli-constant.js so as to avoid exposing, and it should be added in support.js with since: undefined just like html:

prettier/src/support.js

Lines 176 to 189 in fbbfa52

{
name: "HTML",
since: undefined, // unreleased
parsers: ["parse5"],
group: "HTML",
tmScope: "text.html.basic",
aceMode: "html",
codemirrorMode: "htmlmixed",
codemirrorMimeType: "text/html",
aliases: ["xhtml"],
extensions: [".html", ".htm", ".html.hl", ".inc", ".st", ".xht", ".xhtml"],
linguistLanguageId: 146,
vscodeLanguageIds: ["html"]
}

@patrick91
Copy link
Copy Markdown
Member Author

If that's fine with all of you ok :) but I'd wait to have a decision on the parser as well, just to prevent having to change some pieces related to the current AST structure!

@jquense
Copy link
Copy Markdown

jquense commented Nov 29, 2017

can I just add that this would change my life for the much better 👍 AWESOME

@taion
Copy link
Copy Markdown
Contributor

taion commented Nov 29, 2017

Very neat.

People have run Python on the client with emscripten. Otherwise, the Python binary can be optional, no?

Note that the standard AST parser in Python is part of the core stdlib: https://docs.python.org/3/library/ast.html, so there should be a strong presumption in favor of using that instead of a separate parser.

@vjeux
Copy link
Copy Markdown
Contributor

vjeux commented Nov 29, 2017

Python provides a way to query the ast as part of the language itself, so there’s very little incentive for third party implementations to build competing ones. Building a faithful parser is not an easy task and it is a critical requirement if we want to reprint the code without introducing different semantics. So the only realistic choice here is to use the one provided by default, like you did.

It’s fine to require a binary to be installed on the machine for now, it’ll take a while for the whole language to be supported anyway. If we can get it to run using emscriptem, then that’s perfect. Otherwise we’ll have to invest in getting third party dependencies like this one installed and have a server for the playground. We’ve been lucky not to have had to do it so far, but it’s not the end of the world.

As for merging it in, if you don’t feel like it yet, that’s ok. The most important thing right now is that you churn code as fast as possible to print as much as the language as possible. Then there’s going to be a phase to make it actually correct and one to make it pretty. There’s basically a clock where you won’t be motivated anymore / won’t be able to spend the time anymore, so we should do whatever you feel best to get the maximum amount of work done :)

@taion
Copy link
Copy Markdown
Contributor

taion commented Nov 29, 2017

That said, it might also be worth taking a look at Baron and RedBaron. They're Python CSTs (instead of ASTs), but it may be unnecessary to fully duplicate the Python code printing logic, especially as the language adds new features.

@taion
Copy link
Copy Markdown
Contributor

taion commented Nov 29, 2017

You may need to vendor in something like astexport, though – the release on PyPI only supports Python 3 (despite having a Python 2 wheel for some reason).

Like I imagine this will work just fine with whatever virtualenv you're in, and will use that Python binary, but not supporting Python 2 might not be ideal.

@patrick91
Copy link
Copy Markdown
Member Author

@vjeux totally agree with you! I think it might be ready to merge when it prints most of the things, right now it is missing a lot! :)

@taion I'll definitely try to support python 2 as well!

parts.push(concat(["**", path.call(print, "kwarg")]));
}

if (n.kwonlyargs.length > 0) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

kwonlyargs need to go before **kwargs

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@taion why? This seems to work just fine :)

screen shot 2017-11-29 at 21 17 45

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

image

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh, got it! thanks!

@taion
Copy link
Copy Markdown
Contributor

taion commented Nov 29, 2017

Looking at this some more, this might be really tricky without using something like Baron. The Python AST is a pure AST, as with ESTree. Unlike Babylon, it doesn't include information about comments at all.

@vjeux
Copy link
Copy Markdown
Contributor

vjeux commented Nov 29, 2017

@taion comments are handled in a different way with prettier. We take the raw list of comments and positions and attach them to nodes ourself. All we need to do is to implement a small lexer to extract out comments. This is super easy to do, even if python tooling doesn't provide it for us.

@taion
Copy link
Copy Markdown
Contributor

taion commented Nov 29, 2017

Ah, cool. Looks like tokenize gives you that, then. Though docstrings are sort of their own thing and do end up in the AST (I think?), which is sort of weird.

@azz
Copy link
Copy Markdown
Member

azz commented Nov 29, 2017

This is really cool, but I'm going to play devil's advocate and list some of my concerns:

  1. Python doesn't have a specification. This has bitten us a bunch of times supporting Less and SCSS, and Python is probably a more complex language.
  2. This will be the first language that doesn't fall in to the "languages used in front-end web" bucket. Not necessarily an issue, but it expands the scope of Prettier, and maybe people will expect we add other languages like Ruby, C#, etc.
  3. astexport: no commits in 2 years.
  4. Having a binary dependency that the user has to install is annoying for the user, and really difficult for us to work with. Currently we bundle in very specific versions of parsers, even an earlier patch release will often cause our tests to fail. It also means we can't use it in the playground, which makes triaging issues really hard.
  5. It's been mentioned already but we've had issues with people running the HTML formatter despite it being experimental - we should find a way to outright disable the code in production releases.
  6. Python has some complexities in when and where it is whitespace sensitive. We need to be really careful with the Prettier doc primitives so we don't break where it would be a syntax error.
  7. People will naturally ask for Python 2 support. Will this be possible?
  8. In terms of maintenance, I'm not sure how many of the collaborators are proficient in Python? We'd probably need to pick up a couple of collaborators to support it.

All that being said, I'm fine with merging this in an experimental status, provided we address (5.)

As @vjeux mentioned, being able to compile the parser down to JavaScript would be a huge step towards getting this released.

@azz azz mentioned this pull request Nov 29, 2017
5 tasks
@taion
Copy link
Copy Markdown
Contributor

taion commented Nov 29, 2017

@azz

What do you mean by "Python doesn't have a specification"?

The Python 2/Python 3 support thing is trivial. astexport basically does very little, and it's a few little syntax things that make it not support Python 2. There's not much necessary to get it to work with Python 2, and otherwise it's just using ast from the stdlib, which is doing all the hard work. Literally all astexport is doing is traversing the output of ast.parse and exporting it as JSON, and it'd be quite easy to write one internal to this repo and to drop that binary dependency.

The AST parser here is literally built in to Python and exposed as such, so it's not an extra binary dependency unless you think people are going to try to format Python without having Python installed somehow.

In some sense you actually can't neatly bundle it here, though, because the AST parser is somewhat specific to individual versions of Python, and as such the correct version of Python to use would be the system (or virtualenv Python).

Though I think the biggest question here is – "what's the cost of bridging over the Python AST to JS, versus porting over Prettier's printer to Python?"

Is most of the value-add here a set of language-agnostic heuristics that are complicated and subtle, or is there a lot to handle JS-specific idioms? If it's the latter, then perhaps there's not much presumptive reason to implement Python tooling in Node.

@patrick91
Copy link
Copy Markdown
Member Author

@azz thanks for the comments!

  1. not sure by what you mean by this. If you're talking about formatting, the PEP8 document should be a good reference
  2. that'd be expected yeah, but I really want this built into prettier! (and looks like I'm not the only one)
  3. ASTExport is a trivial script, it can be vendorized if needed, I can send an email to the maintainer.
  4. I don't see this as a real issue, since you'll be likely having python on your machine if you're writing code. Anyway, for the playground we might check if this works: https://github.com/pybee/batavia
  5. +1 one for this, I don't want to let people think that formatting python works, there's a lot of work to do still :)
  6. Yup. Do you think we can add some tests for this as well? Like validating all the test codes after formatting them?
  7. Yes!
  8. I'm willing to help, I'm sure we can find some more people that will contribute!

@azz
Copy link
Copy Markdown
Member

azz commented Nov 29, 2017

What do you mean by "Python doesn't have a specification"?

JavaScript has a spec, which informs us of the rules in writing valid JavaScript, as well as the runtime semantics of the code. I forgot that Python has a grammar reference, which helps with this.

In some sense you actually can't neatly bundle it here, though, because the AST parser is somewhat specific to individual versions of Python [...]

This is the case with JavaScript, too, but we make sure we can handle all JavaScript language features (including a lot of experimental features). This way the output of Prettier doesn't vary based on which version of Node.js, or which browser you are using.

Is most of the value-add here a set of language-agnostic heuristics that are complicated and subtle, or is there a lot to handle JS-specific idioms?

The doc-building/printing portion of Prettier is mostly independent of any programming language. There have been tweaks to support certain features, like quote blocks in Markdown, and template strings in JavaScript, but those might be useful for printing Python too (e.g. multi-line strings)

@azz
Copy link
Copy Markdown
Member

azz commented Nov 29, 2017

I've met a guy who did a talk on batavia. It's a cool project. I might have a play with it and try and get ast working.

@taion
Copy link
Copy Markdown
Contributor

taion commented Nov 29, 2017

@azz

Python AST parsing is version-specific though. It's not like how all syntactically-valid ES5 code is also syntactically-valid "modern JS". If you run ast.parse (really just compile) from Python 3 against Python 2 code, it may will barf on syntax errors.

In general, the Python backward compat policy doesn't say "no backward-incompatible syntax changes": https://www.python.org/dev/peps/pep-0387/. If you want to use ast, you need to use the ast package corresponding to the version of Python for which a project runs.

@taion
Copy link
Copy Markdown
Contributor

taion commented Nov 29, 2017

The trade-off ends up being language-specific printing stuff. The astor docs for example specifically flag that many attempts to print the Python AST to source have failed on corner cases: https://github.com/berkerpeksag/astor#id2.

@dhirschfeld
Copy link
Copy Markdown

If it lowers the dev/support burden, maybe just support Python3? Many projects will be removing support for Python2 at the end of next year anyway.

@azz
Copy link
Copy Markdown
Member

azz commented Nov 29, 2017

Are there syntax breaking changes from Python 3.0 to 3.7?

@patrick91
Copy link
Copy Markdown
Member Author

@azz none that I can't think of, I previously said that f-string will be supported in py 3.7, but they are already in (since 3.6)

@patrick91
Copy link
Copy Markdown
Member Author

@azz can you do another review? :)

@patrick91
Copy link
Copy Markdown
Member Author

I've started another repo with (one for now) some unit tests that I would use to verify to validity of the formatted files, I had a bug where it was printing correct python code but it was doing a different thing.

https://github.com/patrick91/prettier-python-tests

I still have to add the steps to download prettier, prettify the code and then run the tests :)

@azz
Copy link
Copy Markdown
Member

azz commented Dec 21, 2017

I'll do another soon, but I'm considering the advice from @jlongster in #3503 (comment) and thinking about how we can share Prettier as a tool, while having separate maintainable codebases. To that end I'm working on a plugin API in #3536, and once that's done we can move this PR to a new repository (prettier/prettier-python) with a dependency on prettier.

This way we can achieve:

  • Independent versioning, providing the flexibility to release fixes at a faster rate than Prettier.
  • Seperate CI for faster iteration and ability to add a larger range of tests, potentially using more Python versions.
  • Ability to watch only Python issues on GItHub

All the while keeping:

  • The core Prettier platform, without having to install separate tools to format Python and the languages that Prettier currently supports.
  • Editor integration support

We can do the same for the PHP PR in #3521, and potentially the HTML support in #3534.

Thoughts?

@patrick91
Copy link
Copy Markdown
Member Author

Sounds good to me, especially for this:

Ability to watch only Python issues on GItHub

I can see it being a problem when having many languages baked into prettier.

Let's see what the other folks think :)

@jlongster
Copy link
Copy Markdown
Member

Yes @azz thank you for working on that! I'd like to start watching prettier's issues again but I can't imagine trying to lump a whole new language into here and the amount of issues that it would add on top of everything. I really don't think another language should be included here. Markdown is passable because it's a big part of the JS ecosystem, and it's probably easy enough. But we should keep prettier as a JS tool.

@taion
Copy link
Copy Markdown
Contributor

taion commented Dec 21, 2017

I'm wondering about that community aspect, though. Like, as set up, the only people who can actually contribute to this are people who are familiar with Python AST manipulation and who can work in Node.

So I'd ask then how this matches @vjeux's points in #3349 (comment).

@vjeux
Copy link
Copy Markdown
Contributor

vjeux commented Dec 21, 2017

What I want is for people to stop wasting time formatting their code. The reality is that people use many different programming languages on a daily basis and not just JS. It turns out that prettier is a really good foundation for building a pretty printer in any language so I feel like it would be a shame for it not to expand to more places now that it's covering the core use case very well.

I also found out that a big reason that languages build silos around themselves is that all the tooling is completely different. One of the reason I joined the Nuclide team was to break silos by not having to learn a completely new IDE when from switching languages, which is one less ramp up time. I believe that having the same pretty printer which shares similar decisions and design philosophy is going to be one more stone in that direction.

Now, this is --not-- going to be easy as forces to push language ecosystems away from each other are strong, but I don't think it's impossible and I believe that having PHP and Python pretty printers as part of prettier is something that can work. Having a separate repo to manage issues and PR is a good idea.

@taion
Copy link
Copy Markdown
Contributor

taion commented Dec 21, 2017

@vjeux

I definitely agree with what you're saying.

My concern is maybe more around the API boundaries here. If the idea per #3536 is a plugin API, then, in that world where there is a level of separation, it would be great to implement the AST token printing in Python rather than in JS. That way Prettier could just leverage e.g. astor for Python, &c.

The core printing API, the platform API, the CLI stuff – that's all great. To me, especially if prettier-python lives in a different package entirely, it just looks weird that printer-python.js is JS instead of Python, especially when it's essentially just writing out a serializable data structure. Writing Python tooling in JS is a wall, too.

BTW, what's that about IDEs? Maybe I'm spoiled, but I use JetBrains IDEs for just about everything... they seem to support pretty much everything under the sun.

@j-f1
Copy link
Copy Markdown
Member

j-f1 commented Dec 21, 2017

@taion Interesting idea. One of the important components of Prettier is the actual logic that figures out where to break lines. Maybe the Python component could generate the document using translated builders, then serialize the result to JSON and had that over to Prettier’s JS so that it can print the prettified code.

@taion
Copy link
Copy Markdown
Contributor

taion commented Dec 21, 2017

@j-f1 Yeah – I mentioned this upthread, but it's not a super-compelling thing for part of Prettier proper. As a Prettier plugin that's in a separate repo aimed at Python people, it's a better idea IMO. It looks like doc-builders just produces a serializable structure, so I'd imagine that would be a better API for plugins aimed at other languages.

@azz
Copy link
Copy Markdown
Member

azz commented Dec 24, 2017

I've copied all the commits from this PR to a new repository 🎉 https://github.com/prettier/prettier-python. @patrick91 I've made you an admin of the new repo.

@azz azz closed this Dec 24, 2017
@taion taion mentioned this pull request Jan 1, 2018
@lock lock bot added the locked-due-to-inactivity Please open a new issue and fill out the template instead of commenting. label Jan 18, 2019
@lock lock bot locked as resolved and limited conversation to collaborators Jan 18, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

locked-due-to-inactivity Please open a new issue and fill out the template instead of commenting.

Projects

None yet

Development

Successfully merging this pull request may close these issues.