Skip to content

Conversation

@yashshah1
Copy link
Contributor

Before contributing (PLEASE READ!)

⚠️ If your contribution is more than a few lines of code, then prior to starting to code on it please post in the issue saying you want to volunteer, and then wait for a positive response. And if there is no issue for it yet, create it first.

This helps make sure (1) two people aren't working on the same thing, (2) this is something Streamlit's maintainers believe should be implemented/fixed, (3) any API, UI, or deeper architectural changes that need to be implemented have been fully thought through by Streamlit's maintainers, and (4) your time is well spent!

More information in our wiki: https://github.com/streamlit/streamlit/wiki/Contributing


Issue: Please include a link to the issue you're addressing. If no issue exists, create one first and then link it here.
Issue: #1957
Description: Describe the changes you made to the code, so it's easier for the reader to navigate your pull request. Usually this is a bullet list.
As discussed:

  • Removed any trailing/leading spaces
  • Removed all newlines from the svg string
  • Passed the svg string to quote from urllib.parse (with safe='')

Contribution License Agreement

By submitting this pull request you agree that all contributions to this project are made under the Apache 2.0 license.

@yashshah1 yashshah1 requested a review from a team September 16, 2020 08:03
@yashshah1
Copy link
Contributor Author

yashshah1 commented Sep 16, 2020

I'd forgotten to update the tests for image_to_url. I've pushed the new cases.

Copy link
Contributor

@karriebear karriebear left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, thanks for taking this on!

Copy link
Contributor

@karriebear karriebear left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So sorry I didn't take a close look earlier but it looks like there's actually some interplay between the frontend so this change doesn't quite work.

The url that gets returned by image_proto works perfectly but we're then running SVG strings through another sanitization with sanitizeSvg from lib/UriUtils.ts. It does another set of encoding + DomPurify.sanitize (to prevent JS from executing) which encodes the % from our previously encoded string.

It would be great if we can consolidate these two so that all the encoding happens in one place while still sanitizing to protect us from any JS included in the svg string.

@yashshah1
Copy link
Contributor Author

@karriebear Wow, I'd completely missed that.

So, I did a little checking and realised that encodeURIComponent(...) essentially does the exact same thing as urllib.parse.quote(...). So this problem could be solved by anyone of two ways

  • Removing quote from the python script, or
  • Removing encodeURIComponent

First Method (removing quote):

It would probably be better to keep the encoding as close to the end-user as possible as encoding does increase the number of bytes a message has (although not by a lot), and doing the encoding on the users' device could save (a little) in the quantum of data transferred.

Second Method (removing encodeURIComponent):

However, processing each SVG string on the clients' device adds to the overall work each clients' device has to do and could lead to an increase in load time especially on lower-end devices. In such a case, a tradeoff between network speed could take some load off the client considering servers traditionally are more powerful devices.

I can't decide which one way seems to be more beneficial to streamlit, and would like your input on this.

@karriebear
Copy link
Contributor

I think most devices at this stage should be able to handle encodeURIComponent without difficulties and I imagine the differences between an encoded and decoded string should not be a significant difference.

However, it looks like the resulting data url of the encodedUriComponent should work when copied into a new browser window but does not. Not sure if it is some timing thing but if it is something that works with quote, that would be an argument to go with that.

@yashshah1
Copy link
Contributor Author

yashshah1 commented Sep 19, 2020

Hi, makes sense about not being enough computational load for it to matter.

However, I am unable to reproduce a case where encodeURIComponent fails. I still need to get rid of the new lines, but running a svg string without newlines through encodeURIComponent and then prepending the svg prefix data:image/svg+xml, makes the svg render perfectly in a new browser window.

I'll try and raise a PR by pushing everything to Python

@yashshah1
Copy link
Contributor Author

@karriebear I've pushed the changes such that all the processing is done by quote and the frontend merely displays it. quote takes care of any funny DOM strings that it might encounter.

@yashshah1
Copy link
Contributor Author

These tests are failing since DOMPurify is imported but never used, and this was the only file where it was being imported, can I remove it as a dependency?

@karriebear
Copy link
Contributor

karriebear commented Sep 21, 2020

I actually had the same issue as you when using encodeURIComponent. I just tried out the latest changes and looks like I still get the same result. I noticed that it works when the dataUri is viewed in the browser but when it is in the image tag, it's problematic. Turns out this is because an image tag must be one image file where our example is importing image from elsewhere (source).

I didn't find a quick solution to this so it might be something we bring back to our product team to see if this is something we should handle. If you know of a solution for this, that would be great and we can try that out. Otherwise, feel free to put this on hold - sorry to have wasted your time on this ☹️

@yashshah1
Copy link
Contributor Author

Like the SO answer suggests, one quick fix would be to fetch all linked images and encode them as base64 strings; after which we can encode the whole svg and send it to the front end.

If this is something that is acceptable enough of a fix, I could try it out. One caveat being, this would make large SVGs with a lot of links quite slow to load, but then again, a SVG with links to a lot of images beats the point of a SVG

@karriebear
Copy link
Contributor

An alternative would be to use a SVG tag instead of a dataURI. Though like you said, a SVG with links to other images does defeat the point of a SVG. I'll review with the team to see what we think about this!

@yashshah1
Copy link
Contributor Author

Sure, let me know if there is any work to be done 😁

@akrolsmir
Copy link
Contributor

These tests are failing since DOMPurify is imported but never used, and this was the only file where it was being imported, can I remove it as a dependency?

Flyby comment -- We added in DOMPurify as a security measure, to prevent the execution of Javascript inside of the SVG; I believe that just using quote would now permit JS to execute.

I could see an argument that we do indeed want to permit JS execution in SVGs (something like: the author of the Streamlit app already has to be trusted), but wanted to make sure this was a deliberate choice.

@karriebear
Copy link
Contributor

We just discussed and while this could be an odd use case, this is something we would love to support!

Our ideal solution would be to create a SVG tag instead of the fetching/encoding methodology (seems too much of an overkill for us to render an image). However, with SVG, security is a big concern for us so we would definitely want to bring back DOMPurify (or any python alternative). We would probably also want to modify our Image.proto since it will no longer be an url.

If this is still something you would like to take on, let me know and happy to provide additional guidance if needed! And if there's a better approach, happy to hear that as well.

@yashshah1
Copy link
Contributor Author

yashshah1 commented Sep 22, 2020

Sure I would love to work on this. So, if I understand correctly, what needs to be done is an added support for st.svg(...), which could be called either directory or when an svg is passed to st.image() (How st.write works for multiple datatypes) . With this, get back DOMPurify for an added security measure as mentioned above.

Since I am still new to the codebase, are there any specific code file(s) that I can start reading to get a better idea of how this is done with other tags?

@karriebear
Copy link
Contributor

karriebear commented Sep 23, 2020

At this stage, I don't know if we need a st.svg API. I think it'd be better to just extend st.image to render svg strings differently. However, we could (and probably should) create a new frontend component.


Here's some context to help get started:

Generally, most of the magic happens in Block.tsx but since st.image can take in a list of images, I recommend focusing in on ImageList.tsx. This way, folks can mix and match what type of data they provide into st.image.

In terms of the data being passed into ImageList.tsx, these are proto shapes that are defined in corresponding proto files. In this case it would be Image.proto. When updating proto files, make sure to run make protobuf in order for your changes to propogate to the frontend. (The frontend only looks at autogen/protobuf which gets generated/updated through the make protobuf command. It won't know of any changes to proto files).

I don't think we have another component that does quite this logic so not sure if this would help. The closest approximation I can think of is our upcoming expandable. Here we have a conditional to determine if we should show a regular block or an expandable block.

Let me know if this helps or if there's any other context we can provide to point you in the right direction.

@karriebear
Copy link
Contributor

@yashshah1 just wanted to check in to see if this is something you would still like to work on? If not, let me know and we'll close the PR and update the issue with our decisions made here. Either way, thanks for your patience as we worked through this!

@yashshah1
Copy link
Contributor Author

Hi @karriebear, This is definitely something I want to see through, I've just been caught up a little at work. I should have some decent progress in a day or two. Thank you for your patience!

@yashshah1
Copy link
Contributor Author

Hi @karriebear, first off thank you for your patience :)

I wanted to run an approach by you, instead of modifying multiple components, we could integrate svg images by just changing ImageList.tsx

In the return part of it, where it maps through element, we check if the incoming string is a svg string or not, if it is, we run decodeURIComponent and return it so:

const svgString = decodeURIComponent(src.substring(SVG_PREFIX.length))
return (
  <div
    className="image-container stImage"
    key={idx}
    style={{ width: containerWidth }}
    dangerouslySetInnerHTML={{ __html: svgString }}
  ></div>
)

This shouldn't be a problem as far as I can tell since this is already run through buildMediaUri which calls DOMPurify.sanitize which takes care of removing unwanted JS.

What this solution lacks

Currently, there is also a caption that is added to the bottom of the image, if this solution is accepted, we might have to convert the DOM of the caption part to be a string, add add it above

Full code of ImageList.tsx:

const SVG_PREFIX = "data:image/svg+xml,"
...
return (
  <div style={{ width }}>
    {element
      .get("imgs")
      .map((img: ImmutableMap<string, any>, idx: string) => {
        const src = buildMediaUri(img.get("url"))
        if (src.startsWith(SVG_PREFIX)) {
          const svgString = decodeURIComponent(
            src.substring(SVG_PREFIX.length)
          )
          return (
            <div
              className="image-container stImage"
              key={idx}
              style={{ width: containerWidth }}
              dangerouslySetInnerHTML={{ __html: svgString }}
            ></div>
          )
        } else {
		  // unchanged
        }
      })}
  </div>
)

@yashshah1
Copy link
Contributor Author

Just an additional thought, if it's okay, we could add react-html-parser as a dependency, this removes the need of setting any prop that has dangerous in its name :P

@karriebear
Copy link
Contributor

karriebear commented Oct 19, 2020

No problems with adding react-html-parser and adding a condition to the ImageList is definitely the right approach!

Though, instead of storing the SVG data in image.get("url"), it'd be preferred if you could add another property to Image.proto for markup or svg (or whatever better name you can come up!). Then you can just store the value of the SVG in that new property in image_proto.py instead of adding it to the url.

This would be nice instead of repurposing img.get("url"), especially since we won't be storing the SVG as an URL. Then that'll let you check for the property (like below) instead of what the string begins with. We'll also be able to save on encoding + decoding. Though we should still run it through DOMPurify.

if (image.get('markup/svg/TBD property name')) {
   ...
} else {
   ...
}

Thanks again though for thinking through alternatives! That's always valuable. If it's confusing or you're busy, no worries, we can take it too. Also, didn't mean to rush you. Work and life definitely comes first and we appreciate any time you spare helping us, no matter how big or small!

@yashshah1 yashshah1 closed this Oct 20, 2020
@yashshah1 yashshah1 force-pushed the 1957-svg-string-fix branch from 5f9fedb to 67d94f8 Compare October 20, 2020 17:10
@yashshah1
Copy link
Contributor Author

yashshah1 commented Oct 20, 2020

Got closed since I overwrote my previous work with a force push after a merge from develop. Will push my changes and reopen

@yashshah1 yashshah1 reopened this Oct 20, 2020
@yashshah1 yashshah1 marked this pull request as draft October 20, 2020 19:29
@yashshah1 yashshah1 force-pushed the 1957-svg-string-fix branch from ac5bf43 to 35f6ecf Compare October 20, 2020 19:30
@yashshah1
Copy link
Contributor Author

yashshah1 commented Oct 20, 2020

@karriebear I have done a few changes, which should work in theory, but I am thrown an error that I can't seem to reason out, maybe you can help me out here.

This is the debugging output I am given:
image

I've also taken the liberty of converting this PR to a draft.

@karriebear
Copy link
Contributor

It looks like something is not happy with your compiled protobuf. The object is coming from a file that gets autogenerated when you set up your development environment and needs to be updated when you update a proto file.

Did you run make protobuf or one of the develop commands (i.e. make mini-devel or make all-devel)? I pulled down your branch and ran make mini-devel which worked for me.

@yashshah1
Copy link
Contributor Author

@karriebear Great, that did work for me, thank you!
The bug seems to be solved by the last commit that I'd made except for one caveat. In the svg of your GH avatar:

<svg xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" width="100" height="100">
    <clipPath id="clipCircle">
        <circle r="25" cx="25" cy="25"/>
    </clipPath>
    <image xlink:href="https://avatars.githubusercontent.com/karriebear" width="50" height="50" clip-path="url(#clipCircle)"/>
</svg>

Using the above svg brings up a blank image on the frontend, which is fixed by changing the property name of xlink:href to just href, after which the bear appears!

While trying to reason out why this happens (given that pasting an encoded URI in the browser search bar works perfectly), I came across this, which mentions that the xlink:href is now deprecated and we should switch to using href instead. If that is acceptable I am now open to code reviews on this PR.

Else, can you try and point me in a direction, because I can't seem to reason out why my browser doesn't show the bear (Given that chrome does still support the tag). One hotfix could be to replace all xlink:href to href in image_proto.py

@karriebear
Copy link
Contributor

Given that it's deprecated, that's more than enough reason for us to not support it. It's recommended to not use anymore so we should support that messaging :) I've updated the test case in the issue to reflect this. Thanks for taking a look into this!

If you could also add some tests, that'll be great and then happy to review!

@AnOctopus
Copy link
Contributor

@yashshah1 Hey, since you haven't worked on this in a few months, I'm going to finish it out so we can get it merged in.

@AnOctopus AnOctopus mentioned this pull request Jan 27, 2021
@karriebear
Copy link
Contributor

Closing now that #2666 is merged. Thanks for the help getting us started on this @yashshah1!

@karriebear karriebear closed this Feb 9, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Some svg strings that work on the browser do not work with st.image

4 participants