Skip to content

Blacklist for hashing#254

Merged
monchier merged 22 commits intostreamlit:developfrom
monchier:242
Oct 8, 2019
Merged

Blacklist for hashing#254
monchier merged 22 commits intostreamlit:developfrom
monchier:242

Conversation

@monchier
Copy link
Copy Markdown
Contributor

@monchier monchier commented Oct 3, 2019

Issue: #242

Description: Add a blacklist reusing code from the File Watcher. I extracted the code for the blacklist from the FileWatcher. This allows us to have a single point of configuration and some reuse. I added the logic to a class called BlackList. Also added tests, mostly moving them out of the FileWatcher tests.


Contribution License Agreement

By submiting this pull request you agree that all contributions to this project are made under the Apache 2.0 license.

@monchier monchier added the WIP label Oct 3, 2019
@monchier monchier changed the title 242 Blacklist for hashing Oct 3, 2019
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this is an actual limitations, we should link an issue to it. Since I am changing this code, I am removing it.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed.

I don't think this is a use-case that matters for Streamlit authors anyway.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I moved this to black_list_test. Let me know if I missed some coverage.

@monchier monchier removed the WIP label Oct 3, 2019
@domoritz domoritz removed their request for review October 3, 2019 19:24
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use Numpy doc style

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will do

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have a PR that fixes this function. Please merge in the changes. See #265

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I can just merge it in. Seems your PR is going to be merged earlier.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have to think more deeply about this, but just something I noticed: this code uses cwd whereas LocalSourcesWatcher uses the folder where the main script lives.

Does it make sense to unify the two behaviors? If so, which is best: cwd or main folder?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder this is just an artifact of the code being written by two different people. I do think there might be issues with both. What if the main script is in a subdirectory? And what if we execute streamlit from a path that does not include CWD? Should be document and make this a user-option. We can make clear that only a subset of files will be watched/checked for mutation.

I like this to be the same for FileWatcher and here.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Move this to a utility file, so it can be reused in LocalSourcesWatcher

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BlackList would be shared, so we should not need to move this to a utility.

@monchier
Copy link
Copy Markdown
Contributor Author

monchier commented Oct 4, 2019

@tvst Let's chat whether we want this to be shared with FileWatcher. Let's also clarify the role of CWD/main folder path

@monchier monchier added the WIP label Oct 4, 2019
@monchier
Copy link
Copy Markdown
Contributor Author

monchier commented Oct 4, 2019

@tvst Let's chat whether we want this to be shared with FileWatcher. Let's also clarify the role of CWD/main folder path

We decided to share this with FileWatcher and to use the main script folder.

Comment on lines +320 to +322
if util.file_is_in_folder(
filepath, _get_main_script_directory()
) and not self._folder_black_list.is_blacklisted(filepath):
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am separately checking

  • whether the file is in the current directory
  • whether the file is not in the blacklist

Comment on lines -162 to +168
_file_is_in_folder(filepath, blacklisted_folder)
for blacklisted_folder in self._folder_blacklist
)

if is_in_blacklisted_folder:
if self._folder_black_list.is_blacklisted(filepath):
continue

file_is_new = filepath not in self._watched_modules
file_is_local = _file_is_in_folder(filepath, self._report.script_folder)
file_is_local = util.file_is_in_folder(
filepath, self._report.script_folder
)
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as hashing.py, I am leaving i) checking if the file is local and ii) checking against the blacklist separate.
Also, here we use then self._report.script_folder as the main script folder.

Comment on lines +173 to +174
main_path = __main__.__file__
return os.path.dirname(main_path)
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am assuming this happens while I am executing a streamlit script and this is well defined.

Comment on lines +184 to +186
self.assertTrue(util.file_is_in_folder("/user/name/test", "/user"))
self.assertTrue(util.file_is_in_folder("/user/name/test", "/user/*"))
self.assertFalse(util.file_is_in_folder("/user/name/test", "/user/other"))
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

basic tests for this function.

@monchier monchier removed the WIP label Oct 8, 2019
@monchier
Copy link
Copy Markdown
Contributor Author

monchier commented Oct 8, 2019

I did not include the check for a file to be local in the blacklist. It seems to be it does not belong there semantically and would be a bit opaque, but there is some replication since the two checks are in the File Watcher and in hashing.py is a slightly different format. Also I moved file_is_in_folder to util.py and moved tests - it could have be together with the blacklist and be exposed as a public method, but does not belong to the same file and it would be odd to say "black_list.file_is_in_folder", so util seemed a better place.

import __main__
import os

main_path = __main__.__file__
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this point to the right file? In some toy tests on my computer, it seems to point to the equivalent of cli.py

Here's what I did:

# main_test.py

with open('main_subtest.py') as script_file:
    script = script_file.read()

exec(script, {})
# main_subtest.py

import streamlit as st
import __main__
st.write(__main__.__file__)

Expected: should see "main_subtest.py" in Streamlit app
Actual: saw "main_test.py" in Streamlit app

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tested this in streamlit... In the case below, it shows the correct path. Let me try to repro. I would rather use _report.script_path but we need to pass it down to the hasher, which I could do, but I need to change the interface.

Screen Shot 2019-10-08 at 11 42 32 AM

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Copy Markdown
Contributor Author

@monchier monchier Oct 8, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, happens here:

module.__dict__["__file__"] = self._report.script_path

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Which is what we are looking for, right?
I think if we like this solution we need at the very least a test.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right. This code is fine, then!

Can you add a comment before line 415 explaining why this is fine?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will add the comment.

@monchier monchier merged commit 71f1889 into streamlit:develop Oct 8, 2019
@monchier monchier deleted the 242 branch October 8, 2019 20:14
tconkling added a commit to tconkling/streamlit that referenced this pull request Oct 9, 2019
* develop:
  Fullscreen button to expand tables and charts (streamlit#137)
  Blacklist for hashing (streamlit#254)
  Adding utils and tests for hello.py (streamlit#261)
  Fix port-finding bugs (streamlit#280)
  Fix hello.py usage of os.path.join on a URL (streamlit#285)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants