Blacklist for hashing by monchier · Pull Request #254 · streamlit/streamlit

monchier · 2019-10-03T18:14:53Z

Issue: #242

Description: Add a blacklist reusing code from the File Watcher. I extracted the code for the blacklist from the FileWatcher. This allows us to have a single point of configuration and some reuse. I added the logic to a class called BlackList. Also added tests, mostly moving them out of the FileWatcher tests.
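For readers unfamiliar with the design, a shared blacklist of this kind might look roughly like the sketch below. It is not the code in this PR: the class name `FolderBlackListSketch`, the default glob patterns, and the helper `_as_folder_glob` are all assumptions made for illustration.

```python
import fnmatch
import os

# Hypothetical defaults, for illustration only.
DEFAULT_BLACKLISTED_FOLDERS = ["**/site-packages", "**/node_modules"]


def _as_folder_glob(folder_glob):
    """Make the glob end with '/*' so it also matches nested subfolders."""
    if folder_glob.endswith("*"):
        return folder_glob
    if folder_glob.endswith("/"):
        return folder_glob + "*"
    return folder_glob + "/*"


class FolderBlackListSketch:
    """Single point of configuration for folders that should be skipped."""

    def __init__(self, folder_globs=None):
        if folder_globs is None:
            folder_globs = DEFAULT_BLACKLISTED_FOLDERS
        self._folder_globs = [_as_folder_glob(g) for g in folder_globs]

    def is_blacklisted(self, filepath):
        """Return True if filepath sits inside any blacklisted folder."""
        file_dir = os.path.dirname(filepath) + "/"
        return any(
            fnmatch.fnmatchcase(file_dir, folder_glob)
            for folder_glob in self._folder_globs
        )


black_list = FolderBlackListSketch()
```

Both the file watcher and the hasher could then hold one such object and call `is_blacklisted(filepath)`, which is the reuse the description is after.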

Contribution License Agreement

By submitting this pull request you agree that all contributions to this project are made under the Apache 2.0 license.

monchier · 2019-10-03T19:21:02Z

lib/streamlit/hashing.py

If this is an actual limitation, we should link an issue to it. Since I am changing this code, I am removing it.

Agreed.

I don't think this is a use-case that matters for Streamlit authors anyway.

monchier · 2019-10-03T19:22:08Z

lib/tests/streamlit/watcher/LocalSourcesWatcher_test.py

I moved this to black_list_test. Let me know if I missed some coverage.

tvst · 2019-10-04T07:48:20Z

lib/streamlit/black_list.py

Use Numpy doc style

tvst · 2019-10-04T07:49:08Z

lib/streamlit/black_list.py

I have a PR that fixes this function. Please merge in the changes. See #265

Ok, I can just merge it in. Seems your PR is going to be merged earlier.

tvst · 2019-10-04T07:51:37Z

lib/streamlit/hashing.py

I have to think more deeply about this, but just something I noticed: this code uses cwd whereas LocalSourcesWatcher uses the folder where the main script lives.

Does it make sense to unify the two behaviors? If so, which is best: cwd or main folder?

I wonder whether this is just an artifact of the code being written by two different people. I do think there might be issues with both. What if the main script is in a subdirectory? And what if we execute streamlit from a path that does not include CWD? We should document this and make it a user option. We can make clear that only a subset of files will be watched/checked for mutation.

I would like this to be the same for FileWatcher and here.

tvst · 2019-10-04T07:52:53Z

lib/streamlit/black_list.py

Move this to a utility file, so it can be reused in LocalSourcesWatcher

BlackList would be shared, so we should not need to move this to a utility.

monchier · 2019-10-04T16:56:32Z

@tvst Let's chat whether we want this to be shared with FileWatcher. Let's also clarify the role of CWD/main folder path

monchier · 2019-10-04T23:19:52Z

We decided to share this with FileWatcher and to use the main script folder.

monchier · 2019-10-07T21:37:48Z

lib/streamlit/hashing.py

+                if util.file_is_in_folder(
+                    filepath, _get_main_script_directory()
+                ) and not self._folder_black_list.is_blacklisted(filepath):


I am separately checking i) whether the file is in the main script directory and ii) whether the file is not in the blacklist.
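The two independent checks can be sketched as a single predicate. This is a simplified illustration, not the code in hashing.py: it swaps the glob matching for a plain path-prefix comparison via `os.path.commonpath`, and the names `is_in_folder` and `should_check_file` are hypothetical.

```python
import os


def is_in_folder(filepath, folder):
    """True if filepath lives in folder or in any of its subfolders."""
    folder = os.path.abspath(folder)
    filepath = os.path.abspath(filepath)
    return os.path.commonpath([filepath, folder]) == folder


def should_check_file(filepath, main_script_dir, blacklisted_folders):
    # Check 1: the file is local to the main script's directory.
    is_local = is_in_folder(filepath, main_script_dir)
    # Check 2: the file is inside one of the blacklisted folders.
    is_blacklisted = any(
        is_in_folder(filepath, folder) for folder in blacklisted_folders
    )
    return is_local and not is_blacklisted
```

Keeping the checks separate means a blacklisted folder can sit inside the main script directory (a virtualenv, for instance) and still be excluded.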

monchier · 2019-10-07T21:39:52Z

lib/streamlit/watcher/LocalSourcesWatcher.py

-                    _file_is_in_folder(filepath, blacklisted_folder)
-                    for blacklisted_folder in self._folder_blacklist
-                )
-
-                if is_in_blacklisted_folder:
+                if self._folder_black_list.is_blacklisted(filepath):
                    continue

                file_is_new = filepath not in self._watched_modules
-                file_is_local = _file_is_in_folder(filepath, self._report.script_folder)
+                file_is_local = util.file_is_in_folder(
+                    filepath, self._report.script_folder
+                )


Same as in hashing.py, I am leaving i) the check that the file is local and ii) the check against the blacklist separate.
Also, here we use self._report.script_folder as the main script folder.

monchier · 2019-10-07T21:41:03Z

lib/streamlit/hashing.py

+    main_path = __main__.__file__
+    return os.path.dirname(main_path)


I am assuming this happens while I am executing a streamlit script and this is well defined.

monchier · 2019-10-07T21:41:34Z

lib/tests/streamlit/util_test.py

+        self.assertTrue(util.file_is_in_folder("/user/name/test", "/user"))
+        self.assertTrue(util.file_is_in_folder("/user/name/test", "/user/*"))
+        self.assertFalse(util.file_is_in_folder("/user/name/test", "/user/other"))


basic tests for this function.
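For context, one way a function could satisfy the three assertions above is glob matching on the file's directory. The sketch below is an assumption about the approach, not a copy of `util.file_is_in_folder`; the name `file_is_in_folder_sketch` is hypothetical.

```python
import fnmatch
import os


def file_is_in_folder_sketch(filepath, folderpath_glob):
    """Return True if filepath is inside a folder matching folderpath_glob."""
    # Make the glob end with "/*" so files in nested subfolders also match.
    if not folderpath_glob.endswith("*"):
        if folderpath_glob.endswith("/"):
            folderpath_glob += "*"
        else:
            folderpath_glob += "/*"
    # Compare against the containing directory, with a trailing slash so
    # that "/user" does not accidentally match "/username".
    file_dir = os.path.dirname(filepath) + "/"
    return fnmatch.fnmatchcase(file_dir, folderpath_glob)
```

Note that `fnmatch` lets `*` match across `/`, which is what allows `/user` to cover `/user/name/test`.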

monchier · 2019-10-08T05:22:22Z

I did not include the check for whether a file is local in the blacklist. It seems to me that it does not belong there semantically and would be a bit opaque, but there is some duplication, since the two checks appear both in the File Watcher and in hashing.py in slightly different forms. I also moved file_is_in_folder to util.py and moved its tests. It could have lived with the blacklist and been exposed as a public method, but it does not belong in the same file and it would be odd to say "black_list.file_is_in_folder", so util seemed a better place.

lib/streamlit/folder_black_list.py

tvst · 2019-10-08T18:36:02Z

lib/streamlit/hashing.py

+        import __main__
+        import os
+
+        main_path = __main__.__file__


Does this point to the right file? In some toy tests on my computer, it seems to point to the equivalent of cli.py

Here's what I did:

    # main_test.py
    with open('main_subtest.py') as script_file:
        script = script_file.read()
    exec(script, {})

    # main_subtest.py
    import streamlit as st
    import __main__
    st.write(__main__.__file__)

Expected: should see "main_subtest.py" in Streamlit app
Actual: saw "main_test.py" in Streamlit app

I tested this in streamlit... In the case below, it shows the correct path. Let me try to repro. I would rather use _report.script_path but we need to pass it down to the hasher, which I could do, but I need to change the interface.

Interesting:
https://github.com/streamlit/streamlit/blob/develop/lib/streamlit/ScriptRunner.py#L293

We do this, before running:

sys.modules["__main__"] = module

Actually, it happens here:

module.__dict__["__file__"] = self._report.script_path

Which is what we are looking for, right?
I think if we like this solution we need at the very least a test.

You're right. This code is fine, then!

Can you add a comment before line 415 explaining why this is fine?

Will add the comment.
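The thread above hinges on what `__main__.__file__` resolves to. The sketch below illustrates the mechanism with the standard library only: a fresh module is registered as `sys.modules["__main__"]` and given a `__file__`, so code executed in that module sees the script's own path. The helper `run_script_as_main` and the path `/app/scripts/my_app.py` are made up for illustration; this is not ScriptRunner's actual code.

```python
import sys
import types


def run_script_as_main(script_source, script_path):
    """Execute script_source as if it were the __main__ module at script_path."""
    module = types.ModuleType("__main__")
    module.__dict__["__file__"] = script_path

    previous_main = sys.modules["__main__"]
    sys.modules["__main__"] = module
    try:
        exec(compile(script_source, script_path, "exec"), module.__dict__)
    finally:
        # Restore the real __main__ so the caller is unaffected.
        sys.modules["__main__"] = previous_main
    return module


# A tiny "user script" that derives the main script directory the same
# way the diff above does.
script = (
    "import os\n"
    "import __main__\n"
    "main_dir = os.path.dirname(__main__.__file__)\n"
)
module = run_script_as_main(script, "/app/scripts/my_app.py")
```

By contrast, a bare `exec(script, {})` as in the toy repro leaves `sys.modules["__main__"]` untouched, which would be consistent with `main_test.py` being reported there.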

lib/streamlit/util.py

* develop:
  Fullscreen button to expand tables and charts (streamlit#137)
  Blacklist for hashing (streamlit#254)
  Adding utils and tests for hello.py (streamlit#261)
  Fix port-finding bugs (streamlit#280)
  Fix hello.py usage of os.path.join on a URL (streamlit#285)

monchier requested review from domoritz, tconkling and tvst October 3, 2019 18:14

monchier added the WIP label Oct 3, 2019

monchier changed the title ~~242~~ Blacklist for hashing Oct 3, 2019

monchier commented Oct 3, 2019

monchier removed the WIP label Oct 3, 2019

domoritz removed their request for review October 3, 2019 19:24

tvst suggested changes Oct 4, 2019

monchier added the WIP label Oct 4, 2019

tvst mentioned this pull request Oct 5, 2019

streamit hello Mapping and DataFrame demos crash in Safari on macOS 10.14 #274

Closed

monchier added 10 commits October 7, 2019 10:57

Adding blacklist to hashing.py

bb26f5e

linter

e0d04a8

test

2eb2bbd

fixing tests

fc397be

removing print

e9c2d80

renaming to FolderBlackList

8a97e54

more renaming

253bc2e

wip

2a50ffa

hashing should work for scripts contained in the directory of the main script

0e0eefb

linter

60fa652

monchier force-pushed the 242 branch from 9f11161 to 60fa652 Compare October 7, 2019 17:58

monchier added 2 commits October 7, 2019 13:58

unit test

20d2b0e

linter

48cc83f

monchier commented Oct 7, 2019

monchier added 4 commits October 7, 2019 14:51

comments

766b7c1

fixing tests

a0a34a2

linter

9a93a77

minor clean up

5e42ea4

monchier removed the WIP label Oct 8, 2019

tvst suggested changes Oct 8, 2019

monchier added 3 commits October 8, 2019 11:54

renaming and fixing comment

a061047

added a test for __main__.__file__

2e8293d

linter

3c27095

tvst approved these changes Oct 8, 2019

monchier added 3 commits October 8, 2019 12:42

Merge remote-tracking branch 'upstream/develop' into 242

0f79d92

Addec comment

a490789

linter

e4a43f3

monchier merged commit 71f1889 into streamlit:develop Oct 8, 2019

monchier deleted the 242 branch October 8, 2019 20:14
