Skip to content

ForwardMsg de-duping#28

Merged
tconkling merged 58 commits intostreamlit:feature/hashingfrom
tconkling:tim/ForwardMsgDeDuping
Aug 28, 2019
Merged

ForwardMsg de-duping#28
tconkling merged 58 commits intostreamlit:feature/hashingfrom
tconkling:tim/ForwardMsgDeDuping

Conversation

@tconkling
Copy link
Copy Markdown
Contributor

@tconkling tconkling commented Aug 27, 2019

This is the first part of ForwardMsg caching:

Server

  • ForwardMsg.hash is calculated on every outgoing ForwardMsg
  • The server caches all messages whose size is >= 10k in its MessageCache instance. (These cached message are never evicted - this will come in a future PR.) Each entry in the cache contains the cached message, and also a (weak-referenced) set of all ReportSessions for which that message has already been delivered.
  • Cached ForwardMsgs have the metadata.cacheable flag set. When the client receives a ForwardMsg with cacheable == true, it also caches it.
  • If the server believes that a client has cached a given ForwardMsg (because the cache has an entry for that message+session), it instead sends a "reference" message that points to the original message via ForwardMsg.ref_hash.
  • The server also exposes a new HTTP endpoint, /message, from which clients can request messages that they have received a ref_hash message for, but are missing in that client's local cache. (This should not happen frequently! If clients are getting a bunch of local cache misses, then the server is not maintaining its cache properly.)

Client

  • WebsocketConnection has a new ForwardMsgCache member
  • All incoming ForwardMsgs are passed through this cache (via ForwardMsgCache.processMessage):
    • If the ForwardMsg is not a reference message, it is cached locally and nothing more is done
    • If the ForwardMsg is a reference message, the client checks to see if it has the original message in its cache; if so, it gets returned
    • If the reference message is missing from the client's cache, it is requested from the server's /message endpoint, and then cached.

TODO:

  • The client and server both need to perform cache eviction; currently, their caches never have anything removed from them.
  • I'd like to have some better client tests that use a mock server to test its handling of the /message endpoint

It doesn't make sense to try construct a new Server instance here,
especially when we're not passing any of its required args.
* tim/ServerRoutingRefactor:
  imports
  DebugHandlerTest stub
  MetricsHandlerTest
  HealthHandler & test
  Remove server param from MetricsHandler
  streamlit.server package, containing Server and routes
  Server.get_current: raise if the server doesn't exist
* tim/ServerRoutingRefactor:
  server_util functions
* develop:
  Server route tests (streamlit#1220)
  Improving image processing times (#1247)
  Support hashing `functools.partial` (streamlit#1242)
  Graphviz chart cypress tests (streamlit#1215)
  Fix bug where sliders were always returning a tuple (streamlit#1243)
  st.image() format parameter (streamlit#1239)
  Cypress tests for empty charts (streamlit#1232)
  Ignore VSCode configs (streamlit#1241)
  Add font smoothing antialiased to html body in cypress tests (streamlit#1233)
  Write instance of list as json (#1228)
  streamlit#1159 add parameter to specify "BGR" color channel in st.image (streamlit#1222)
  streamlit#1178 Tweaking st.slider errors (streamlit#1208)
  Send the user's Python version to Segment (streamlit#1212)
  Clean up our protos (streamlit#1211)
  Fix a single word in our docs and lint some files ;) (streamlit#1229)
  Fix DPR issues with Cypress snapshots (streamlit#1213)
This means that tests.streamlit is a proper package (so modules can
import from each other). It also means that several tests that
check qualified module names needed to be updated to reflect their
new paths.
* tim/TestableServer:
  Argh. Fix Python 2 testing
  Add missing __init__.py to tests.streamlit
  ServerTestCase base class
  Server_test.test_websocket_connect
  Server.start function
* develop:
  Testable server (streamlit#1257)
  DataFrames bigger than 2 dimensions will be shown as text (streamlit#1255)
  Circleci Insights (streamlit#1254)
  Setting listeners on XMLHttpRequest upfront (streamlit#1251)
  Matplotlib fix (streamlit#1253)
  Warn about mutations of arguments of cached functions (streamlit#1249)
  Hash numpy arrays and sample large dataframes (streamlit#1252)
  Improve function context, ignore streamlit functions (streamlit#1240)
  Cypress tests for table styling
  Setup cypress dashboard
  Showing namedtuples as JSON (streamlit#1248)
  Make the "Is Streamlit still running?" dialog's message nicer (streamlit#1231)
  Move caching test, fix function names (streamlit#1250)
* develop:
  Fix streamlit#1230: OSError not caught when writing cache to disk (streamlit#8)
  Improve caching code (streamlit#2)
  Remove incorrect background of code inside error messages (streamlit#4)
  Fix line number in error message (streamlit#3)
  Update README.md
  Implement Caching object to support running code blocks only once (streamlit#1)
  Update README.md
  Add font licenses to NOTICE (streamlit#1267)
  Clean up repo for open source release. Remove unecessary folders. (streamlit#1266)
  Fix st.cache (streamlit#1273)
  Close streamlit#1092 (streamlit#1270)
  Change license header in variables.scss so it's compatible with scss-to-json. (streamlit#1268)
  Sidebar (streamlit#1224)
  Add Apache license and update all license headers (streamlit#1265)
  Handling objects with empty docs in st.help (streamlit#1261)
  Enhance Mochawesome report with screenshots (streamlit#1263)
  Add LICENSES file (streamlit#1262)
@tconkling tconkling requested a review from tvst August 27, 2019 21:34
// Dispatch any pending messages in the queue. This may *not* result
// in our just-decoded message being dispatched: if there are other
// messages that were received earlier than this one but are being
// downloaded, our message won't be sent until they're done.
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This chunk of code below isn't any different from before; I just added the explanatory comment because I had a hard time figuring out what it did.

Copy link
Copy Markdown
Contributor

@tvst tvst left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did a partial review. Only looked at frontend code so far.

* - If the referenced message isn't in our cache, request it from the
* server, cache it, and return it.
*/
public async processMessage(msg: ForwardMsg): Promise<ForwardMsg> {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about naming this something like getFullMessage, maybeFillPayload, or fillPayloadIfNeeded?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you cool with processMessagePayload? The method is doing more than simply filling the message payload: it is also caching that payload, when appropriate; the other suggestions imply to me that the function doesn't mutate the cache's own internal state, but it does.

/**
* Creates a ws:// or wss:// URI for the given path.
*/
export function buildWsUri({host, port}: BaseUriParts, path: string): string {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We only have 1 websocket URI, so no need to path. Should always be 'stream'

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not a huge deal, but: why would we hardcode this path into a generic utility function? It seems like the hardcoded path, which is "business logic", should remain in WebsocketConnection - possibly being promoted to a const at the top of that file. This would also mean that buildWsUri and buildHttpUri continue to have consistent signatures.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fine :)

I tend to prefer to make functions only as generic as they have to be. But in this case it's not a big deal.

possibly being promoted to a const at the top of that file.

Can you do this, at least?

// TODO: do something reasonable here, beyond simply logging.
// Lots of stuff is likely to be broken!
logError(LOG, reason)
})
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could call this.stepFsm('CONNECTION_ERROR')

For extra credit, you can pipe the reason all the way to ConnectionManager.setConnectionState (via WebsocketConnection.args.onConnectionStateChange)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is what I tried first (sending the CONNECTION_ERROR event), but WebsocketConnection doesn't handle that properly (it fails with an assertion shortly thereafter, when it starts trying to ping the server).

I can fix this, though since we don't have tests set up for any of this websocket stuff, it's dicey; there's at least several bits of state are not being properly cleared during certain state transitions. (The bit I was running into here is the class expecting its this.websocket to be null/undefined when entering the PINGING_SERVER state, which the CONNECTION_ERROR event handler does not do.)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ahhhh... I see.

Ok, leave it as is for now 👍

/**
* Creates a ws:// or wss:// URI for the given path.
*/
export function buildWsUri({host, port}: BaseUriParts, path: string): string {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fine :)

I tend to prefer to make functions only as generic as they have to be. But in this case it's not a big deal.

possibly being promoted to a const at the top of that file.

Can you do this, at least?

// TODO: do something reasonable here, beyond simply logging.
// Lots of stuff is likely to be broken!
logError(LOG, reason)
})
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ahhhh... I see.

Ok, leave it as is for now 👍

if (this.websocket) {
this.websocket.close()
this.websocket = null
this.websocket = undefined
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why switch this?

I tend to use null for anything that is set manually. This way we can easily tell from the type whether we manually caused something to have no value (null) or if it was initialized that way (undefined).

In normal JS this helps uncover some types of bugs, although in TS it may not make a difference.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I changed the field's type from Websocket? | null to Websocket?, so null is no longer a value that can be assigned to the field.

I can see the undefined-vs-null distinction for JS, but if a field is declared as an optional in Typescript, it's not idiomatic to also assign null to it.

"""
self._cache = cache

def set_default_headers(self):
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not subclass _SpecialRequestHandler and remove these methods?

(In which case you should rename _SpecialRequestHandler to something like _HttpRequestHandler)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_SpecialRequestHandler sets the no-cache header, which I think we explicitly don't want for these cacheable messages

@tconkling tconkling merged commit 10061f7 into streamlit:feature/hashing Aug 28, 2019
@tconkling tconkling deleted the tim/ForwardMsgDeDuping branch August 28, 2019 20:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants