Skip to content

Cleanup around script_caller, fix tracking of scripts and ACL logging for RM_Call#11770

Merged
oranagra merged 7 commits intoredis:unstablefrom
oranagra:script_caller_cleanup
Feb 16, 2023
Merged

Cleanup around script_caller, fix tracking of scripts and ACL logging for RM_Call#11770
oranagra merged 7 commits intoredis:unstablefrom
oranagra:script_caller_cleanup

Conversation

@oranagra
Copy link
Member

@oranagra oranagra commented Jan 31, 2023

  • Make it clear that current_client is the root client that was called by external connection
  • add executing_client which is the client that runs the current command (can be a module or a script)
  • Remove script_caller that was used for commands that have CLIENT_SCRIPT to get the client that called the script. in most cases, that's the current_client, and in others (when being called from a module), it could be an intermediate client when we actually want the original one used by the external connection.

bugfixes:

  • RM_Call with C flag should log ACL errors with the requested user rather than the one used by the original client, this also solves a crash when RM_Call is used with C flag from a detached thread safe context.
  • addACLLogEntry would have logged info about the script_caller, but in case the script was issued by a module command we actually want the current_client. the exception is when RM_Call is called from a timer event, in which case we don't have a current_client.

behavior changes:

  • client side tracking for scripts now tracks the keys that are read by the script instead of the keys that are declared by the caller for EVAL

other changes:

  • Log both current_client and executing_client in the crash log.
  • remove prepareLuaClient and resetLuaClient, being dead code that was forgotten.
  • remove scriptTimeSnapshot and snapshot_time and instead add cmd_time_snapshot that serves all commands and is reset only when execution nesting starts.
  • remove code to propagate CLIENT_FORCE_REPL from the executed command to the script caller since scripts aren't propagated anyway these days and anyway this flag wouldn't have had an effect since CLIENT_PREVENT_PROP is added by scriptResetRun.
  • fix a module GIL violation issue in afterSleep that was introduced in Freeze time sampling during command execution, and scripts #10300 (unreleased)

… for RM_Call

* Make it clear that current_client is the root client that was called
  by external connection
* add executing_client which is the client that runs the current command
  (can be a module or a script)
* Remove script_caller that was used for commands that have
  CLIENT_SCRIPT to get the client that called the script. in most cases,
  that's the current_client, and in others (when being called from a
  module), it could be an intermidiate client when we actually want the
  original one used by the external connectoin.

bugfixes:
* RM_Call with C flag should log ACL errors with the requested user rather
  than the one used by the original client, this also solves a crash
  when RM_Call is used with C flag from a detached thread safe context.
* addACLLogEntry would have logged info about the script_caller, but in
  case the script was issued by a module command we actually want the
  current_client. the exception is when RM_Call is called from a timer
  event, in which case we don't have a current_client.

behavior changes:
* client side tracking for scripts now tracks the keys that are read by
  the script instead of the keys that are declared by the caller for
  EVAL

other changes:
* Log both current_client and executing_client in the crash log.
* remove prepareLuaClient and resetLuaClient, being dead code that was
  forgotten.
* remove scriptTimeSnapshot and snapshot_time which apparently aren't
  needed anymore. commandTimeSnapshot returns the cached server.mstime
  for all commands nowadays, and it isn't modified even when execuring
  a command during busy scripts since the execution_nesting protectes
  it.
* remove code to propagate CLIENT_FORCE_REPL from the executed command
  to the script caller since scripts aren't propagated anyway these days
  and anyway this flag wouldn't have had an effect since CLIENT_PREVENT_PROP
  is added by scriptResetRun.
Comment on lines +65 to +73
test {test module check regular redis command with user and acl from blocked background thread} {
assert_equal [r set x 5] OK

r ACL LOG RESET
assert_equal [r usercall.reset_user] OK
assert_equal [r usercall.add_to_acl "~* &* +@all -set"] OK

# fails here as testing acl in rm call from a background thread
assert_error {*NOPERM User module_user has no permissions*} {r usercall.call_with_user_bg C set x 10}
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ACL logging code used to crash here before the fix since REDISMODULE_ARGV_RUN_AS_USER attempted to use the user of the context client instead of the user requested by the context.


# verify that new log entry added
set entry [lindex [r ACL LOG] 0]
assert_equal [dict get $entry username] {module_user}
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

before the fix, this used to return "default" user instead of "module_user"

assert_equal [dict get $entry context] {lua}
assert_equal [dict get $entry object] {set}
assert_equal [dict get $entry reason] {command}
assert_match {*cmd=usercall.call_with_user_flag*} [dict get $entry client-info]
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

before the fix it used to return "cmd=evalsha" instead of "cmd=usercall.call_with_user_flag"

@oranagra
Copy link
Member Author

@MeirShpilraien @sjpotter FYI

Copy link
Contributor

@madolson madolson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm fairly confident the original decisions around CSC was just a simple mistake that was overlooked. There is one thought I have, which is that it's possible that EVAL was never supposed to support tracking, and it was just an unintended side effect that sub-commands were triggering the tracking code. Many use cases don't really make sense in the tracking world. For example, if you have a script that does a read/modify/write on a single key, maybe implementing a prepend command, you'll start tracking the key then get an invalidation immediately. Ideally you would just get an invalidation if you had previously looked at the key, since the script is logically doing a write.

I don't feel that strongly either way, I don't think this is really a common use case.

@oranagra
Copy link
Member Author

oranagra commented Feb 4, 2023

@madolson you may be right about scripts that do read for the purpose of write, but on the other hand, maybe the caller of that script will want to re-issue it when the source changes?
and anyway, we should probably keep triggering CSC tracking on read-only scripts (i.e. simple ones are like MULTI-EXEC possibly doing some some filtering on the output).

what worries me regarding the current change is that there were tests that explicitly check this odd behavior (while it was really simple to fix, i.e. we did already have pointers to both clients when doing this thing).. so i'm paranoid about not being aware of a good reason why it should be the way it was.

Copy link
Contributor

@madolson madolson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This makes sense to me. I'm not sure if this is the most useful behavior, but it intuitively makes more sense than what we have.

@oranagra oranagra added state:major-decision Requires core team consensus release-notes indication that this issue needs to be mentioned in the release notes breaking-change This change can potentially break existing application labels Feb 7, 2023
@oranagra
Copy link
Member Author

oranagra commented Feb 8, 2023

The PR and breaking change for CSC in scripts was approved in a core-team meeting.
waiting for another reviewer to try to catch bugs and then it can be merged.


/* For the few commands that are allowed during busy scripts, we rather
* provide a fresher time than the one from when the script started (they
* still won't get it from the call due to execution_nesting. For commands
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

WDYM by "they still won't get it from call"? won't get what? fresher time?
maybe "(execution_nesting is >=1 for all commands that run during a busy script, so cmd_time_snapshot won't be updated inside call until the script is done)"

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, they won't get a fresher time.
i think your explanation is longer and more technical will be harder to read.
we just need to hint that it's still an inaccurate compromise, and people can read the code to understand it better.
i can replace "it" with "fresher time" if that helps.

let's first discuss the solution and later the comment..

@oranagra oranagra requested a review from soloestoy February 15, 2023 12:18
@oranagra oranagra merged commit 233abbb into redis:unstable Feb 16, 2023
@oranagra oranagra deleted the script_caller_cleanup branch February 16, 2023 06:07
enjoy-binbin pushed a commit to enjoy-binbin/redis that referenced this pull request Jul 31, 2023
… for RM_Call (redis#11770)

* Make it clear that current_client is the root client that was called by
  external connection
* add executing_client which is the client that runs the current command
  (can be a module or a script)
* Remove script_caller that was used for commands that have CLIENT_SCRIPT
  to get the client that called the script. in most cases, that's the current_client,
  and in others (when being called from a module), it could be an intermediate
  client when we actually want the original one used by the external connection.

bugfixes:
* RM_Call with C flag should log ACL errors with the requested user rather than
  the one used by the original client, this also solves a crash when RM_Call is used
  with C flag from a detached thread safe context.
* addACLLogEntry would have logged info about the script_caller, but in case the
  script was issued by a module command we actually want the current_client. the
  exception is when RM_Call is called from a timer event, in which case we don't
  have a current_client.

behavior changes:
* client side tracking for scripts now tracks the keys that are read by the script
  instead of the keys that are declared by the caller for EVAL

other changes:
* Log both current_client and executing_client in the crash log.
* remove prepareLuaClient and resetLuaClient, being dead code that was forgotten.
* remove scriptTimeSnapshot and snapshot_time and instead add cmd_time_snapshot
  that serves all commands and is reset only when execution nesting starts.
* remove code to propagate CLIENT_FORCE_REPL from the executed command
  to the script caller since scripts aren't propagated anyway these days and anyway
  this flag wouldn't have had an effect since CLIENT_PREVENT_PROP is added by scriptResetRun.
* fix a module GIL violation issue in afterSleep that was introduced in redis#10300 (unreleased)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

breaking-change This change can potentially break existing application release-notes indication that this issue needs to be mentioned in the release notes state:major-decision Requires core team consensus

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

6 participants