-
Notifications
You must be signed in to change notification settings - Fork 154
Allow -bulk to work with stdin #130
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Allow -bulk to work with stdin #130
Conversation
v1.0.1 Release
chore: Version updates
Mzack9999
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The only potential problem I can see (and why the bulk was not allowed with stdin) is massive stdin input that, when used with -bulk will read all the content in memory. Probably making it clear in the docs should suffice.
|
@Mzack9999 That's fair. I'm seeing what can be done to use |
Reduces memory usage when input is big Also includes the fix for projectdiscovery#134 (makes testing easier)
|
Note that I've incorporated #135 into this PR to make testing easier. It's easily enough backed out if that PR isn't approved. I'm also abusing #136 which is a quirk I'm now warming to for testing purposes. I've implemented chunked reading for Large inputs no longer cause excessive memory usage in my limited testing (it's hard to trick notify into processing lots of data quickly lol) As of the previous commit, notifying an invalid webhook of 500MB of data took 4GB of memory (!). I'm using an invalid webhook for speed - notify will still process the data but will quickly fail to do anything useful with it. The new approach uses 22MB of memory: Notifying seems to still be correctly implemented for normal inputs: Where testing gets trickier is with the chunking.
% cat generate.py
#!/usr/bin/env python3
import argparse
from string import ascii_lowercase
from itertools import cycle, islice
parser = argparse.ArgumentParser()
parser.add_argument("-c", "--chunk", type=int,
help="Chunk each line at a certain interval")
parser.add_argument("min", type=int)
parser.add_argument("max", type=int)
parser.add_argument("step", type=int, default=1, nargs="?")
args = parser.parse_args()
for i in range(args.min, args.max, args.step):
line = "".join(islice(cycle(ascii_lowercase), i))
if args.chunk is not None:
line = "\n".join(line[i:i+args.chunk] for i in range(0, len(line), args.chunk))
print(line)The new chunking approach seems to faithfully notify data of this nature: |
|
Given this approach, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- Slack rate limit
- index out of range crash
[ERR] failed to send slack notification for id: slack : failed to send slack notification: webhook response: too_many_attachments
panic: runtime error: index out of range [3990] with length 3990
goroutine 1 [running]:
github.com/containrrr/shoutrrr/pkg/util.PartitionMessage({0x14000497000, 0xf96}, {0x7d0, 0x1770, 0xa}, 0x64)
/Users/geekboy/go/pkg/mod/github.com/containrrr/[email protected]/pkg/util/partition_message.go:33 +0x2ec
github.com/containrrr/shoutrrr/pkg/services/discord.CreateItemsFromPlain({0x14000497000, 0xf96}, 0x0)
/Users/geekboy/go/pkg/mod/github.com/containrrr/[email protected]/pkg/services/discord/discord.go:84 +0xc0
github.com/containrrr/shoutrrr/pkg/services/discord.(*Service).Send(0x1400013b440, {0x14000497000, 0xf96}, 0x14000138308)
/Users/geekboy/go/pkg/mod/github.com/containrrr/[email protected]/pkg/services/discord/discord.go:42 +0x164
github.com/containrrr/shoutrrr.Send({0x1400029e3f0, 0x6f}, {0x14000497000, 0xf96})
/Users/geekboy/go/pkg/mod/github.com/containrrr/[email protected]/shoutrrr.go:23 +0xcc
github.com/projectdiscovery/notify/pkg/providers/discord.(*Provider).Send(0x14000128a98, {0x14000505000, 0xf96}, {0x140002a9bb0, 0x8})
/Users/geekboy/Github/notify/pkg/providers/discord/discord.go:54 +0x32c
github.com/projectdiscovery/notify/pkg/providers.(*Client).Send(0x140002c3380, {0x1400037f000, 0xf96})
/Users/geekboy/Github/notify/pkg/providers/providers.go:113 +0xd4
github.com/projectdiscovery/notify/internal/runner.(*Runner).sendMessage(0x140002212d0, {0x1400037f000, 0xf96})
/Users/geekboy/Github/notify/internal/runner/runner.go:169 +0xa4
github.com/projectdiscovery/notify/internal/runner.(*Runner).Run(0x140002212d0)
/Users/geekboy/Github/notify/internal/runner/runner.go:108 +0x614
main.main()
/Users/geekboy/Github/notify/cmd/notify/notify.go:42 +0x1a8|
@ehsandeep I can more of less repro both of those problems on master. I think they're bugs in shoutrrr. I'll consider what we can do in terms of being less aggressive with our |
|
Repro for the attachments error on this PR: Note that the error triggers on >100 lines in a single message. It's not affected by the length of the lines. Reproing a similar-enough issue on master: Reproing the index out of range crash on master (note that the stack trace shows it being a problem in shoutrrr) |
|
Slack issue filed at containrrr/shoutrrr#238 Discord issue filed at containrrr/shoutrrr#240 but note that I could only repro the Discord issue with messages of 2000, 4000 or 6000 chars. The panic seen by @ehsandeep seems to be on a message of 3990 chars, which I can't repro. It might or might not be related to the issue I filed at shoutrrr. TODO: Do what we can to avoid the shoutrrr crashes until such time as they're fixed |
Simplified looping. Also, we no longer truncate mid-line unless it's the first line of a chunk (in such a case we have no option but to truncate). A truncated line will be followed by an ellipsis. Otherwise we're splitting chunks at the last possible newline without exceeding charLimit for that chunk.
|
New strategy for the splitter. We no longer truncate mid-line unless it's the first line of a chunk (in such a case we have no option but to truncate). A truncated line will be followed by an ellipsis. Otherwise we're splitting chunks at the last possible newline without exceeding charLimit for that chunk. Demo: |
|
The previous tests I was doing also still pass, except for the #!/bin/bash
cd $(dirname "$(readlink -f "$0")")
heading() {
echo "[+] $1"
}
provider_invalid_webhook=provider-invalid-webhook.yaml
provider_discord=provider-discord.yaml
provider_slack=provider-slack.yaml
heading /etc/passwd
cmp <(./notify -provider-config $provider_invalid_webhook -silent -bulk -i /etc/passwd) \
<(cat /etc/passwd) && echo matches
heading /dev/null
cmp <(./notify -provider-config $provider_invalid_webhook -silent -bulk -i /dev/null) \
<(cat /dev/null) && echo matches
heading AAAA
cmp <(python -c'print("A"*2048)' | ./notify -provider-config $provider_invalid_webhook -silent -bulk) \
<(python -c'print("A"*2048)') && echo matches
#heading generate.py
#cmp <(./generate.py 0 8000 1 | ./notify -provider-config $provider_invalid_webhook -silent -bulk -char-limit 4000) \
# <(./generate.py 0 8000 1 -c4000) && echo matches
heading 'slack attachment limit (trying 100 lines)'
yes | head -n100 | ./notify -provider-config $provider_slack -bulk 2>&1 >/dev/null| grep too_many_attachments
heading 'slack attachment limit (trying 101 lines)'
yes | head -n101 | ./notify -provider-config $provider_slack -bulk 2>&1 >/dev/null | grep too_many_attachments
heading 'discord characters being 2000, 4000 or 6000 (trying 1999 lines)'
python3 -c'print("A"*1999)' | ./notify -provider-config $provider_discord -bulk 2>&1 >/dev/null | grep panic
heading 'discord characters being 2000, 4000 or 6000 (trying 2000 lines)'
python3 -c'print("A"*2000)' | ./notify -provider-config $provider_discord -bulk 2>&1 >/dev/null | grep panic
heading 'discord characters being 2000, 4000 or 6000 (trying 2001 lines)'
python3 -c'print("A"*2001)' | ./notify -provider-config $provider_discord -bulk 2>&1 >/dev/null | grep panicThe bugs in shoutrrr are still present. Note that shoutrrr has merged fixes for them already, but has not cut a release. We might be able to simply hold off for that release, or bump |
Inclues fixes for: * containrrr/shoutrrr#238 * Sending more than ~99 lines to Slack fails with too_many_attachments * containrrr/shoutrrr#240 * Sending 2000, 4000 or 6000 characters to Discord panics * containrrr/shoutrrr#244 * Sending 1999, 3999 or 5999 characters to Discord panics * (Incomplete fix for the above)
|
Bumped the shoutrrr module to an unreleased commit which includes fixes for the panics I could repro. Note that I could not repro the crash on 3990 characters. Hopefully shoutrrr's fixes also address whatever the issue was. This is now ready for review.
#!/bin/bash
cd $(dirname "$(readlink -f "$0")")
heading() {
echo "[+] $1"
}
provider_invalid_webhook=provider-invalid-webhook.yaml
provider_discord=provider-discord.yaml
provider_slack=provider-slack.yaml
heading /etc/passwd
cmp <(./notify -provider-config $provider_invalid_webhook -silent -bulk -i /etc/passwd) \
<(cat /etc/passwd) && echo matches
heading /dev/null
cmp <(./notify -provider-config $provider_invalid_webhook -silent -bulk -i /dev/null) \
<(cat /dev/null) && echo matches
heading AAAA
cmp <(python -c'print("A"*2048)' | ./notify -provider-config $provider_invalid_webhook -silent -bulk) \
<(python -c'print("A"*2048)') && echo matches
#heading generate.py
#cmp <(./generate.py 0 8000 1 | ./notify -provider-config $provider_invalid_webhook -silent -bulk -char-limit 4000) \
# <(./generate.py 0 8000 1 -c4000) && echo matches
heading 'slack attachment limit (trying 100 lines)'
yes | head -n100 | ./notify -provider-config $provider_slack -bulk 2>&1 >/dev/null| grep too_many_attachments
heading 'slack attachment limit (trying 101 lines)'
yes | head -n101 | ./notify -provider-config $provider_slack -bulk 2>&1 >/dev/null | grep too_many_attachments
heading 'discord characters being 2000, 4000 or 6000 (trying 1999 lines)'
python3 -c'print("A"*1999)' | ./notify -provider-config $provider_discord -bulk 2>&1 >/dev/null | grep panic
heading 'discord characters being 2000, 4000 or 6000 (trying 2000 lines)'
python3 -c'print("A"*2000)' | ./notify -provider-config $provider_discord -bulk 2>&1 >/dev/null | grep panic
heading 'discord characters being 2000, 4000 or 6000 (trying 2001 lines)'
python3 -c'print("A"*2001)' | ./notify -provider-config $provider_discord -bulk 2>&1 >/dev/null | grep panic |
|
Kudos, SonarCloud Quality Gate passed!
|
Fixes projectdiscovery#134 (this was previously fixed but got backed out in landing projectdiscovery#130)
* Revert "Fix notify silently fails (#146)" This reverts commit 173f914. * Emit logged lines to stdout, not stderr Fixes #134 (this was previously fixed but got backed out in landing #130) * Split lines as per CharLimit in non-bulk mode Resolves #137 * Bubble up any Scanner errors This surfaces the error that's causing #138: 'bufio.Scanner: token too long' * Fix infinite loop with -char-limit <= len('...') We're using '...' to indicate that a line has been truncated. If -char-limit was less than the length of this ellipsis string, the scanner would never terminate. Raise an error if the charLimit given to a splitter is <= len('...') * Ensure we never allow the Scanner buffer to fill We know we never need more than CharLimit chars from the Scanner in one go First of all, we ensure that the Scanner has a buffer which can hold at least CharLimit chars. Then we handle cases where the Scanner wants more data but we don't need it to get more data. Thus it should never end up in a place where its internal buffer is filled, and it should never return bufio.ErrToLong Fixes #138 * fmt.print => gologger Co-authored-by: mzack <[email protected]>










I'm not sure why we're reading the data file as we are, preventing the use of
-bulkfor stdin. This diff works in my limited testing for regular files in non-bulk and bulk mode, as well as stdin in non-bulk and bulk mode.Being able to send stdin as bulk improves the speed of sending notifications and can result in less notification clutter.
Using
-bulkwith stdin means that, unless we do chunked reads, notify needs to wait for EOF. But if a user is using-bulkwith stdin then hopefully they expect this behaviour.Demo: