Toml++ unicode checking invokes undefined behaviour #144

@kchalmer

Environment

toml++ version and/or commit hash: commit 36030ca

Compiler: gcc version 11.1.0 (GCC)

C++ standard mode: gnu17 (gcc 11's default)

Target arch: x86_64

Library configuration overrides: none

Relevant compilation flags: -fsanitize=undefined

Describe the bug

Parsing with toml++ triggers undefined behaviour in the unicode checking routines for certain ASCII characters (see error message in the "Steps to reproduce" section).

Steps to reproduce (or a small repro code sample)

$ cat tomlplusplus_ub_example.cpp

#include <toml++/toml.h>

int main()
{
    auto table = toml::parse("m=1");
}

$ g++ -fsanitize=undefined -I../tomlplusplus/include tomlplusplus_ub_example.cpp -o tomlplusplus_ub_example
$ ./tomlplusplus_ub_example
../tomlplusplus/include/toml++/impl/unicode.h:139:13: runtime error: shift exponent 64 is too large for 64-bit type 'long long unsigned int'

Additional information

This occurs only for characters with an ASCII value of 109 ('m') or larger. "l=1" (lowercase L, ASCII 108) parses without error, but "m=1", "n=1", etc. trigger the UB error.
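For illustration, here is a minimal sketch of the class of bug the sanitizer is reporting and one way to guard against it. It is not the code at unicode.h:139, which this report does not quote; `bit_is_set`, `mask`, and `base` are hypothetical names. The sketch assumes a lookup that tests bit `c - base` of a 64-bit mask, where shifting by 64 or more is undefined behaviour.

```cpp
#include <cstdint>

// Hypothetical helper (not part of toml++): reports whether bit (c - base)
// is set in a 64-bit mask, without ever shifting by 64 or more.
constexpr bool bit_is_set(std::uint64_t mask, std::uint32_t c, std::uint32_t base) noexcept
{
    // Reject values outside [base, base + 63] before shifting.
    // Shifting a 64-bit operand by >= 64 is undefined behaviour,
    // which is what -fsanitize=undefined flags at runtime.
    if (c < base || c - base >= 64u)
        return false;

    // The shift amount is now guaranteed to be in [0, 63].
    return ((mask >> (c - base)) & 1u) != 0;
}
```

With an arbitrary `base` of 45, chosen here only so that 108 maps to bit 63 and 109 would map to bit 64 (the same boundary as in the observation above), `bit_is_set(mask, 109, 45)` returns `false` instead of performing an out-of-range shift.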

Labels: bug (Something isn't working)
