Adding strlcpy() to glibc
The problem with functions like strcpy(), of course, is that they perform no length checking on the strings passed to them. The result has been a long history of buffer-overrun issues and security problems. In response, functions like strncpy() were created, but they have problems of their own; in particular, strncpy() can create strings without the terminating null byte, encouraging other types of buffer overruns. So strlcpy() was created to ensure that all strings would be null-terminated. Problem solved, or so one might think.
Back in 2000, one Christoph Hellwig posted a patch adding strlcpy() and strlcat() to glibc. The glibc maintainer at that time, Ulrich Drepper, rejected the patch in classic style:
Beside, those who are using strcat or variants deserved to be punished.
Christoph gave up after a token protest, but other users did not. Over the years, there have been many requests to add these functions to glibc, but the project's position has never changed. Fourteen years after Christoph's patch was posted, there is still no strlcpy() in glibc.
The primary complaint about strlcpy() is that it gives no indication of whether the copied string was truncated or not. So careless code using strlcpy() risks replacing a buffer-overrun error with a different problem resulting from corrupted strings. In the minds of many developers, strlcpy() just replaces one issue with another and, thus, does not solve the problem of safe string handling in C programs. They believe it should not be used, and, since it should not be used, it also should not be provided by the library.
The argument on the other side is simple enough: like it or not, plenty of programs use strlcpy() for string manipulation. If the system library does not provide an implementation, they will provide their own, and that implementation, beyond being duplicated code, is likely to be slower and buggier than a standard library implementation. Failure to support strlcpy() does not cause those programs to be changed; instead, it just makes them use inferior alternatives.
Florian Weimer pointed out that some 60 packages in Fedora use strlcpy(); those packages, he said, will not go away. Among other problems, the implementations of strlcpy() found in those packages do not take advantage of the FORTIFY_SOURCE option provided by GCC. With fortification turned on, a number of buffer overruns can be detected at run time, causing the program to crash but avoiding a potential security hole. The recent, remotely exploitable Samba vulnerability, Florian said, was caused by an erroneous use of strlcpy() that would have been caught if fortification were in use.
This argument has gone back and forth many times over the years (and was covered here in 2012). One might think it would go on forever, except for one thing: the management of the glibc project changed significantly in early 2012. Under the new regime, the project has been more open to the addition of new features. It took a couple years for this particular subject to come back, but, when David A. Wheeler recently asked if a strlcpy() implementation might now be accepted, glibc developer Joseph Myers responded that "it would be reasonable to consider".
Florian wasted little time in putting together a patch adding the functions to glibc. The first version ran into some criticism (it didn't behave like the BSD version), but the second iteration has been better received. Which is not to say that there is a consensus that these functions should go into the library; some developers see it as encouraging their use. But the prevalent attitude would seem to be one of resigned acceptance; as David Miller put it:
The reasons are simple enough: replace a bunch of hand-rolled strlcpy() implementations with one high-quality library implementation. The good news is that, since most programs use autotools for configuration, those programs would switch over to the glibc implementation on systems where it is available with no intervention required.
So the strlcpy() issue may finally be put to rest. Of course, that does not solve the bigger problem: what the glibc developers would recommend for safe, fast, and simple string handling for C programs. The C language does not lend itself to providing all three of "safe," "simple," and "fast" in the same package. In many cases, the right answer is to use a different language anyway. But there will be a lot of C code out there for a long time, so there will be a lot of string-handling bugs as well, regardless of whether strlcpy() is used.
Index entries for this article | |
---|---|
Security | Glibc |
Security | Vulnerabilities/Buffer overflow |
Posted Sep 18, 2014 2:56 UTC (Thu)
by ncm (guest, #165)
So, I thought further and evolved a function that might actually improve matters. C isn't C++ (yet), so the constraints were tighter, but within the limits of what can be expressed, this seemed the most practical. I used a minor variant of it for several years at my last employer, and it served well with only one problem, fixed here.
bool strto(
    char* dst, size_t dstsize,
    char* src, size_t count,
    size_t* endp, size_t end);
The printf version is trivially different.
It returns false if count bytes (or strlen(src), whichever is less) and a NUL won't fit into dstsize - end, but copies as much as it can to dst + end, ALWAYS placing the NUL. (If it can't place a NUL, it aborts. This cannot happen in sensible code.) Before returning, it writes the position of the NUL into *endp.
In use, the first call usually has a literal 0 in the last place, and, in subsequent calls, the variable that endp points to (always named "e" in my code, as for strtoll). You can chain calls with && to know if everything fit, but if you know the final count isn't zero, the final result suffices. If you want to copy the whole of src, count gets -1.
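A minimal sketch of how a function with the behavior described above might be written. It is reconstructed from the prose only, not the commenter's code; marking src as const and treating count == (size_t)-1 as "the whole string" (via an unbounded length scan) are assumptions.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Sketch of strto(): copy up to count bytes of src into dst at offset end,
 * always NUL-terminating. Returns false if the requested bytes plus the NUL
 * did not fit; *endp receives the offset of the NUL in either case. */
bool strto(char *dst, size_t dstsize,
           const char *src, size_t count,
           size_t *endp, size_t end)
{
    if (end >= dstsize)
        abort();                     /* no room even for the NUL */

    /* Length of src, never looking past count bytes; a count of
     * (size_t)-1 effectively means "all of src". */
    size_t want = 0;
    while (want < count && src[want] != '\0')
        want++;

    size_t room = dstsize - end - 1; /* bytes available before the NUL */
    size_t n = want < room ? want : room;

    memcpy(dst + end, src, n);
    dst[end + n] = '\0';
    *endp = end + n;
    return n == want;
}
```

Chaining then looks like `strto(buf, sizeof buf, "Hello, ", -1, &e, 0) && strto(buf, sizeof buf, "world", -1, &e, e)`: the && guarantees the first call has updated e before the second call reads it.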
Posted Sep 18, 2014 6:15 UTC (Thu)
by ncm (guest, #165)
Posted Sep 18, 2014 15:03 UTC (Thu)
by intgr (subscriber, #39733)
Why do you need the separate "end" argument at all? Presumably you could use *endp for both input and output, the caller simply has to initialize it to 0 before first call.
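The in/out variant suggested here could look like the following sketch; append_to is a hypothetical name, and the body simply folds the separate end argument into *pos.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical variant: *pos is both input (where to start writing) and
 * output (where the NUL ended up). The caller sets *pos = 0 before the
 * first call. Returns false if src (up to count bytes) did not fit. */
bool append_to(char *dst, size_t dstsize, const char *src, size_t count,
               size_t *pos)
{
    if (*pos >= dstsize)
        abort();                     /* cannot even place the NUL */

    size_t want = 0;
    while (want < count && src[want] != '\0')
        want++;

    size_t room = dstsize - *pos - 1;
    size_t n = want < room ? want : room;

    memcpy(dst + *pos, src, n);
    *pos += n;
    dst[*pos] = '\0';
    return n == want;
}
```

One trade-off of this shape is that forgetting to reset *pos silently appends to whatever the previous string left behind, whereas an explicit end argument makes the starting offset visible at every call site.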
Posted Sep 19, 2014 5:07 UTC (Fri)
by ncm (guest, #165)
Another version swaps the return value with the argument:
e = strto(dst, dstsz, src, count, e, &result);
Tomayto, tomahto. Either one is miles better than strlcpy. But whoever started promoting strlcpy used up all the oxygen available for fixing C string handling until at least 2030. By which time we might all be writing Rust++ code and cursing the shortsightedness of the Rust crew.
The best reason I know for adopting this admittedly complex beast is that it does the work of a half-dozen other functions better than they do. Learn it once, it works for everything.
Posted Sep 24, 2014 21:11 UTC (Wed)
by Mo6eB (guest, #99013)
Because yeah, that function is hella useful :) .
Posted Sep 18, 2014 5:33 UTC (Thu)
by roskegg (subscriber, #105)
At the very least, how hard is it to have a standard like this:
struct string_t { char* data; size_t len; };
Then make a version of every string handling function, that uses string_t instead of char*. When a string knows its own length it is easy to reason about correct usage of the string.
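A minimal sketch of what a few such functions could look like with the struct exactly as proposed. The function names are hypothetical, results are heap-allocated, and size-overflow checks are omitted for brevity.

```c
#include <stdlib.h>
#include <string.h>

/* The structure proposed above: the string knows its own length. */
struct string_t { char *data; size_t len; };

/* Hypothetical constructor: copies a C string and keeps a trailing NUL so
 * data can still be passed to legacy char * APIs. */
struct string_t string_from_cstr(const char *cs)
{
    size_t len = strlen(cs);
    struct string_t s = { malloc(len + 1), 0 };
    if (s.data != NULL) {
        memcpy(s.data, cs, len + 1);   /* includes the NUL */
        s.len = len;
    }
    return s;
}

/* Concatenation cannot overrun: the result is sized from the known lengths,
 * with no strlen() rescans. */
struct string_t string_concat(struct string_t a, struct string_t b)
{
    struct string_t r = { malloc(a.len + b.len + 1), 0 };
    if (r.data != NULL) {
        memcpy(r.data, a.data, a.len);
        memcpy(r.data + a.len, b.data, b.len);
        r.data[a.len + b.len] = '\0';
        r.len = a.len + b.len;
    }
    return r;
}

void string_free(struct string_t *s)
{
    free(s->data);
    s->data = NULL;
    s->len = 0;
}
```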
Posted Sep 18, 2014 7:06 UTC (Thu)
by jnareb (subscriber, #46500)
Posted Sep 28, 2014 16:51 UTC (Sun)
by bluss (subscriber, #47454)
Posted Sep 18, 2014 7:20 UTC (Thu)
by rsidd (subscriber, #2582)
Posted Sep 18, 2014 8:04 UTC (Thu)
by nmav (guest, #34036)
So we're stuck with the old str*cpy().
Posted Sep 25, 2014 6:23 UTC (Thu)
by chojrak11 (guest, #52056)
Posted Sep 28, 2014 12:11 UTC (Sun)
by tcabot (subscriber, #6656)
Posted Sep 18, 2014 14:08 UTC (Thu)
by Thue (guest, #14277)
Posted Sep 18, 2014 15:16 UTC (Thu)
by etienne (guest, #25256)
Then we would have had:
Posted Sep 18, 2014 21:30 UTC (Thu)
by nybble41 (subscriber, #55106)
Do you really think it would ever be practical to have a NUL-terminated string over 4 GiB in size? Even a few MiB seems like a stretch without caching the length; you couldn't even do simple random-access reads without scanning the whole string for the NUL character. The overhead of a 32-bit load on a 64-bit system seems trivial by comparison.
Posted Sep 19, 2014 9:18 UTC (Fri)
by etienne (guest, #25256)
Nope, I said for efficiency. Did you notice that GCC/ld aligns constant strings to a 32-byte boundary?
Posted Sep 19, 2014 9:04 UTC (Fri)
by epa (subscriber, #39769)
I agree with the point about needing a plain character array when you cannot do dynamic allocation. However, it would be possible to hack that up with something like
Posted Sep 19, 2014 9:58 UTC (Fri)
by etienne (guest, #25256)
I did say someone will want not only one field for the current size of the string, but also a field for the maximum size of the string...
Nothing can replace the efficiency of (32 bits integers):
Posted Sep 19, 2014 10:38 UTC (Fri)
by epa (subscriber, #39769)
But be fair: a plain char* has a maximum size too, depending on what memory was allocated for it, and you still have problems if you strcat() something which makes the string bigger than its maximum size. It's just that the maximum size is not carried along with the string pointer but kept somewhere else (perhaps only in the programmer's head), so it's easier to make mistakes.
Similarly, while you are right that UTF-8 makes things more complicated, UTF-8 makes things more complicated with plain char* too.
Posted Sep 18, 2014 17:11 UTC (Thu)
by wahern (subscriber, #37304)
Personally, when parsing or creating complex strings, I tend to use my own personal fifo ring buffer (supporting both dynamic and statically sized buffers), which works just as well for raw octets as for strings. It's a simple header file, and it even has a printf routine.
What's nice about strlcpy is it's a simple API for simple string operations, such as copying small string identifiers or structure string members. If I'm copying a DNS host name, I know host names should never be longer than 255 bytes (or whatever the size of my statically-sized buffer is). So strlcpy lets me copy those around easily, and easily detect truncation so I can reject the input. And, frankly, sometimes I don't _care_ about truncation. Garbage in is garbage out, and in many cases the fact that somebody passed a string that is too long is no different than if they passed a string which was misspelled. In many cases there's no need to bother treating one differently from the other, and in fact it can be counter-productive to do so.[1]
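The host-name case can be sketched as follows. glibc did not provide strlcpy() at the time, so the block carries a local stand-in, my_strlcpy, written to the BSD semantics (it returns strlen(src), so truncation shows up as a return value >= size); copy_hostname and HOSTNAME_BUFSZ are hypothetical names, and the 255-byte limit is the one from the comment.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Local stand-in with BSD strlcpy() semantics: NUL-terminates whenever
 * size > 0 and returns strlen(src). */
static size_t my_strlcpy(char *dst, const char *src, size_t size)
{
    size_t srclen = strlen(src);
    if (size > 0) {
        size_t n = srclen < size - 1 ? srclen : size - 1;
        memcpy(dst, src, n);
        dst[n] = '\0';
    }
    return srclen;
}

/* Hypothetical caller: room for a 255-byte host name plus the NUL. */
#define HOSTNAME_BUFSZ 256

bool copy_hostname(char dst[HOSTNAME_BUFSZ], const char *input)
{
    /* Reject over-long input instead of keeping a truncated name. */
    return my_strlcpy(dst, input, HOSTNAME_BUFSZ) < HOSTNAME_BUFSZ;
}
```

Where truncation genuinely does not matter, the return value can simply be ignored; the destination is still terminated.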
GNU folks are obsessed with using dynamically sized strings everywhere, but I've found that in C that is misguided. Either you use simple fixed sized strings, or you move on to much more structured representations. Liberal use of dynamic, free-form strings is just asking for trouble in C. And even in C++, because sprinkling std::string all over the place just leads to thousands of needless tiny allocations and deallocations all over the place, which is why lots of C++ libraries are unnecessarily slow and memory hungry, not to mention leaky, notwithstanding the arsenal of bells & whistles C++ provides to help address those problems. If you want to lazily juggle many tiny strings around, there are far better languages that one can use, particularly if they support automatic garbage collection.
IME, C _excels_ at text stream parsing and composition. Pointers are wondrous things. So it's not that C sucks at strings per se; it just sucks at juggling lots of small, individual, dynamically-sized strings.
So, I love strlcpy. It's extremely useful. And it's not glibc's business to babysit other programmers. glibc has far clumsier APIs than strlcpy. If anything, adding strlcpy will elevate the IQ of glibc. The central role of glibc isn't to solve everybody's problems, it's to provide a stable platform for building applications, and in particular tracking widely used APIs as they evolve without causing unnecessary portability headaches.
The fact of the matter is that strlcpy has become widely used. The BSDs have sucked it up and adopted atrocious GNU APIs like fmemopen and open_memstream. Those have to be two of the most awkward ways to address a legitimately pressing problem, and more than a little hypocritical. 1) fmemopen doesn't specify how fwrite sets errno when it reaches the end of the buffer (likewise, what happens when open_memstream cannot allocate?), and 2) almost nobody bothers checking fwrite or fprintf, anyhow! And those APIs were purposefully built to solve the dynamic string construction problem! So maybe glibc can get off their high horse.
[1] Obviously you have to be careful about that because of things like smuggling attacks, and occasionally truncation can be worse than other malformed inputs, perhaps because the string is semi-structured and prefixes have special meaning. But in any event, sanity checking separated from where data is actually used and applied is not best practice[2], and so ideally you avoid putting yourself in situations where truncation is uniquely problematic. It's a similar problem to the TOCTTOU race condition, and the solutions are similar.
Good example: Lua 5.2 removing the bytecode sanity checker. The sanity checker was simply additional surface area for bugs, and invariably had holes because making the checker's semantics line up perfectly with the VM was infeasible in practice, so there was always going to be a way to smuggle bad bytecode into the VM. To their credit, it only took one exploit for them to admit to themselves that it was preferable to focus the team's limited resources on removing bugs in the VM (and, if I were them, work on VM bug mitigations like removing avenues for direct memory access), and generally simplifying code rather than spending it maintaining and fixing the bytecode checker.
[2] Unless the sanity check is merely for convenience in spotting bugs, like function entry assertions. The point is merely that such sanity checking shouldn't change the way you implement the logic elsewhere, where the data is actually used, because such logic should be intrinsically robust enough to deal with malformed data without relying on guard code somewhere else.
Posted Sep 19, 2014 9:08 UTC (Fri)
by epa (subscriber, #39769)
Posted Sep 19, 2014 11:26 UTC (Fri)
by sorokin (guest, #88478)
For example from MSVC <xstring> internal header:
Posted Sep 19, 2014 20:20 UTC (Fri)
by epa (subscriber, #39769)
Posted Sep 19, 2014 13:44 UTC (Fri)
by ehiggs (subscriber, #90713)
Posted Sep 19, 2014 14:52 UTC (Fri)
by etienne (guest, #25256)
The first field "data" is a pointer, so always the size of an address - you do not want to write that to disk...
Posted Sep 18, 2014 7:16 UTC (Thu)
by pr1268 (subscriber, #24648)
It would seem that a big source of frustration about adding strlcpy() and strlcat() is that their return types are size_t instead of glibc's char * (for strncpy() and strncat()). At least in the libbsd version.
IMO this alone might break software where one was used as a drop-in replacement for the other.
Posted Sep 18, 2014 13:50 UTC (Thu)
by ncm (guest, #165)
People have not worried much over design errors in the standard C string library because the correct solution is to use a better language. I don't know of any language that does strings worse, unless you count sh.
Posted Sep 18, 2014 16:25 UTC (Thu)
by smoogen (subscriber, #97)
Posted Sep 19, 2014 16:03 UTC (Fri)
by nix (subscriber, #2304)
Posted Sep 18, 2014 18:23 UTC (Thu)
by Koromix (subscriber, #71455)
Posted Sep 19, 2014 9:11 UTC (Fri)
by epa (subscriber, #39769)
Posted Sep 19, 2014 23:34 UTC (Fri)
by Koromix (subscriber, #71455)
I almost never use strcat because stpcpy is more efficient.
This is enough when your code does little string handling (some identifiers, a few paths here and there). Beyond that, it's inefficient and insecure and you better use another language (C++) or at the very least a higher-level string handling code/library.
Note that for cross-platform code you may need to provide fallback implementations of asprintf and stpcpy/stpncpy.
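The stpcpy() advantage can be sketched like this: stpcpy() returns a pointer to the NUL it just wrote, so each copy resumes where the last one stopped instead of rescanning the destination as strcat() does. my_stpcpy is a hypothetical portable fallback of the kind mentioned above (on glibc the POSIX.1-2008 stpcpy() can be used directly), and build_path is an illustrative name; as with strcat(), the caller must already know the buffer is big enough.

```c
#include <stddef.h>
#include <string.h>

/* Portable fallback with stpcpy() semantics: copy src including its NUL
 * and return a pointer to the NUL written into dst. */
static char *my_stpcpy(char *dst, const char *src)
{
    size_t len = strlen(src);
    memcpy(dst, src, len + 1);
    return dst + len;
}

/* Build "dir/name.ext" in a buffer the caller has already sized (for
 * example by summing the lengths). Nothing is rescanned between copies. */
char *build_path(char *buf, const char *dir, const char *name, const char *ext)
{
    char *p = buf;
    p = my_stpcpy(p, dir);
    p = my_stpcpy(p, "/");
    p = my_stpcpy(p, name);
    p = my_stpcpy(p, ext);
    return p;                        /* points at the final NUL */
}
```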
Posted Sep 18, 2014 8:10 UTC (Thu)
by mjthayer (guest, #39183)
This is just a personal opinion, but I see it as replacing an issue which is quite likely to go unnoticed (except by the wrong people) with one which is somewhat more likely to be spotted.
Posted Sep 18, 2014 9:31 UTC (Thu)
by moltonel (guest, #45207)
The first group of functions cause off-by-ones and buffer overflows, which may or may not crash your program and may or may not be exploitable. Crashes are comparatively less annoying. Exploits are potentially very bad. Thankfully, tools like Valgrind and Coverity can find those issues, if you care to run them.
The second group of functions cause truncated output, which range from minor annoyance (incomplete logs) to exploits (password shortened to a trivial length) to business-destroying (months of data suddenly discovered to be unusable). That class of bugs is harder to find using automated tools.
YMMV. I'd rather deal with memory corruption than data corruption: it's easier to detect and has a (depending on your luck) less severe failure mode.
That said, I agree with the resigned inclusion of the new glibc functions. But I also wouldn't do string manipulation in C at all if I can avoid it :p
Posted Sep 18, 2014 9:18 UTC (Thu)
by error27 (subscriber, #8346)
In the kernel we worry that strlcpy could read beyond the end of the source string until it hits unmapped memory and cause an Oops. I don't think anyone has seen this happen in real life because you would have to get very unlucky, but it's something that could happen in theory.
The other worry for us in the kernel is that strlcpy() doesn't pad the dest string with NULs to the end of the buffer and we often need that to prevent information leaks.
So for the kernel, we're probably going to introduce a new strcpy() clone with strnlen() and padding.
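A userspace sketch of the kind of helper described here: it never reads more than dstsize bytes of src (a bounded length scan in place of strlen()), always terminates, and zero-fills the rest of the destination so no stale bytes leak. The name copy_pad and the -1 truncation return are assumptions for illustration, not a kernel API.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy src into dst[dstsize] (dstsize >= 1), reading
 * at most dstsize bytes of src, always NUL-terminating and zero-filling the
 * remainder. Returns the copied length, or -1 if src did not fit (dst then
 * holds a truncated but terminated copy). */
long copy_pad(char *dst, const char *src, size_t dstsize)
{
    /* Bounded scan: stops at the NUL or after dstsize bytes, so an
     * unterminated src cannot drive the read into unmapped memory. */
    size_t len = 0;
    while (len < dstsize && src[len] != '\0')
        len++;

    int truncated = (len == dstsize);
    if (truncated)
        len = dstsize - 1;

    memcpy(dst, src, len);
    memset(dst + len, '\0', dstsize - len);   /* terminator plus padding */
    return truncated ? -1 : (long)len;
}
```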
Posted Sep 18, 2014 9:40 UTC (Thu)
by moltonel (guest, #45207)
Posted Sep 18, 2014 10:54 UTC (Thu)
by error27 (subscriber, #8346)
There are a couple rare legacy subsystems like ISDN that have command line inputs. ISDN has a terrible record for buffer overflows, btw. These days we control the kernel with sysfs where the kernel takes a 4k, always NUL terminated buffer. We call kstrtoul(buf, 0, &val) on it. People don't use sscanf() in new code very much.
There are a couple subsystems which use length-tagged strings where you have to allow NUL chars in the middle. But it's pretty rare and there are no standards for this.
There are other places where the subsystem says a field is at most 32 characters and it's NUL terminated if there is enough space but otherwise not. Wonderful, no?
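Fields like that (at most 32 characters, NUL-terminated only when there is room) can be read safely with a bounded length scan, and printf's "%.*s" then prints exactly that many bytes. The struct and function names below are hypothetical.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical record: name is NUL-terminated only if shorter than 32. */
struct record {
    char name[32];
};

/* Length of a fixed-width field that may lack a terminator. */
static size_t field_len(const char *field, size_t width)
{
    size_t n = 0;
    while (n < width && field[n] != '\0')
        n++;
    return n;
}

/* Print the name without ever reading past the end of the field. With a
 * precision, printf does not need a NUL inside the array. */
int print_name(FILE *out, const struct record *r)
{
    size_t n = field_len(r->name, sizeof r->name);
    return fprintf(out, "%.*s\n", (int)n, r->name);
}
```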
Most of the time you don't really care if a string gets truncated a bit. What I mean is, you have a sound card where the name of the sound card is missing the last two characters of the model name in dmesg. No one will ever notice or care. If someone does notice it with a static analysis tool the fix is to shorten the name, not make the buffer larger.
Even if you converted everything to use length-tagged strings internally, you still would need to copy it to a fixed length buffer eventually and pad rest of the buffer with NUL chars. So you would end up silently truncating anyway.
Posted Sep 18, 2014 11:14 UTC (Thu)
by error27 (subscriber, #8346)
In that example, the source string was saved on the stack. I still feel that it wouldn't be possible to trigger the bug because we knew we would hit a NUL character before we hit unmapped memory. We fixed it to be on the safe side.
Posted Sep 26, 2014 15:19 UTC (Fri)
by JanC_ (guest, #34940)
Posted Sep 26, 2014 15:21 UTC (Fri)
by JanC_ (guest, #34940)
Posted Sep 28, 2014 18:20 UTC (Sun)
by mathstuf (subscriber, #69389)
Posted Sep 18, 2014 10:41 UTC (Thu)
by gb (subscriber, #58328)
Fourteen years of the patch's existence don't mean that the idea behind the patch is good. Nor does the fact that some programs use the same function make that function a good design. And neither makes such a patch a shiny "new feature".
It's not as black and white as the article suggests. Ulrich's articles are very good and glibc is an excellent piece of software. Yes, he is rude and didn't want to spend time politely declining things that didn't fit, but that doesn't mean his reasons were inappropriate.
Posted Sep 18, 2014 19:03 UTC (Thu)
by vomlehn (guest, #45588)
Convincing people to use another approach is a whole different issue and becomes a mixture of engineering and religion. Witness the rest of the discussion.
Posted Sep 24, 2014 22:37 UTC (Wed)
by lally (subscriber, #71211)
Posted Sep 18, 2014 11:23 UTC (Thu)
by nix (subscriber, #2304)
Posted Sep 18, 2014 11:23 UTC (Thu)
by kugel (subscriber, #70540)
I can't follow this. The return value of strlcpy (basically strlen(src)) can be used to test for truncation.
if (retval >= sizeof(buffer)) abort();
Posted Sep 18, 2014 13:03 UTC (Thu)
by tshow (subscriber, #6411)
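You can check against the copied string (strlcpy() returns strlen(src), so the two differ only if the copy was truncated; the copy must happen before strlen(dst) is evaluated):
size_t n = strlcpy(dst, src, dst_size);
if(n != strlen(dst)) fail();
You can check against the destination buffer size:
if(strlcpy(dst, src, dst_size) >= dst_size) fail();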
Either of those is vastly preferable to the hoop jumping for strncpy(), because strncpy() will leave the nul off the end if you overrun the buffer; I always wind up doing this:
dst[dst_size - 1] = 0;
strncpy(dst, src, dst_size - 1);
So, zero the last byte in the buffer, and then tell strncpy() not to touch that last byte.
If I need to ensure the string fits in the buffer:
if(strlen(src) > (dst_size - 1)) fail();
Posted Sep 18, 2014 13:38 UTC (Thu)
by ncm (guest, #165)
Posted Sep 18, 2014 20:15 UTC (Thu)
by tshow (subscriber, #6411)
"Don't use nul-terminated strings." is a non-starter, unless you're coupling that with "and use this other language that isn't C". The same goes for "don't use a pre-existing buffer".
Posted Sep 19, 2014 5:16 UTC (Fri)
by ncm (guest, #165)
Posted Sep 21, 2014 18:01 UTC (Sun)
by k8to (guest, #15413)
It's definitely safer, imo, to wrap this into a local functionlet than to type it manually everywhere.
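Wrapped up as suggested, the strncpy() idiom quoted earlier in the thread might look like this sketch; copy_str is a hypothetical name, and the truncation report is an addition the bare idiom does not provide.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical wrapper around the strncpy() idiom: always NUL-terminates
 * and reports whether src fit. dst_size must be at least 1. Note that
 * strncpy() still zero-fills any unused bytes before the last one. */
bool copy_str(char *dst, const char *src, size_t dst_size)
{
    dst[dst_size - 1] = '\0';        /* strncpy() never touches this byte */
    strncpy(dst, src, dst_size - 1);
    return strlen(src) <= dst_size - 1;
}
```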
Posted Sep 18, 2014 13:35 UTC (Thu)
by sorokin (guest, #88478)
One of the reasons why I prefer C++ over C is std::string. It supports automatic memory reallocation on append(), and it frees its memory automatically when the string goes out of scope.
If C had destructors, I think we could have a decent string class in C too.
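Without destructors, the closest C gets is an append that grows the buffer plus a free function the caller must remember to invoke. A minimal sketch; the dstr names are hypothetical, growth is geometric so repeated appends stay cheap on average, and overflow checks on the size arithmetic are omitted.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable string; data is NUL-terminated whenever non-NULL. */
struct dstr {
    char  *data;
    size_t len;   /* bytes in use, excluding the NUL */
    size_t cap;   /* bytes allocated, including room for the NUL */
};

/* Append src, doubling the capacity as needed. Returns false and leaves
 * s unchanged if allocation fails. */
bool dstr_append(struct dstr *s, const char *src)
{
    size_t add = strlen(src);
    size_t need = s->len + add + 1;
    if (need > s->cap) {
        size_t cap = s->cap ? s->cap : 16;
        while (cap < need)
            cap *= 2;
        char *p = realloc(s->data, cap);
        if (p == NULL)
            return false;
        s->data = p;
        s->cap = cap;
    }
    memcpy(s->data + s->len, src, add + 1);   /* includes the NUL */
    s->len += add;
    return true;
}

/* The "destructor": C has no scope-exit hook, so call this by hand. */
void dstr_free(struct dstr *s)
{
    free(s->data);
    *s = (struct dstr){ NULL, 0, 0 };
}
```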
Posted Sep 18, 2014 18:26 UTC (Thu)
by kugel (subscriber, #70540)
Don't like strdup()?
Posted Sep 19, 2014 8:19 UTC (Fri)
by sorokin (guest, #88478)
Posted Sep 19, 2014 9:17 UTC (Fri)
by tao (subscriber, #17563)
g_strconcat() -- concatenates a NULL-terminated list of strings
and of course my favourites:
g_strdup_printf(), g_strdup_vprintf(), g_vasprintf() -- sprintf()/vsprintf() variants that allocated the necessary memory.
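Outside GLib, an allocating printf in the spirit of g_strdup_printf() can be sketched with the standard vsnprintf(): measure first, then allocate and format. dup_printf is a hypothetical name; GNU asprintf() does the same job where it is available.

```c
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: returns a malloc'd formatted string, or NULL. */
char *dup_printf(const char *fmt, ...)
{
    va_list ap, ap2;
    va_start(ap, fmt);
    va_copy(ap2, ap);                        /* ap is consumed by the first pass */
    int len = vsnprintf(NULL, 0, fmt, ap);   /* measure only */
    va_end(ap);

    char *buf = NULL;
    if (len >= 0 && (buf = malloc((size_t)len + 1)) != NULL)
        vsnprintf(buf, (size_t)len + 1, fmt, ap2);
    va_end(ap2);
    return buf;
}
```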
Posted Sep 20, 2014 0:04 UTC (Sat)
by debacle (subscriber, #7114)
Posted Sep 18, 2014 20:17 UTC (Thu)
by tshow (subscriber, #6411)
Posted Sep 19, 2014 8:22 UTC (Fri)
by sorokin (guest, #88478)
Posted Sep 20, 2014 7:08 UTC (Sat)
by alankila (guest, #47141)
(Some degree of defensive programming may occur if you are hard pressed for memory and deal with giant strings -- sometimes it is in your interests to ensure that these strings will be freed in a timely fashion. But in practice, the memory sizes have really exploded in the past decade or so, so I think such concerns are losing relevance over time.)
Posted Sep 22, 2014 12:59 UTC (Mon)
by sorokin (guest, #88478)
I would say "You can avoid having to make defensive copies if you have immutable strings". I removed one constraint and the statement is still true. Even better: "You can avoid paying for defensive copies if your strings use copy-on-write". Note that GCC's current implementation uses copy-on-write, so it avoids excessive copying.
> There are also no object ownership questions, and no use-after-free bugs.
This applies to std::string too.
> After this, strings look and feel like raw values, similar to, say, an integer -- you will never worry who else has a copy of that integer because it doesn't affect your life in the slightest way.
That is exactly how std::string works. You can consider std::string (and any other STL container) as a value, similar to int, that can be copied, passed to, and returned from functions:
std::string f()
{
    // ...
    if (...)
        return a;
    // ...
}
void g(std::string const& a)
{
    // ...
}
int main()
{
    // ...
}
Posted Sep 22, 2014 15:19 UTC (Mon)
by nybble41 (subscriber, #55106)
Posted Sep 29, 2014 9:52 UTC (Mon)
by oldtomas (guest, #72579)
So, I'd rather say ref counting may be fairly efficient as long as your reference graph is pretty sparse. Once it becomes dense, you'll become painfully aware of how slow your main memory really is :-)
Posted Sep 26, 2014 19:32 UTC (Fri)
by HelloWorld (guest, #56129)
Posted Sep 19, 2014 0:26 UTC (Fri)
by david.a.wheeler (subscriber, #72896)
In a *vast* number of cases in existing code the size of the buffer is already fixed (e.g., it's existing code, it's a fixed-size array in a struct, etc.). Having standard functions to address this common case is very useful.
In many cases dynamically-allocated strings are the better choice. In that case, strdup and asprintf are especially useful. But since fixed (preallocated) buffers and dynamically-allocated buffers are both in wide use, it's important to have useful functions for each case.
Posted Sep 19, 2014 12:54 UTC (Fri)
by hp (guest, #5220)
Using raw char* with strcpy/strlcpy/etc. is a good way to shoot yourself in the foot. Just because it's C doesn't mean you have to write everything in max-lowlevel style. Use an abstraction so you don't have pointer math and length checks all over the place.
People know the rule "abstract the tricky parts to avoid navigating them repeatedly" in most languages, but then they get to C and make a mess for some reason.
My suspicion is that introductory C texts talk about strcpy and scanf and whatever, so people start thinking these functions are reasonable.
libc doesn't have reasonable abstractions for many common tasks. That's why higher-level libraries exist. Use them, unless you are competent enough to write your own equivalent.
Posted Sep 26, 2014 19:53 UTC (Fri)
by HelloWorld (guest, #56129)
Posted Sep 26, 2014 20:06 UTC (Fri)
by hp (guest, #5220)
Posted Sep 26, 2014 21:03 UTC (Fri)
by HelloWorld (guest, #56129)
Posted Sep 26, 2014 21:49 UTC (Fri)
by PaXTeam (guest, #24616)
Posted Sep 28, 2014 0:57 UTC (Sun)
by HelloWorld (guest, #56129)
Posted Sep 28, 2014 16:23 UTC (Sun)
by PaXTeam (guest, #24616)
if you cannot comprehend simple logic (never mind the paraphrase of the Nietzsche reference) then commenting on lwn isn't a good idea.
Posted Sep 28, 2014 18:24 UTC (Sun)
by HelloWorld (guest, #56129)
> if you cannot comprehend simple logic
Posted Sep 28, 2014 18:38 UTC (Sun)
by PaXTeam (guest, #24616)
Posted Sep 29, 2014 11:59 UTC (Mon)
by nye (subscriber, #51576)
Um, I'm not sure how you can be this wrong without realising it. I can't seem to recall the last time I saw a logical error quite so egregious.
Posted Sep 29, 2014 12:07 UTC (Mon)
by tao (subscriber, #17563)
Personally I think that when it comes to eradicating programming languages I'd much rather go for PHP :P
Posted Sep 29, 2014 12:26 UTC (Mon)
by JGR (subscriber, #93631)
Replacing C with something that has the barrel-ends and feet less close together by default would have a more significant impact.
Posted Oct 7, 2014 1:25 UTC (Tue)
by HelloWorld (guest, #56129)
Posted Oct 7, 2014 6:04 UTC (Tue)
by bronson (subscriber, #4806)
Posted Sep 28, 2014 19:44 UTC (Sun)
by filipjoelsson (guest, #2622)
Posted Oct 7, 2014 22:27 UTC (Tue)
by HelloWorld (guest, #56129)
Posted Sep 29, 2014 11:09 UTC (Mon)
by cortana (subscriber, #24596)
Posted Sep 19, 2014 19:38 UTC (Fri)
by quotemstr (subscriber, #45331)
Posted Sep 21, 2014 18:06 UTC (Sun)
by k8to (guest, #15413)
Posted Sep 21, 2014 21:25 UTC (Sun)
by quotemstr (subscriber, #45331)
Posted Sep 21, 2014 23:15 UTC (Sun)
by david.a.wheeler (subscriber, #72896)
No. strlcpy/strlcat do NOT do silent truncation. They provide truncation detection, it's defined in the return value. It's true that they truncate if you ignore the return value, but the list of C functions that require checking the error result is long; these are just two more. They are also WAY better than current strcpy/strcat, which silently perform buffer overflows; the current functions are MUCH more dangerous. The strncpy/strncat functions don't do the job that people actually want, and they are dangerous to use as well. E.g., they don't guarantee \0 termination, strncpy adds many useless \0s, and strncat's bad interface basically guarantees off-by-one buffer overflows. The only useful function for pre-allocated buffers in standard C is snprintf, but it's a pain to use, wordy, and inefficient for the simple common cases of string copy and string concatenate.
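Truncation detection through the return value works the same way for strlcat(). A sketch with the OpenBSD semantics, which return the length of the string the call tried to create (the initial length of dst plus strlen(src), or size plus strlen(src) if dst held no NUL within size); my_strlcat is a local stand-in, not a glibc function.

```c
#include <stddef.h>
#include <string.h>

/* Local stand-in with OpenBSD strlcat() semantics. Truncation occurred
 * if the return value is >= size. */
static size_t my_strlcat(char *dst, const char *src, size_t size)
{
    size_t dlen = 0;
    while (dlen < size && dst[dlen] != '\0')   /* bounded scan of dst */
        dlen++;
    size_t slen = strlen(src);
    if (dlen == size)                          /* no NUL inside dst: no room */
        return size + slen;

    size_t room = size - dlen - 1;
    size_t n = slen < room ? slen : room;
    memcpy(dst + dlen, src, n);
    dst[dlen + n] = '\0';
    return dlen + slen;
}
```

A caller then writes `if (my_strlcat(buf, part, sizeof buf) >= sizeof buf)` to catch truncation, exactly as with strlcpy().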
I originally tried to press for annex K acceptance, but that got a fair amount of pushback. The fundamental problem is that C doesn't have a standard exception handling system (longjmp does NOT count)... so what DO you do when there's an error and you don't want it to be easily ignored? The annex K solution is to call a configurable handler on buffer overflow, but many GLIBC developers rejected that approach.
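The handler approach can be illustrated in portable C. This is not the Annex K API (glibc does not implement Annex K); checked_strcpy and set_overflow_handler are hypothetical stand-ins for strcpy_s() and set_constraint_handler_s(), and the default policy of reporting and calling abort() is an assumption.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef void (*overflow_handler)(const char *msg);

/* Default policy: report and abort, so an overflow cannot be ignored. */
static void abort_handler(const char *msg)
{
    fprintf(stderr, "string overflow: %s\n", msg);
    abort();
}

static overflow_handler current_handler = abort_handler;

/* Install a handler (NULL restores the default); returns the old one. */
overflow_handler set_overflow_handler(overflow_handler h)
{
    overflow_handler old = current_handler;
    current_handler = h ? h : abort_handler;
    return old;
}

/* Copy that is expected to fit (dstsize >= 1). On overflow, dst becomes
 * an empty string and the handler runs; if it returns, so do we, with -1. */
int checked_strcpy(char *dst, size_t dstsize, const char *src)
{
    size_t len = strlen(src);
    if (len >= dstsize) {
        dst[0] = '\0';
        current_handler("source does not fit in destination");
        return -1;
    }
    memcpy(dst, src, len + 1);
    return 0;
}
```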
Posted Sep 21, 2014 23:22 UTC (Sun)
by quotemstr (subscriber, #45331)
Right, and by default, the program aborts in that sense. There's no way that the program should continue after trying to fit a square peg in a round hole. strcpy_s is a drop-in replacement for strcpy: you should use it when you know that the destination string will fit. You should hope that strcpy_s's error case is never encountered and should feel confident that if it is, your program won't be vulnerable to anything worse than a DoS.
Now, if you have a string that might legitimately not fit in the destination buffer (say, it's untrusted user input), you want to use strncpy_s, which is designed for the task, and which is nothing like the flawed strncpy.
strlcpy proponents, on the other hand, conflate these two use cases, and this conflation will result in programs that end up truncating strings without complaint. It's only a matter of time before this behavior produces a security problem.
Annex K is much, much better than these awful BSD-derived string functions.
Posted Sep 25, 2014 6:31 UTC (Thu)
by chojrak11 (guest, #52056)
To me it's amazing that since its inception in 70s, the C's standard library for handling strings wasn't replaced, or that there's no alternative second portable core library in C, and that strcpy/strcat etc. aren't globally discouraged with bold warnings by all compilers. We've invented so much crap since then, but no-one was able to standardize a new string library. That's sad.
Posted Sep 25, 2014 8:13 UTC (Thu)
by rodgerd (guest, #58896)
Posted Sep 27, 2014 21:21 UTC (Sat)
by brouhaha (subscriber, #1698)
If someone can point me to documentation or examples for how to integrate FORTIFY_SOURCE support into functions like strlcpy, I'd be glad to do it. Nine months ago I said so in a comment on the bug report, and there was no response.
Posted Oct 2, 2014 0:13 UTC (Thu)
by nix (subscriber, #2304)
There is, to my knowledge, no documentation for this.
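For what it's worth, the general shape of such a check is small. This is a hypothetical sketch, not glibc's actual fortification code: a macro passes the GCC/Clang builtin __builtin_object_size() result for the destination to a checking function, which aborts if the size the caller claims is larger than the object the compiler can see. When the compiler cannot determine the size, __builtin_object_size() with type 1 yields (size_t)-1 and the check is skipped. The names strlcpy_chk and checked_strlcpy are assumptions.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical checking function: objsize is what the compiler knows about
 * the destination, or (size_t)-1 if it does not know. */
size_t strlcpy_chk(char *dst, const char *src, size_t dstsize, size_t objsize)
{
    size_t srclen = strlen(src);

    if (objsize != (size_t)-1 && dstsize > objsize)
        abort();                   /* caller claimed a bigger buffer than exists */
    if (dstsize > 0) {
        size_t n = srclen < dstsize - 1 ? srclen : dstsize - 1;
        memcpy(dst, src, n);
        dst[n] = '\0';
    }
    return srclen;
}

/* Hypothetical fortified entry point: __builtin_object_size() is a
 * compile-time constant (type 1 = closest enclosing subobject). */
#define checked_strlcpy(dst, src, size) \
    strlcpy_chk((dst), (src), (size), __builtin_object_size((dst), 1))
```

How much the compiler can see depends on optimization level and on how the pointer reaches the call, so this catches mistakes like passing a stale size for a local array but is not a general bounds checker.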
Posted Sep 27, 2014 22:30 UTC (Sat)
by JIghtuse (guest, #95703)
Posted Jul 21, 2017 0:41 UTC (Fri)
by kmeyer (subscriber, #50720)
Adding strlcpy() to glibc
char* dst, size_t dstsize,
char* src, size_t count,
size_t* endp, size_t end);
Adding strlcpy() to glibc
> size_t* endp, size_t end);
> In use, the first call usually has a literal 0 in the last place, and, in subsequent calls, the variable that endp points to
Adding strlcpy() to glibc
Do I understand right that this is basically
bool arraycopy(void* dst, size_t dstsize, size_t dstoffset,
void* src, size_t srcsize,
size_t* endoffset)
with the dstoffset arg moved to the end and an implicit padding with a 0 byte?
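The original comment's function name is not shown, so the sketch below calls it append_at, a hypothetical name. It follows the reading suggested in the question above: copy count bytes of src into dst at offset end, always NUL-terminate, write the new end offset through endp so it can be passed as end on the next call, and return false if anything was cut off. Copying whatever fits on overflow is an assumption about the intended semantics.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical chained copy: append count bytes of src at dst + end. */
bool append_at(char *dst, size_t dstsize, const char *src, size_t count,
               size_t *endp, size_t end)
{
    size_t room, n;

    if (dstsize == 0 || end >= dstsize) {   /* no space even for the NUL */
        if (endp != NULL)
            *endp = end;
        return false;
    }
    room = dstsize - 1 - end;               /* bytes available before the NUL */
    n = count < room ? count : room;
    memcpy(dst + end, src, n);
    dst[end + n] = '\0';                    /* always terminated */
    if (endp != NULL)
        *endp = end + n;                    /* feed this back in as `end` */
    return n == count;                      /* false means truncated */
}
```

In use, the first call passes 0 as end and later calls pass the value stored through endp, so a sequence of appends never rescans the buffer for its terminator.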
Adding strlcpy() to glibc
- string with a length limited to 255 bytes
- string with a length limited to 65535 bytes (MS-DOS strings)
- string with a length limited to (2^32-1) bytes (Unix strings)
- string with a length limited to (2^64-1) bytes for efficiency
- string with a variable length encoded in 1 or 3 bytes (when the first byte is 255, use the following 2 bytes)
with all the supported functions, like strl255cpy().
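A minimal sketch of the first variant, assuming a one-byte length prefix and no NUL terminator; str255, str255_cpy(), and str255_cat() are hypothetical names in the spirit of strl255cpy(). Because the length is stored, appending needs no strlen() scan of the destination.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical length-prefixed string capped at 255 bytes. */
typedef struct {
    unsigned char len;        /* current length, 0..255 */
    char data[255];           /* not NUL-terminated */
} str255;

/* Copy a C string in; returns false (after truncating) if it exceeds 255 bytes. */
bool str255_cpy(str255 *dst, const char *src)
{
    size_t n = strlen(src);
    bool fits = n <= sizeof dst->data;

    if (!fits)
        n = sizeof dst->data;
    memcpy(dst->data, src, n);
    dst->len = (unsigned char)n;
    return fits;
}

/* Append another str255; returns false if the result had to be truncated. */
bool str255_cat(str255 *dst, const str255 *src)
{
    size_t room = sizeof dst->data - dst->len;
    size_t n = src->len <= room ? src->len : room;

    memcpy(dst->data + dst->len, src->data, n);
    dst->len = (unsigned char)(dst->len + n);
    return n == src->len;
}
```

The other variants in the list differ mainly in the width of the length field, which is why such a library tends to multiply into one function family per width.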
Then you need the variable string, where you manage the allocated size and the string length separately, so two fields of either 16 or 32 bits.
Then you need strings composed of different constant substrings, like pointers embedded in the string.
Everybody old enough has (at least started to) write such a library.
I personally also did need the multilingual string, "hello\0Bonjour\0Gutentag\0"...
And then we would still not have a plain array of char (usable inside interrupt handlers, where no memory allocation is possible)...
Adding strlcpy() to glibc
> - string with a length limited to (2^64-1) bytes for efficiency
Adding strlcpy() to glibc
The thing is, you are going to copy that string, and copying aligned memory has been measured to be a lot faster than copying unaligned memory; and any strcat() will copy only the string itself, not the length.
The same goes for comparing that string to a pattern, which also means loading unaligned memory.
If the size of the string is a size_t, then it is the maximum object size supported by that platform, which is practically speaking the longest possible string anyway. 8086 with segmented mode was limited to 64k strings even with the NUL-terminated scheme, I think.
Adding strlcpy() to glibc
char a[20];
struct str s;
s.size = 20;
s.ptr = a;
Then you have made a string object without needing dynamic allocation.
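Fleshing this out a little, here is a sketch that assumes a len field alongside size and ptr; the field and function names are hypothetical. An over-long append keeps what fits, stays NUL-terminated, and returns false, leaving the truncation decision to the caller. The storage can be a stack array, so no dynamic allocation is involved.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical string object over caller-provided storage: size is the
 * capacity of ptr (including the NUL), len the current length. */
struct str {
    size_t size;
    size_t len;
    char *ptr;
};

/* Bind a struct str to an existing buffer (stack, static, or heap). */
void str_init(struct str *s, char *buf, size_t size)
{
    s->size = size;
    s->len = 0;
    s->ptr = buf;
    if (size > 0)
        buf[0] = '\0';
}

/* Append a C string; on overflow, keep what fits and return false. */
bool str_cat(struct str *s, const char *src)
{
    size_t n = strlen(src);
    size_t room, copy;

    if (s->size == 0)
        return n == 0;
    room = s->size - 1 - s->len;
    copy = n <= room ? n : room;
    memcpy(s->ptr + s->len, src, copy);
    s->len += copy;
    s->ptr[s->len] = '\0';
    return copy == n;
}
```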
Adding strlcpy() to glibc
> struct str s;
> s.size = 20;
> s.ptr = &a;
So there you did add an indirection, and did not solve the main problem: what happens when you strcat() something that makes the string bigger than its maximum size?
Also, the next improvement to that string is to store the number of UTF-8 characters, with lazy evaluation because it may never be used (if (s.NbUtf8 == 0 && s.length != 0 && s.size != 0) s.NbUtf8 = CptUtf8(s.ptr);)
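A counting function like the CptUtf8() mentioned above can count code points by skipping UTF-8 continuation bytes, which always have the bit pattern 10xxxxxx. The name count_utf8() is a hypothetical stand-in; this sketch assumes valid UTF-8 input and counts code points, not user-perceived characters.

```c
#include <stddef.h>

/* Count code points in the first len bytes of s: every byte that is not
 * a continuation byte (10xxxxxx) starts a new code point. */
size_t count_utf8(const char *s, size_t len)
{
    size_t count = 0;

    for (size_t i = 0; i < len; i++)
        if (((unsigned char)s[i] & 0xC0) != 0x80)
            count++;
    return count;
}
```

Since this is a full pass over the bytes, caching the result as the comment suggests only pays off if the count is needed more than once.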
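#include <stdio.h>

void fct(int i) {
    /* "Error Nb: 0x" plus up to 8 hex digits (32-bit int) fits in this template */
    char error[sizeof("Error Nb: _10_digits")];
    snprintf(error, sizeof error, "Error Nb: 0x%X", (unsigned)i);
    printf("%s\n", error); /* i.e. use "error" string; never pass it as the format */
}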
Adding strlcpy() to glibc
You make an interesting point about std::string being too slow because of dynamic allocation. Is there a library (from Boost or elsewhere) that gives statically allocated strings of a maximum length?
Adding strlcpy() to glibc
max_length_string<50> str = "hello";
Then you can use it just like a std::string, but it has allocated 50 characters of storage on the stack and will never make any further allocations. If you try to put a too-long value it throws an exception.
Adding strlcpy() to glibc
_BUF_SIZE = 16 / sizeof (value_type) < 1 ? 1 : 16 / sizeof (value_type)
value_type _Buf[_BUF_SIZE];
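That fragment looks like the inline buffer of a small-string optimization inside a C++ standard library's basic_string. A rough C analogue, with hypothetical names: strings shorter than SSO_INLINE bytes are stored inside the object with no allocation, and only longer ones go to the heap. The 16-byte threshold mirrors the fragment; everything else is an assumption.

```c
#include <stdlib.h>
#include <string.h>

#define SSO_INLINE 16   /* mirrors the 16-byte buffer in the fragment */

/* Hypothetical small-string type: len < SSO_INLINE means the bytes live in
 * u.buf; otherwise u.heap points to a malloc'd copy. */
struct sso_str {
    size_t len;
    union {
        char buf[SSO_INLINE];
        char *heap;
    } u;
};

void sso_init(struct sso_str *s)
{
    s->len = 0;
    s->u.buf[0] = '\0';
}

const char *sso_cstr(const struct sso_str *s)
{
    return s->len < SSO_INLINE ? s->u.buf : s->u.heap;
}

/* Replace the contents with src (which must not point into s itself).
 * Returns 0 on success, or -1 if a needed allocation failed, leaving s as-is. */
int sso_set(struct sso_str *s, const char *src)
{
    size_t n = strlen(src);
    char *heap = NULL;

    if (n >= SSO_INLINE && (heap = malloc(n + 1)) == NULL)
        return -1;
    if (s->len >= SSO_INLINE)
        free(s->u.heap);                /* drop the previous heap copy */
    if (heap != NULL) {
        memcpy(heap, src, n + 1);
        s->u.heap = heap;
    } else {
        memcpy(s->u.buf, src, n + 1);   /* short string: no allocation */
    }
    s->len = n;
    return 0;
}

void sso_free(struct sso_str *s)
{
    if (s->len >= SSO_INLINE)
        free(s->u.heap);
    sso_init(s);
}
```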
Adding strlcpy() to glibc
> That should probably have the fields the other way around so if it's ever written to disk as is you can parse the string using the length.
You were probably thinking of: struct string_t {size_t len; char data[0];};
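In C99 and later, the trailing array is spelled as a flexible array member, char data[]; data[0] is the older GNU extension. A sketch of allocating such a string in one block, with string_new() as a hypothetical helper: the length comes first, so a reader can parse it before the bytes, and a single free() releases everything.

```c
#include <stdlib.h>
#include <string.h>

/* Length-prefixed string in one allocation (C99 flexible array member). */
struct string_t {
    size_t len;
    char data[];      /* len bytes plus a NUL, for convenience with C APIs */
};

struct string_t *string_new(const char *src)
{
    size_t n = strlen(src);
    struct string_t *s = malloc(sizeof *s + n + 1);

    if (s == NULL)
        return NULL;
    s->len = n;
    memcpy(s->data, src, n + 1);
    return s;
}
```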
Note that using a free()-able pointer for a string (in case the string grows) means that the string cannot be on the stack, so it is probably an order of magnitude slower because it won't usually be in the processor cache.
And it is so much overhead to manage a string like (4 bytes):
const char extension[] = "exe";
strlcpy() in glibc
- asprintf is great, you can't abuse it
- malloc + a suite of stpcpy
- strdup (with in-place character substitutions on the copy if needed, such as fixing slashes in paths)
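The malloc + stpcpy approach from that list, sketched with a hypothetical join3() helper: measure every piece first, allocate exactly once, then chain stpcpy(), which (unlike strcpy) returns a pointer to the terminating NUL it wrote, so each append starts where the last one ended without rescanning. stpcpy() is POSIX.1-2008, hence the feature-test macro.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <string.h>

/* Join three strings into an exactly-sized heap buffer. The caller frees
 * the result. (Overflow checks on the summed lengths are omitted.) */
char *join3(const char *a, const char *b, const char *c)
{
    size_t total = strlen(a) + strlen(b) + strlen(c) + 1;
    char *out = malloc(total);
    char *p;

    if (out == NULL)
        return NULL;
    p = stpcpy(out, a);   /* p now points at the NUL after a */
    p = stpcpy(p, b);
    stpcpy(p, c);
    return out;
}
```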
Adding strlcpy() to glibc
To minimize damage:
Adding strlcpy() to glibc
g_strjoin() -- similar, but allows adding an optional separator
g_strjoinv() -- concatenates an array of strings, with an optional separator
g_strdupv() -- duplicate an array of strings
g_strsplit() -- splits a string into an array
g_strescape()/g_strcompress() -- escape/unescape a string
g_strdelimit() -- change delimiter (let's say you've got a comma-delimited string and want it to be space delimited)
g_strstrip() -- strip away leading + trailing whitespace
g_strchomp() -- strip away trailing whitespace
g_strchug() -- strip away leading whitespace
g_strnfill() -- creates a string of the specified length filled with a specified character
Adding strlcpy() to glibc
{
std::string a = "hello";
a += ", world";
}
{
std::string b = "msg: " + a;
std::cout << b;
}
{
g(f());
}
Adding strlcpy() to glibc
reference counting
No, no. This is a widespread misconception. Ref counting looks so simple, but it transforms what could be read-accesses to memory (taking/releasing a reference) into write accesses (incrementing/decrementing ref counters), which messes horribly with memory caches and clever compilers.
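To illustrate the point about writes, here is a hypothetical reference-counted string using C11 atomics; rc_new(), rc_ref(), and rc_unref() are illustrative names. Even a holder that only wants to read the string must perform an atomic read-modify-write on the shared counter to take and release a reference, so the cache line holding it bounces between cores that share the string.

```c
#include <stdatomic.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical reference-counted immutable string. */
struct rc_str {
    atomic_size_t refs;
    size_t len;
    char data[];
};

struct rc_str *rc_new(const char *src)
{
    size_t n = strlen(src);
    struct rc_str *s = malloc(sizeof *s + n + 1);

    if (s == NULL)
        return NULL;
    atomic_init(&s->refs, 1);
    s->len = n;
    memcpy(s->data, src, n + 1);
    return s;
}

/* Taking a reference is a write to shared memory, not a read. */
struct rc_str *rc_ref(struct rc_str *s)
{
    atomic_fetch_add_explicit(&s->refs, 1, memory_order_relaxed);
    return s;
}

/* Releasing one is a write too; the last holder frees the string. */
void rc_unref(struct rc_str *s)
{
    if (atomic_fetch_sub_explicit(&s->refs, 1, memory_order_acq_rel) == 1)
        free(s);
}
```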
Adding strlcpy() to glibc
Bottom line: C is garbage, and we need to just make it go away.
Adding strlcpy() to glibc
HelloWorld is dead - random C programmer 2114AD.
Adding strlcpy() to glibc
No I'm saying that “I wish C were dead” is a completely different statement than “C is dead”.
Dude, you are obviously not able to understand simple distinctions like the above and yet you're trying to lecture me on logic? Do yourself a favour and shut up.
Adding strlcpy() to glibc
>one is a consequence of the other therefore when you make one you're also making (=implying) the other. logic 101 'dude'
Adding strlcpy() to glibc
No it's not. I don't even understand how anybody could think something as nonsensical as that.
Adding strlcpy() to glibc
(At least I cracked up, when at long last I got the reference.)
Adding strlcpy() to glibc
Or do I get to write it myself for most of my platforms.
Adding strlcpy() to glibc
strlcpy/strlcat do NOT do silent truncation
Adding strlcpy() to glibc
Sorry for the long URL; G+ is kinda mad.
Adding strlcpy() to glibc