Segmentation fault in clientwr() #75

@wferi

The setup on which #74 was observed exhibits crashes as well. I examined a core file created for version 1.8.1 (with some patching to report RAD_Attr_CUI and RAD_Attr_Operator_Name). The kernel reported:

segfault at 24 ip 08053253 sp b6e97290 error 4 in radsecproxy[8048000+23000]

The generated core file points at:

Core was generated by `/usr/sbin/radsecproxy -i /var/run/radsecproxy.pid -f'.
Program terminated with signal 11, Segmentation fault.
#0  0x08053253 in clientwr (arg=0x9e76f50) at radsecproxy.c:1696
1696	        if (rqout->tries == (*rqout->rq->buf == RAD_Status_Server ? 1 : conf->retrycount + 1)) {

There are several memory accesses on that line, but

Dump of assembler code for function clientwr:
[...]
   0x0805324b <+875>:	mov    0x4(%ebx),%eax
   0x0805324e <+878>:	mov    $0x1,%ecx
=> 0x08053253 <+883>:	mov    0x24(%eax),%edx
   0x08053256 <+886>:	cmpb   $0xc,(%edx)
[...]
(gdb) p $eax
$2 = 0

so the code tried to read offset 0x24 of a null pointer, which is consistent with the kernel report above. radmsg.h helps to identify the 0xc above:

#define RAD_Status_Server 12

The offset of rq in struct rqout is 4 and the offset of buf in struct request is 0x24:

(gdb) p &(((struct rqout *)0)->rq)
$3 = (struct request **) 0x4
(gdb) p &(((struct request *)0)->buf)
$4 = (uint8_t **) 0x24
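
The same kind of offset check can be done at compile time with `offsetof`. The following is only a sketch: `struct rqout_sketch` is a hypothetical mirror of `struct rqout`, with the member order assumed from what gdb prints for `*rqout` in this report, and `struct request` is left opaque because its layout is not shown here.

```c
#include <assert.h>
#include <stddef.h>     /* offsetof, size_t */
#include <stdint.h>
#include <pthread.h>
#include <sys/time.h>   /* struct timeval */

struct request;         /* opaque: its real layout is not shown in this report */

/* Hypothetical mirror of struct rqout; member order is an assumption
 * based on the gdb print of *rqout, not copied from radsecproxy. */
struct rqout_sketch {
    pthread_mutex_t *lock;
    struct request *rq;
    uint8_t tries;
    struct timeval expiry;
};

/* Offset of rq inside the sketch structure: one pointer past the start,
 * i.e. 4 on a 32-bit build and 8 on a typical 64-bit build. */
size_t rq_offset(void)
{
    return offsetof(struct rqout_sketch, rq);
}

/* Sanity check: rq directly follows the lock pointer, with no padding. */
void check_rq_follows_lock(void)
{
    assert(rq_offset() == sizeof(pthread_mutex_t *));
}
```

On the 32-bit build that produced the core, this corresponds to the 0x4 that gdb reports for `rq`.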

Thus I'm pretty sure that the immediate cause of the illegal memory access here was rqout->rq being null. The addresses match:

(gdb) p &(rqout->rq)
$5 = (struct request **) 0x9e772f4
(gdb) p/x $ebx+4
$6 = 0x9e772f4

but while the value loaded into eax is null (see above), the actual memory content is not:

(gdb) p/x *(int *)($ebx+4)
$7 = 0xb6cc1af8
(gdb) p *rqout
$8 = {lock = 0x9e788c0, rq = 0xb6cc1af8, tries = 0 '\000', expiry = {tv_sec = 0, tv_usec = 0}}
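
A register holding null while memory holds a valid pointer means the member was written after the load at `<+875>`. As a minimal sketch of a read that cannot observe such a change halfway through, the pointer can be read once, under the lock, and checked before it is dereferenced. This assumes that `lock` is meant to protect `rq`, which this report does not confirm; `struct request_sketch`, `struct slot_sketch` and `tries_exhausted()` are hypothetical names, not radsecproxy code.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

#define RAD_Status_Server 12   /* value quoted from radmsg.h in this report */

/* Hypothetical, reduced stand-ins for struct request and struct rqout. */
struct request_sketch {
    uint8_t *buf;
};

struct slot_sketch {
    pthread_mutex_t *lock;
    struct request_sketch *rq;
    uint8_t tries;
};

/* Returns 1 if the retry limit is reached, 0 if not, and -1 if the slot
 * has no request attached. The rq pointer is read once while the lock
 * is held, so it cannot change between the check and the dereference. */
int tries_exhausted(struct slot_sketch *slot, uint8_t retrycount)
{
    struct request_sketch *rq;
    int limit;
    int result;

    assert(slot != NULL && slot->lock != NULL);

    pthread_mutex_lock(slot->lock);
    rq = slot->rq;                      /* single read, under the lock */
    if (rq == NULL || rq->buf == NULL) {
        result = -1;                    /* slot emptied by someone else */
    } else {
        limit = (rq->buf[0] == RAD_Status_Server) ? 1 : retrycount + 1;
        result = (slot->tries == limit);
    }
    pthread_mutex_unlock(slot->lock);
    return result;
}
```

This only helps if every writer of `rq` takes the same lock; a guarded read alone does not fix a writer that updates the member without locking.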

This seems to be possible only via some data race against another thread updating the structure pointed to by rqout during the run time of clientwr. To top it off:

(gdb) p *rqout->lock
$9 = {__data = {__lock = 0, __count = 0, __owner = 0, __kind = 0, __nusers = 4293670681, {
      __spins = 0, __list = {__next = 0x0}}}, 
  __size = '\000' <repeats 16 times>, "\031\067\354\377\000\000\000", __align = 0}

which seems to mean that the mutex is not taken. How could that happen? Or is this info unreliable?
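
One way to check lock ownership in a debug build, rather than from a core file, is an error-checking mutex: with `PTHREAD_MUTEX_ERRORCHECK`, unlocking a mutex the calling thread does not hold returns an error instead of silently succeeding. The sketch below uses only standard pthread calls; the helper names are hypothetical, and whether radsecproxy's mutexes could be switched to this type is an assumption.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Initialise *m as an error-checking mutex; returns 0 on success. */
int init_errorcheck_mutex(pthread_mutex_t *m)
{
    pthread_mutexattr_t attr;
    int rc = pthread_mutexattr_init(&attr);

    if (rc != 0)
        return rc;
    rc = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
    if (rc == 0)
        rc = pthread_mutex_init(m, &attr);
    pthread_mutexattr_destroy(&attr);
    return rc;
}

/* Demonstration: unlock a mutex that the caller never locked and
 * return the error code reported by pthread_mutex_unlock(). */
int unlock_without_owning(void)
{
    pthread_mutex_t m;
    int rc = init_errorcheck_mutex(&m);

    assert(rc == 0);
    rc = pthread_mutex_unlock(&m);      /* not held: an error is returned */
    pthread_mutex_destroy(&m);
    return rc;
}
```

Checking the return values of `pthread_mutex_lock()` and `pthread_mutex_unlock()` on such a mutex would show whether a code path reaches the critical section without actually holding the lock.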
