Skip to content

Potential Deadlock in ygot/goyang patches #266

@slicking

Description

@slicking

The sonic-mgmt-common repo maintains patches to the ygot and goyang libraries, among other things these patches add performance enhancements which introduce a mutex protected cache/map here to the entry struct. See ChildSchema and ChildSchemaMutex.

However, the ygot library will perform a copy on an Entry struct during an Unmarshal operation. If this copy is performed while a go routine has locked the ChildSchemaMutex the copy of the Entry will have the mutex in a locked state and no go routine will ever unlock it. If the go routine performing the Unmarshal attempts to access the cache on the copied Entry it will then wait forever.

Our patches should be updated to account for this behavior of the ygot library and remove the potential for deadlocking a go routine.

A scenario where this has been seen to happen is the following:
Go routine A is created by the gnmi server to handle a subscription:

switch mode {
case gnmipb.SubscriptionList_STREAM:
c.stop = make(chan struct{}, 1)
c.w.Add(1)
go dc.StreamRun(c.q, c.stop, &c.w, c.subscribe)
case gnmipb.SubscriptionList_POLL:
c.polled = make(chan struct{}, 1)
c.polled <- struct{}{}
c.w.Add(1)
go dc.PollRun(c.q, c.polled, &c.w, c.subscribe)
case gnmipb.SubscriptionList_ONCE:
c.once = make(chan struct{}, 1)
c.once <- struct{}{}
c.w.Add(1)
go dc.OnceRun(c.q, c.once, &c.w, c.subscribe)

Go routine A will enter the translib code and call:
commonApp.processGet() --> GetAndXlateFromDB() --> dbDataToYangJsonCreate() --> yangDataFill()
The yangDataFill() function will eventually enter the ygot library calling ytypes.Unmarshal and perform the Entry struct copy here:
https://github.com/openconfig/ygot/blob/v0.7.1/ytypes/list.go#L527

Go routine B is created by the gnmi server to handle another subscription:

err = c.Run(stream)

Go routine B will create a new Data Client with NewTranslClient()
The NewTranslClient() will then call transutil.PopulateClientPaths
PopulateClientPaths then calls ConvertToURI() which creates a new PathValidator() and calls Validate.
The Validate function will eventually call into validateListKeyValues() which enters the ygot library with a call to ytypes.GetOrCreateNode
This eventually results in a call to retrieveNodeContainer which gets the schema for a child which then uses the ChildSchemaMutex and ChildSchema cache added by our patch.

Metadata

Metadata

Assignees

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions