The sonic-mgmt-common repo maintains patches to the ygot and goyang libraries, among other things these patches add performance enhancements which introduce a mutex protected cache/map here to the entry struct. See ChildSchema and ChildSchemaMutex.
However, the ygot library will perform a copy on an Entry struct during an Unmarshal operation. If this copy is performed while a go routine has locked the ChildSchemaMutex the copy of the Entry will have the mutex in a locked state and no go routine will ever unlock it. If the go routine performing the Unmarshal attempts to access the cache on the copied Entry it will then wait forever.
Our patches should be updated to account for this behavior of the ygot library and remove the potential for deadlocking a go routine.
A scenario where this has been seen to happen is the following:
Go routine A is created by the gnmi server to handle a subscription:
|
switch mode { |
|
case gnmipb.SubscriptionList_STREAM: |
|
c.stop = make(chan struct{}, 1) |
|
c.w.Add(1) |
|
go dc.StreamRun(c.q, c.stop, &c.w, c.subscribe) |
|
case gnmipb.SubscriptionList_POLL: |
|
c.polled = make(chan struct{}, 1) |
|
c.polled <- struct{}{} |
|
c.w.Add(1) |
|
go dc.PollRun(c.q, c.polled, &c.w, c.subscribe) |
|
case gnmipb.SubscriptionList_ONCE: |
|
c.once = make(chan struct{}, 1) |
|
c.once <- struct{}{} |
|
c.w.Add(1) |
|
go dc.OnceRun(c.q, c.once, &c.w, c.subscribe) |
Go routine A will enter the translib code and call:
commonApp.processGet() --> GetAndXlateFromDB() --> dbDataToYangJsonCreate() --> yangDataFill()
The yangDataFill() function will eventually enter the ygot library calling ytypes.Unmarshal and perform the Entry struct copy here:
https://github.com/openconfig/ygot/blob/v0.7.1/ytypes/list.go#L527
Go routine B is created by the gnmi server to handle another subscription:
Go routine B will create a new Data Client with
NewTranslClient()
The NewTranslClient() will then call
transutil.PopulateClientPaths
PopulateClientPaths then calls
ConvertToURI() which creates a new
PathValidator() and calls Validate.
The Validate function will eventually call into
validateListKeyValues() which enters the ygot library with a call to
ytypes.GetOrCreateNode
This eventually results in a call to retrieveNodeContainer which
gets the schema for a child which then uses the
ChildSchemaMutex and
ChildSchema cache added by our patch.
The sonic-mgmt-common repo maintains patches to the ygot and goyang libraries, among other things these patches add performance enhancements which introduce a mutex protected cache/map here to the entry struct. See
ChildSchemaandChildSchemaMutex.However, the ygot library will perform a copy on an
Entrystruct during anUnmarshaloperation. If this copy is performed while a go routine has locked theChildSchemaMutexthe copy of theEntrywill have the mutex in a locked state and no go routine will ever unlock it. If the go routine performing theUnmarshalattempts to access the cache on the copiedEntryit will then wait forever.Our patches should be updated to account for this behavior of the ygot library and remove the potential for deadlocking a go routine.
A scenario where this has been seen to happen is the following:
Go routine A is created by the gnmi server to handle a subscription:
sonic-gnmi/gnmi_server/client_subscribe.go
Lines 185 to 199 in a610784
Go routine A will enter the translib code and call:
commonApp.processGet() --> GetAndXlateFromDB() --> dbDataToYangJsonCreate() --> yangDataFill()
The yangDataFill() function will eventually enter the ygot library calling ytypes.Unmarshal and perform the Entry struct copy here:
https://github.com/openconfig/ygot/blob/v0.7.1/ytypes/list.go#L527
Go routine B is created by the gnmi server to handle another subscription:
sonic-gnmi/gnmi_server/server.go
Line 290 in a610784
Go routine B will create a new Data Client with NewTranslClient()
The NewTranslClient() will then call transutil.PopulateClientPaths
PopulateClientPaths then calls ConvertToURI() which creates a new PathValidator() and calls Validate.
The Validate function will eventually call into validateListKeyValues() which enters the ygot library with a call to ytypes.GetOrCreateNode
This eventually results in a call to retrieveNodeContainer which gets the schema for a child which then uses the
ChildSchemaMutexandChildSchemacache added by our patch.