Verification Checklist
Go Version
1.24
Dubbo-go Version
v3.3.1
OS
Linux / All platforms
Bug Description
A comprehensive audit of the dubbo-go codebase reveals 44 data race conditions across the framework. The most critical is a systematic pattern in the common/extension/ package where 20+ global maps are accessed without any synchronization, plus a confirmed bug where SetProviderService uses the wrong lock variable. These issues will cause fatal error: concurrent map read and map write crashes in production.
Running go test -race on any test that exercises dynamic service registration/deregistration, configuration updates, or concurrent RPC calls will reproduce these races.
Category 1: common/extension/ — 20+ global maps with zero lock protection (P0)
Every file in common/extension/ follows the exact same unsafe pattern:
var things = make(map[string]func() Thing)
func SetThing(name string, v func() Thing) {
things[name] = v // write: no lock
}
func GetThing(name string) Thing {
if things[name] == nil { // read: no lock
panic("...")
}
return things[name]()
}
Affected files (all in common/extension/):
| File |
Global map(s) |
Extra risk |
filter.go |
filters, rejectedExecutionHandler |
Has UnregisterFilter + GetAllFilterNames (runtime mutation) |
protocol.go |
protocols |
Hot path: every invoker creation |
registry.go |
registries |
|
cluster.go |
clusters |
Hot path: invoker reorganization |
loadbalance.go |
loadbalances |
Has UnregisterLoadbalance (runtime mutation) |
configurator.go |
configurator |
Config center hot reload path |
config_center.go |
configCenters |
|
config_center_factory.go |
configCenterFactories |
|
config.go |
configs |
|
metadata_report_factory.go |
metaDataReportFactories |
|
proxy_factory.go |
proxyFactories |
|
tps_limit.go |
tpsLimitStrategy, tpsLimiter |
|
auth.go |
authenticators, accessKeyStorages |
|
rest_client.go |
restClients |
|
rest_server.go |
restServers |
|
config_reader.go |
configReaders, defaults |
GetDefaultConfigReader returns internal map reference! |
config_post_processor.go |
processors |
|
router_factory.go |
routers |
GetRouterFactories returns internal map reference! |
registry_directory.go |
directories, defaultDirectory |
|
service_discovery.go |
discoveryCreatorMap |
|
service_instance_selector_factory.go |
serviceInstanceSelectorMappings |
|
logger.go |
logs |
|
otel_trace.go |
traceExporterMap |
Iterated in shutdown callback |
Worst cases: GetDefaultConfigReader() and GetRouterFactories() return the internal map reference directly — callers iterate the returned map without any lock, while Set* functions write to the same map concurrently. This is guaranteed to crash.
Recommended fix: Introduce a generic Registry[T] container with built-in sync.RWMutex:
type Registry[T any] struct {
mu sync.RWMutex
items map[string]T
}
func (r *Registry[T]) Set(name string, v T) {
r.mu.Lock()
defer r.mu.Unlock()
r.items[name] = v
}
func (r *Registry[T]) Get(name string) (T, bool) {
r.mu.RLock()
defer r.mu.RUnlock()
v, ok := r.items[name]
return v, ok
}
func (r *Registry[T]) Snapshot() map[string]T {
r.mu.RLock()
defer r.mu.RUnlock()
m := make(map[string]T, len(r.items))
for k, v := range r.items {
m[k] = v
}
return m
}
This would fix all 20+ files with a single refactor.
Category 2: dubbo.go SetProviderService uses wrong lock (P0)
File: dubbo.go:300-306
var (
consumerServices = map[string]*client.ClientDefinition{}
conLock sync.RWMutex // for consumerServices
providerServices = map[string]*server.ServiceDefinition{}
proLock sync.RWMutex // for providerServices
)
func SetProviderService(svc common.RPCService) {
conLock.Lock() // BUG: should be proLock!
defer conLock.Unlock()
providerServices[...] = ... // writing providerServices with wrong lock
}
SetProviderService uses conLock (consumer lock) to protect writes to providerServices, but loadProvider() (L219-220) uses proLock.RLock() to protect reads of the same map. Two different locks protecting the same map = no protection at all. Concurrent SetProviderService and loadProvider will crash.
Also: GetConsumerConnection (L308-310) reads consumerServices without holding conLock at all.
Fix: Change L301 conLock.Lock() → proLock.Lock(). Add conLock.RLock() to GetConsumerConnection.
Category 3: RouterChain.Route reads invokers without lock (P0)
File: cluster/router/chain/chain.go:55-67, 91-98
func (c *RouterChain) Route(...) []base.Invoker {
finalInvokers := make([]base.Invoker, 0, len(c.invokers)) // read: NO LOCK
for _, invoker := range c.invokers { // iterate: NO LOCK
// ...
}
}
func (c *RouterChain) SetInvokers(invokers []base.Invoker) {
c.mutex.Lock() // write: has lock
defer c.mutex.Unlock()
c.invokers = invokers
}
Route() is the hottest path in the framework (called on every RPC), but reads c.invokers without any lock. SetInvokers writes with mutex.Lock(). Slice header tear (reading partially-updated data/len/cap) can cause out-of-bounds panic or routing to wrong nodes.
Fix: Add c.mutex.RLock() in Route(), snapshot the slice before iteration.
Category 4: config/service.go returns internal map references (P0)
File: config/service.go:91-93, 102-104
func GetProviderServiceMap() map[string]common.RPCService {
return proServices // returns internal map without lock or copy
}
Caller provider_config.go:171 iterates the returned map with for range, while SetProviderService (L54-62, holding proServicesLock) writes to the same map concurrently. No lock held during iteration = crash.
Fix: Return a copy under lock.
Category 5: RegistryDirectory fields accessed without locks (P1)
Multiple fields in RegistryDirectory are accessed concurrently without proper synchronization:
IsAvailable() reads cacheInvokers without invokersLock (L534-546) — setNewInvokers() writes with lock, but IsAvailable() reads without lock
Destroy() reads/writes cacheInvokers without invokersLock (L549-573)
configurators slice appended in convertUrl() (L379) without lock, read in overrideUrl() (L575-579) without lock
SubscribedUrl written in goroutine (L165) without lock, read in Destroy() (L559)
Category 6: Other notable races (P1-P2)
| Issue |
File |
Description |
rootConfig global |
config/config_loader.go:37,56 |
Written by Load()/SetRootConfig(), read by GetRootConfig() everywhere, no atomic/lock |
overrideSubscribeListener.configurator |
registry/protocol/protocol.go:338-394 |
Written in Notify() goroutine, read in doOverrideIfNecessary(), no lock |
hystrix Filter.res map |
filter/hystrix/filter.go:130-178 |
Written under configLoadMutex.Lock(), read in hystrix.Do callback without lock |
applicationName global |
metrics/common.go:46-63 |
Written by InitAppInfo(), read by metrics goroutine, no atomic |
versionInt map × 2 |
protocol/dubbo/impl/hessian.go:166, hessian2/hessian_response.go:405 |
Concurrent reads from multiple connection goroutines |
customizers slice |
extension/service_instance_customizer.go:28-41 |
AddCustomizers appends+sorts, GetCustomizers returns internal slice reference |
scheduledLookUp goroutine |
registry/nacos/registry.go:233-238 |
Uses time.Sleep without context, can't exit promptly on shutdown |
Subscribe timeout goroutine leak |
registry/directory/directory.go:162-211 |
Timeout returns error but goroutines continue running indefinitely |
Summary
| Category |
Count |
Severity |
| extension global maps (systematic) |
20+ maps across 22 files |
P0 |
Wrong lock (SetProviderService) |
1 |
P0 |
RouterChain.Route lockless |
1 |
P0 |
config/service.go internal map exposure |
2 maps |
P0 |
RegistryDirectory field races |
4 fields |
P1 |
| Other races |
10+ |
P1-P2 |
Reproduction
Any test exercising concurrent service registration + RPC invocation will trigger race detector reports.
Suggested Approach
- Introduce
extension.Registry[T] generic container — fixes 20+ files in one refactor
- Fix
dubbo.go:301 wrong lock — one-line fix, highest ROI
- Add
RLock to RouterChain.Route — critical hot path
- Return map copies from
config/service.go — straightforward
- Add locks to
RegistryDirectory field accesses — systematic review needed
Verification Checklist
Go Version
1.24
Dubbo-go Version
v3.3.1
OS
Linux / All platforms
Bug Description
A comprehensive audit of the dubbo-go codebase reveals 44 data race conditions across the framework. The most critical is a systematic pattern in the
common/extension/package where 20+ global maps are accessed without any synchronization, plus a confirmed bug whereSetProviderServiceuses the wrong lock variable. These issues will causefatal error: concurrent map read and map writecrashes in production.Running
go test -raceon any test that exercises dynamic service registration/deregistration, configuration updates, or concurrent RPC calls will reproduce these races.Category 1:
common/extension/— 20+ global maps with zero lock protection (P0)Every file in
common/extension/follows the exact same unsafe pattern:Affected files (all in
common/extension/):filter.gofilters,rejectedExecutionHandlerUnregisterFilter+GetAllFilterNames(runtime mutation)protocol.goprotocolsregistry.goregistriescluster.goclustersloadbalance.goloadbalancesUnregisterLoadbalance(runtime mutation)configurator.goconfiguratorconfig_center.goconfigCentersconfig_center_factory.goconfigCenterFactoriesconfig.goconfigsmetadata_report_factory.gometaDataReportFactoriesproxy_factory.goproxyFactoriestps_limit.gotpsLimitStrategy,tpsLimiterauth.goauthenticators,accessKeyStoragesrest_client.gorestClientsrest_server.gorestServersconfig_reader.goconfigReaders,defaultsGetDefaultConfigReaderreturns internal map reference!config_post_processor.goprocessorsrouter_factory.goroutersGetRouterFactoriesreturns internal map reference!registry_directory.godirectories,defaultDirectoryservice_discovery.godiscoveryCreatorMapservice_instance_selector_factory.goserviceInstanceSelectorMappingslogger.gologsotel_trace.gotraceExporterMapWorst cases:
GetDefaultConfigReader()andGetRouterFactories()return the internal map reference directly — callers iterate the returned map without any lock, whileSet*functions write to the same map concurrently. This is guaranteed to crash.Recommended fix: Introduce a generic
Registry[T]container with built-insync.RWMutex:This would fix all 20+ files with a single refactor.
Category 2:
dubbo.goSetProviderService uses wrong lock (P0)File:
dubbo.go:300-306SetProviderServiceusesconLock(consumer lock) to protect writes toproviderServices, butloadProvider()(L219-220) usesproLock.RLock()to protect reads of the same map. Two different locks protecting the same map = no protection at all. ConcurrentSetProviderServiceandloadProviderwill crash.Also:
GetConsumerConnection(L308-310) readsconsumerServiceswithout holdingconLockat all.Fix: Change L301
conLock.Lock()→proLock.Lock(). AddconLock.RLock()toGetConsumerConnection.Category 3:
RouterChain.Routereads invokers without lock (P0)File:
cluster/router/chain/chain.go:55-67, 91-98Route()is the hottest path in the framework (called on every RPC), but readsc.invokerswithout any lock.SetInvokerswrites withmutex.Lock(). Slice header tear (reading partially-updated data/len/cap) can cause out-of-bounds panic or routing to wrong nodes.Fix: Add
c.mutex.RLock()inRoute(), snapshot the slice before iteration.Category 4:
config/service.goreturns internal map references (P0)File:
config/service.go:91-93, 102-104Caller
provider_config.go:171iterates the returned map withfor range, whileSetProviderService(L54-62, holdingproServicesLock) writes to the same map concurrently. No lock held during iteration = crash.Fix: Return a copy under lock.
Category 5:
RegistryDirectoryfields accessed without locks (P1)Multiple fields in
RegistryDirectoryare accessed concurrently without proper synchronization:IsAvailable()readscacheInvokerswithoutinvokersLock(L534-546) —setNewInvokers()writes with lock, butIsAvailable()reads without lockDestroy()reads/writescacheInvokerswithoutinvokersLock(L549-573)configuratorsslice appended inconvertUrl()(L379) without lock, read inoverrideUrl()(L575-579) without lockSubscribedUrlwritten in goroutine (L165) without lock, read inDestroy()(L559)Category 6: Other notable races (P1-P2)
rootConfigglobalconfig/config_loader.go:37,56Load()/SetRootConfig(), read byGetRootConfig()everywhere, no atomic/lockoverrideSubscribeListener.configuratorregistry/protocol/protocol.go:338-394Notify()goroutine, read indoOverrideIfNecessary(), no lockhystrix Filter.resmapfilter/hystrix/filter.go:130-178configLoadMutex.Lock(), read inhystrix.Docallback without lockapplicationNameglobalmetrics/common.go:46-63InitAppInfo(), read by metrics goroutine, no atomicversionIntmap × 2protocol/dubbo/impl/hessian.go:166,hessian2/hessian_response.go:405customizerssliceextension/service_instance_customizer.go:28-41AddCustomizersappends+sorts,GetCustomizersreturns internal slice referencescheduledLookUpgoroutineregistry/nacos/registry.go:233-238time.Sleepwithout context, can't exit promptly on shutdownSubscribetimeout goroutine leakregistry/directory/directory.go:162-211Summary
SetProviderService)RouterChain.Routelocklessconfig/service.gointernal map exposureRegistryDirectoryfield racesReproduction
go test -race ./...Any test exercising concurrent service registration + RPC invocation will trigger race detector reports.
Suggested Approach
extension.Registry[T]generic container — fixes 20+ files in one refactordubbo.go:301wrong lock — one-line fix, highest ROIRLocktoRouterChain.Route— critical hot pathconfig/service.go— straightforwardRegistryDirectoryfield accesses — systematic review needed