Skip to content

[BUG] Systematic data races: 20+ extension global maps unprotected, wrong lock in SetProviderService, RouterChain.Route lockless #3247

@AlexStocks

Description

@AlexStocks

Verification Checklist

Go Version

1.24

Dubbo-go Version

v3.3.1

OS

Linux / All platforms

Bug Description

A comprehensive audit of the dubbo-go codebase reveals 44 data race conditions across the framework. The most critical is a systematic pattern in the common/extension/ package where 20+ global maps are accessed without any synchronization, plus a confirmed bug where SetProviderService uses the wrong lock variable. These issues will cause fatal error: concurrent map read and map write crashes in production.

Running go test -race on any test that exercises dynamic service registration/deregistration, configuration updates, or concurrent RPC calls will reproduce these races.


Category 1: common/extension/ — 20+ global maps with zero lock protection (P0)

Every file in common/extension/ follows the exact same unsafe pattern:

var things = make(map[string]func() Thing)

func SetThing(name string, v func() Thing) {
    things[name] = v  // write: no lock
}

func GetThing(name string) Thing {
    if things[name] == nil {  // read: no lock
        panic("...")
    }
    return things[name]()
}

Affected files (all in common/extension/):

File Global map(s) Extra risk
filter.go filters, rejectedExecutionHandler Has UnregisterFilter + GetAllFilterNames (runtime mutation)
protocol.go protocols Hot path: every invoker creation
registry.go registries
cluster.go clusters Hot path: invoker reorganization
loadbalance.go loadbalances Has UnregisterLoadbalance (runtime mutation)
configurator.go configurator Config center hot reload path
config_center.go configCenters
config_center_factory.go configCenterFactories
config.go configs
metadata_report_factory.go metaDataReportFactories
proxy_factory.go proxyFactories
tps_limit.go tpsLimitStrategy, tpsLimiter
auth.go authenticators, accessKeyStorages
rest_client.go restClients
rest_server.go restServers
config_reader.go configReaders, defaults GetDefaultConfigReader returns internal map reference!
config_post_processor.go processors
router_factory.go routers GetRouterFactories returns internal map reference!
registry_directory.go directories, defaultDirectory
service_discovery.go discoveryCreatorMap
service_instance_selector_factory.go serviceInstanceSelectorMappings
logger.go logs
otel_trace.go traceExporterMap Iterated in shutdown callback

Worst cases: GetDefaultConfigReader() and GetRouterFactories() return the internal map reference directly — callers iterate the returned map without any lock, while Set* functions write to the same map concurrently. This is guaranteed to crash.

Recommended fix: Introduce a generic Registry[T] container with built-in sync.RWMutex:

type Registry[T any] struct {
    mu    sync.RWMutex
    items map[string]T
}

func (r *Registry[T]) Set(name string, v T) {
    r.mu.Lock()
    defer r.mu.Unlock()
    r.items[name] = v
}

func (r *Registry[T]) Get(name string) (T, bool) {
    r.mu.RLock()
    defer r.mu.RUnlock()
    v, ok := r.items[name]
    return v, ok
}

func (r *Registry[T]) Snapshot() map[string]T {
    r.mu.RLock()
    defer r.mu.RUnlock()
    m := make(map[string]T, len(r.items))
    for k, v := range r.items {
        m[k] = v
    }
    return m
}

This would fix all 20+ files with a single refactor.


Category 2: dubbo.go SetProviderService uses wrong lock (P0)

File: dubbo.go:300-306

var (
    consumerServices = map[string]*client.ClientDefinition{}
    conLock          sync.RWMutex     // for consumerServices
    providerServices = map[string]*server.ServiceDefinition{}
    proLock          sync.RWMutex     // for providerServices
)

func SetProviderService(svc common.RPCService) {
    conLock.Lock()           // BUG: should be proLock!
    defer conLock.Unlock()
    providerServices[...] = ...  // writing providerServices with wrong lock
}

SetProviderService uses conLock (consumer lock) to protect writes to providerServices, but loadProvider() (L219-220) uses proLock.RLock() to protect reads of the same map. Two different locks protecting the same map = no protection at all. Concurrent SetProviderService and loadProvider will crash.

Also: GetConsumerConnection (L308-310) reads consumerServices without holding conLock at all.

Fix: Change L301 conLock.Lock()proLock.Lock(). Add conLock.RLock() to GetConsumerConnection.


Category 3: RouterChain.Route reads invokers without lock (P0)

File: cluster/router/chain/chain.go:55-67, 91-98

func (c *RouterChain) Route(...) []base.Invoker {
    finalInvokers := make([]base.Invoker, 0, len(c.invokers))  // read: NO LOCK
    for _, invoker := range c.invokers {                        // iterate: NO LOCK
        // ...
    }
}

func (c *RouterChain) SetInvokers(invokers []base.Invoker) {
    c.mutex.Lock()           // write: has lock
    defer c.mutex.Unlock()
    c.invokers = invokers
}

Route() is the hottest path in the framework (called on every RPC), but reads c.invokers without any lock. SetInvokers writes with mutex.Lock(). Slice header tear (reading partially-updated data/len/cap) can cause out-of-bounds panic or routing to wrong nodes.

Fix: Add c.mutex.RLock() in Route(), snapshot the slice before iteration.


Category 4: config/service.go returns internal map references (P0)

File: config/service.go:91-93, 102-104

func GetProviderServiceMap() map[string]common.RPCService {
    return proServices  // returns internal map without lock or copy
}

Caller provider_config.go:171 iterates the returned map with for range, while SetProviderService (L54-62, holding proServicesLock) writes to the same map concurrently. No lock held during iteration = crash.

Fix: Return a copy under lock.


Category 5: RegistryDirectory fields accessed without locks (P1)

Multiple fields in RegistryDirectory are accessed concurrently without proper synchronization:

  1. IsAvailable() reads cacheInvokers without invokersLock (L534-546) — setNewInvokers() writes with lock, but IsAvailable() reads without lock
  2. Destroy() reads/writes cacheInvokers without invokersLock (L549-573)
  3. configurators slice appended in convertUrl() (L379) without lock, read in overrideUrl() (L575-579) without lock
  4. SubscribedUrl written in goroutine (L165) without lock, read in Destroy() (L559)

Category 6: Other notable races (P1-P2)

Issue File Description
rootConfig global config/config_loader.go:37,56 Written by Load()/SetRootConfig(), read by GetRootConfig() everywhere, no atomic/lock
overrideSubscribeListener.configurator registry/protocol/protocol.go:338-394 Written in Notify() goroutine, read in doOverrideIfNecessary(), no lock
hystrix Filter.res map filter/hystrix/filter.go:130-178 Written under configLoadMutex.Lock(), read in hystrix.Do callback without lock
applicationName global metrics/common.go:46-63 Written by InitAppInfo(), read by metrics goroutine, no atomic
versionInt map × 2 protocol/dubbo/impl/hessian.go:166, hessian2/hessian_response.go:405 Concurrent reads from multiple connection goroutines
customizers slice extension/service_instance_customizer.go:28-41 AddCustomizers appends+sorts, GetCustomizers returns internal slice reference
scheduledLookUp goroutine registry/nacos/registry.go:233-238 Uses time.Sleep without context, can't exit promptly on shutdown
Subscribe timeout goroutine leak registry/directory/directory.go:162-211 Timeout returns error but goroutines continue running indefinitely

Summary

Category Count Severity
extension global maps (systematic) 20+ maps across 22 files P0
Wrong lock (SetProviderService) 1 P0
RouterChain.Route lockless 1 P0
config/service.go internal map exposure 2 maps P0
RegistryDirectory field races 4 fields P1
Other races 10+ P1-P2

Reproduction

go test -race ./...

Any test exercising concurrent service registration + RPC invocation will trigger race detector reports.

Suggested Approach

  1. Introduce extension.Registry[T] generic container — fixes 20+ files in one refactor
  2. Fix dubbo.go:301 wrong lock — one-line fix, highest ROI
  3. Add RLock to RouterChain.Route — critical hot path
  4. Return map copies from config/service.go — straightforward
  5. Add locks to RegistryDirectory field accesses — systematic review needed

Metadata

Metadata

Assignees

No one assigned

    Type

    No fields configured for Bug.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions