Skip to content

⚠️ Potential race conditions in concurrent config access #125

@anthropic-code-agent

Description

@anthropic-code-agent

Description

Several cogs have potential race conditions where multiple async operations could modify the same configuration data concurrently, leading to data loss or inconsistent state.

Background

Red-bot's Config system uses context managers (async with) to read and write configuration data. However, if multiple tasks/commands run concurrently and modify the same config, updates can be lost because each task:

  1. Reads the current config
  2. Modifies it in memory
  3. Writes it back

If two tasks do steps 1-2 concurrently, the second write will overwrite the first task's changes.

Locations

1. party/party.py - Lines 288-290

Current Code:

async with self.config.guild(ctx.guild).parties() as parties:
    parties[party_id]["message_id"] = message.id
    parties[party_id]["channel_id"] = ctx.channel.id

Problem: If two users create parties at the exact same time, their updates could conflict.

2. secret_santa/secret_santa.py - Lines 1405-1408

Current Code:

async with self.config.guild_from_id(guild_id).events() as events:
    events[event_name]["participants"][user_id_str]["wishlist"] = wishlist

Problem: If multiple users update their wishlists simultaneously, one update could be lost.

3. quotesdb/quotedb.py - Adding/removing quotes concurrently

Problem: Multiple users adding quotes to the same trigger at the same time could result in lost quotes.

4. assign_roles/assign_roles.py - Line 111

Current Code:

server_dict.setdefault(giveable_id, []).append(authorised_id)

Problem: No duplicate check before append. Multiple concurrent authorizations could add the same role multiple times.

Impact

  • Data loss: Concurrent updates can overwrite each other
  • Inconsistent state: Configuration can become corrupted
  • Rare but serious: Only happens with simultaneous operations, making it hard to detect
  • Silent failures: No error messages, data just disappears

Example Scenario

Time  | User A                          | User B
------|--------------------------------|--------------------------------
T1    | Read parties config            |
T2    | Modify parties (add Party A)   | Read parties config
T3    | Write parties back             | Modify parties (add Party B)
T4    |                                | Write parties back ← Overwrites A's changes!

Result: Party A is lost!

Potential Solutions

Option 1: Use finer-grained config keys

Instead of:

async with self.config.guild(guild).parties() as parties:
    parties[party_id] = new_party

Use individual keys:

# Each party is a separate config key
party_config = self.config.guild(guild).party(party_id)
await party_config.message_id.set(message.id)
await party_config.channel_id.set(ctx.channel.id)

Pros: Each party has its own lock, preventing conflicts
Cons: Requires config schema changes

Option 2: Use application-level locking

import asyncio

class PartyCog(commands.Cog):
    def __init__(self, bot):
        self.bot = bot
        self.config_locks = {}  # guild_id -> Lock
    
    def get_guild_lock(self, guild_id):
        if guild_id not in self.config_locks:
            self.config_locks[guild_id] = asyncio.Lock()
        return self.config_locks[guild_id]
    
    async def create_party(self, ctx, ...):
        lock = self.get_guild_lock(ctx.guild.id)
        async with lock:
            async with self.config.guild(ctx.guild).parties() as parties:
                parties[party_id] = new_party

Pros: Simple to implement, minimal config changes
Cons: Adds lock contention, could slow down operations

Option 3: Retry with optimistic locking

async def update_party_with_retry(self, guild, party_id, update_func, max_retries=3):
    for attempt in range(max_retries):
        async with self.config.guild(guild).parties() as parties:
            if party_id not in parties:
                raise ValueError("Party not found")
            
            old_value = parties[party_id].copy()
            new_value = update_func(old_value)
            
            # Check if value changed during our update
            current_value = await self.config.guild(guild).parties.get_raw(party_id)
            if current_value == old_value:
                parties[party_id] = new_value
                return
        
        await asyncio.sleep(0.1 * (2 ** attempt))  # Exponential backoff
    
    raise RuntimeError("Failed to update after retries")

Pros: No locking needed, handles conflicts automatically
Cons: More complex, may retry unnecessarily

Recommended Approach

For this repository, Option 2 (application-level locking) is recommended because:

  • Simple to implement
  • Minimal changes to existing code
  • Works with current config schema
  • Prevents data loss with minimal performance impact

Priority

MEDIUM - Rare but serious issue that could cause data loss.

Testing

To test race conditions (difficult to reproduce):

  1. Write a script that creates multiple parties simultaneously
  2. Verify all parties are saved correctly
  3. Check for any lost or corrupted data
  4. Monitor logs for warnings

Implementation Note

Start with the most critical cogs:

  1. party - Most likely to have concurrent access
  2. secret_santa - Important for event management
  3. quotesdb - High-traffic commands

Related

This is part of a comprehensive code review. See other issues for additional improvements.

Metadata

Metadata

Assignees

Labels

No labels
No labels

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions