Description
Several cogs have potential race conditions where multiple async operations could modify the same configuration data concurrently, leading to data loss or inconsistent state.
Background
Red-bot's Config system uses context managers (async with) to read and write configuration data. However, if multiple tasks/commands run concurrently and modify the same config, updates can be lost because each task:
1. Reads the current config
2. Modifies it in memory
3. Writes it back
If two tasks both complete steps 1-2 before either writes, the second write overwrites the first task's changes.
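The lost-update pattern can be reproduced with plain asyncio and a simulated store, no Red-bot required (`fake_store` and `add_party` are illustrative names, not real cog API):

```python
import asyncio

# Simulated config storage: a single value that is read and written whole.
fake_store = {"parties": {}}

async def add_party(name):
    # Step 1: read the current config (a snapshot)
    parties = dict(fake_store["parties"])
    # Step 2: modify in memory; the await yields control, letting
    # another task read the same stale snapshot
    await asyncio.sleep(0)
    parties[name] = {"owner": name}
    # Step 3: write the whole value back
    fake_store["parties"] = parties

async def main():
    await asyncio.gather(add_party("A"), add_party("B"))
    return fake_store["parties"]

result = asyncio.run(main())
print(result)  # only one of the two parties survives
```

Both tasks snapshot an empty dict before either writes, so the second write clobbers the first.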
Locations
1. party/party.py - Lines 288-290
Current Code:
```python
async with self.config.guild(ctx.guild).parties() as parties:
    parties[party_id]["message_id"] = message.id
    parties[party_id]["channel_id"] = ctx.channel.id
```
Problem: If two users create parties at the exact same time, their updates could conflict.
2. secret_santa/secret_santa.py - Lines 1405-1408
Current Code:
```python
async with self.config.guild_from_id(guild_id).events() as events:
    events[event_name]["participants"][user_id_str]["wishlist"] = wishlist
```
Problem: If multiple users update their wishlists simultaneously, one update could be lost.
3. quotesdb/quotedb.py - Adding/removing quotes concurrently
Problem: Multiple users adding quotes to the same trigger at the same time could result in lost quotes.
4. assign_roles/assign_roles.py - Line 111
Current Code:
```python
server_dict.setdefault(giveable_id, []).append(authorised_id)
```
Problem: No duplicate check before append. Multiple concurrent authorizations could add the same role multiple times.
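A guarded append avoids the duplicate entries (a sketch reusing the cog's variable names, not the full command context; the surrounding race still needs one of the locking options below):

```python
def authorise(server_dict, giveable_id, authorised_id):
    """Add authorised_id to the role's authoriser list, skipping duplicates."""
    authorised = server_dict.setdefault(giveable_id, [])
    if authorised_id not in authorised:
        authorised.append(authorised_id)
    return server_dict

d = {}
authorise(d, "role1", 42)
authorise(d, "role1", 42)  # second call is a no-op
print(d)  # {'role1': [42]}
```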
Impact
- Data loss: Concurrent updates can overwrite each other
- Inconsistent state: Configuration can become corrupted
- Rare but serious: Only happens with simultaneous operations, making it hard to detect
- Silent failures: No error messages, data just disappears
Example Scenario
| Time | User A | User B |
|------|--------|--------|
| T1 | Read parties config | |
| T2 | Modify parties (add Party A) | Read parties config |
| T3 | Write parties back | Modify parties (add Party B) |
| T4 | | Write parties back ← overwrites A's changes! |
Result: Party A is lost!
Potential Solutions
Option 1: Use finer-grained config keys
Instead of:
```python
async with self.config.guild(guild).parties() as parties:
    parties[party_id] = new_party
```
Use individual keys:
```python
# Each party is a separate config key
party_config = self.config.guild(guild).party(party_id)
await party_config.message_id.set(message.id)
await party_config.channel_id.set(ctx.channel.id)
```
Pros: Each party has its own lock, preventing conflicts
Cons: Requires config schema changes
Option 2: Use application-level locking
```python
import asyncio

from redbot.core import commands

class PartyCog(commands.Cog):
    def __init__(self, bot):
        self.bot = bot
        self.config_locks = {}  # guild_id -> asyncio.Lock

    def get_guild_lock(self, guild_id):
        if guild_id not in self.config_locks:
            self.config_locks[guild_id] = asyncio.Lock()
        return self.config_locks[guild_id]

    async def create_party(self, ctx, ...):
        lock = self.get_guild_lock(ctx.guild.id)
        async with lock:
            async with self.config.guild(ctx.guild).parties() as parties:
                parties[party_id] = new_party
```
Pros: Simple to implement, minimal config changes
Cons: Adds lock contention, could slow down operations
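The lock registry can also be written more compactly with `collections.defaultdict`, which creates a lock lazily on first access (same behaviour as `get_guild_lock` above; `LockRegistry` is an illustrative name):

```python
import asyncio
from collections import defaultdict

class LockRegistry:
    """Per-guild asyncio locks, created lazily on first access."""
    def __init__(self):
        self._locks = defaultdict(asyncio.Lock)

    def __getitem__(self, guild_id):
        return self._locks[guild_id]

registry = LockRegistry()
lock_a = registry[123]
lock_b = registry[456]
print(lock_a is registry[123])  # True: same lock for the same guild
print(lock_a is lock_b)         # False: distinct guilds get distinct locks
```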
Option 3: Retry with optimistic locking
```python
import asyncio
import copy

async def update_party_with_retry(self, guild, party_id, update_func, max_retries=3):
    for attempt in range(max_retries):
        old_value = await self.config.guild(guild).parties.get_raw(party_id, default=None)
        if old_value is None:
            raise ValueError("Party not found")
        new_value = update_func(copy.deepcopy(old_value))
        async with self.config.guild(guild).parties() as parties:
            # Only write if the value has not changed since we read it
            if parties.get(party_id) == old_value:
                parties[party_id] = new_value
                return
        await asyncio.sleep(0.1 * (2 ** attempt))  # exponential backoff
    raise RuntimeError("Failed to update party after retries")
```

Note that the staleness check must happen inside the write context manager; checking it beforehand reintroduces the same read-then-write window.
Pros: No locking needed, handles conflicts automatically
Cons: More complex, may retry unnecessarily
Recommended Approach
For this repository, Option 2 (application-level locking) is recommended because:
- Simple to implement
- Minimal changes to existing code
- Works with current config schema
- Prevents data loss with minimal performance impact
Priority
MEDIUM - Rare but serious issue that could cause data loss.
Testing
To test race conditions (difficult to reproduce):
- Write a script that creates multiple parties simultaneously
- Verify all parties are saved correctly
- Check for any lost or corrupted data
- Monitor logs for warnings
Implementation Note
Start with the most critical cogs:
- party - Most likely to have concurrent access
- secret_santa - Important for event management
- quotesdb - High-traffic commands
Related
This is part of a comprehensive code review. See other issues for additional improvements.