add-node workflow: fix recovery and wal restore issues, improve clean up logic #309

Merged: mmols merged 20 commits into REL25_01 from add-node-bugs on May 15, 2025

@mmols (Member) commented May 14, 2025

  • Resolved an issue where recovery.signal would remain after the add-node process completed by setting --type=standby in the pgBackRest restore
  • Fixed an issue where cipher-type would not be set in the restore_command due to how pgBackRest generates the command, which prevented WAL from being restored from the repository when needed
  • Refactored add-node process to cleanly refer to source and target node
  • Simplified pgBackRest reconfiguration logic for the source / target nodes
  • Added a backrest cleanup-replica command that ensures the newly added node does not have lingering configuration from the add-node process
  • Ensured the restore_command is unset on the target node after the add-node process completes
  • Fixed an issue where the pgBackRest yaml was rewritten to all nodes during add-node. It will now only be written to the target node during reconfiguration if pgBackRest should be configured on the target node
  • Removed logic that set S3 environment variables on the target node. This is documented as a prerequisite and will be handled by the user
  • Removed unused functions in backrest.py
  • Fixed an issue where repo-type S3 would not have its repository configured
  • Fixed an issue where cipher-type=none was not respected
  • Fixed an issue where ssl_cert_file, ssl_key_file, and log_directory would not be properly configured on the target node
  • Removed code that set shared_preload_libraries on the target node - this is already set during the restore
  • Fixed an issue where add-node would use public IPs for replication connections, even if private IPs were provided
  • Added polling logic to verify postgres comes out of recovery during add-node
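
For readers unfamiliar with the last item, the polling idea can be sketched roughly as below. This is an illustrative sketch only, not the code from this PR: the function name `wait_for_recovery_exit`, its parameters, and the default timeout and interval are all assumptions. The `is_in_recovery` callable is a placeholder for whatever check the caller uses, for example running `SELECT pg_is_in_recovery()` against the target node.

```python
import time
from typing import Callable


def wait_for_recovery_exit(
    is_in_recovery: Callable[[], bool],
    timeout_s: float = 300.0,   # assumed default, not taken from the PR
    interval_s: float = 5.0,    # assumed default, not taken from the PR
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll until is_in_recovery() returns False or the timeout expires.

    Returns True if the node left recovery, False if the timeout was hit.
    `sleep` and `clock` are injectable so the loop can be tested without
    real delays.
    """
    deadline = clock() + timeout_s
    while True:
        # Check first so a node that is already out of recovery returns at once.
        if not is_in_recovery():
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)
```

A caller would typically treat a `False` return as a failed add-node and stop before running any cleanup steps.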

Resolves PLAT-53, PLAT-34, PLAT-45
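
The cipher-type items in the list above can be illustrated with a small sketch that builds a restore_command string explicitly instead of relying on a generated one. This is one possible approach and not necessarily how the PR implements the fix; the helper name `build_restore_command` and its parameters are hypothetical. It uses only the standard pgBackRest `archive-get` form and the `--stanza` and `--repo1-cipher-type` options, and assumes a single repository (`repo1`).

```python
import shlex
from typing import Optional


def build_restore_command(stanza: str, repo_cipher_type: Optional[str] = None) -> str:
    """Return a restore_command value that calls `pgbackrest archive-get`.

    When repo_cipher_type is given (including the literal "none"), it is
    passed through explicitly so the setting is not silently dropped.
    """
    parts = ["pgbackrest", f"--stanza={stanza}"]
    if repo_cipher_type is not None:
        parts.append(f"--repo1-cipher-type={repo_cipher_type}")
    quoted = " ".join(shlex.quote(p) for p in parts)
    # %f and %p are placeholders that PostgreSQL substitutes at restore time.
    return f'{quoted} archive-get %f "%p"'
```

Passing `"none"` rather than omitting the argument keeps an unencrypted repository explicit in the generated command.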

@mmols mmols requested a review from moizpgedge May 14, 2025 19:37
@moizpgedge (Contributor) left a comment

Ready to be merged

@mmols mmols merged commit e11277c into REL25_01 May 15, 2025
9 checks passed
@mmols mmols deleted the add-node-bugs branch May 15, 2025 20:51