[cleaner] Better distribution of files among child cleaner processes #4228
pmoravec wants to merge 1 commit into sosreport:main from
Conversation
Performance comparison (on one small testing sosreport, with the default 4 workers/jobs):
For bigger sosreports, I expect a smaller yet still evident improvement (I plan to share some stats once my performance tests complete).
Congratulations! One of the builds has completed. 🍾 You can install the built RPMs by following these steps:
Please note that the RPMs should be used only in a testing environment.
Static split of input files often meant uneven distribution of load among the child processes. That caused just one process being busy at the end for many minutes, which ruins the concurrency benefits.
Using multiprocessing.[Process|Queue] allows us to:
1) order files by size, starting from the biggest (with the assumption it takes the longest cleanup time)
2) have a child process dynamically fetch a new file only when it gets idle
Relevant: sosreport#4227
Closes: sosreport#4228
Signed-off-by: Pavel Moravec <pmoravec@redhat.com>
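The commit message above describes a producer/consumer scheme: files are queued largest-first, and each child process pulls the next file only when it becomes idle. A minimal sketch of that idea, assuming hypothetical names (`order_files`, `clean_file`, `run_cleaner`) that stand in for the real sos cleaner code:

```python
import multiprocessing
import os

SENTINEL = None  # tells a worker there is no more work


def order_files(files):
    # Largest files first, assuming the biggest take longest to clean
    return sorted(files, key=os.path.getsize, reverse=True)


def clean_file(path):
    pass  # stand-in for the real per-file obfuscation work


def worker(queue):
    # Each child fetches a new file only when it becomes idle
    while True:
        path = queue.get()
        if path is SENTINEL:
            break
        clean_file(path)


def run_cleaner(files, jobs=4):
    queue = multiprocessing.Queue()
    for path in order_files(files):
        queue.put(path)
    for _ in range(jobs):
        queue.put(SENTINEL)  # one sentinel per worker so all exit
    procs = [multiprocessing.Process(target=worker, args=(queue,))
             for _ in range(jobs)]
    for proc in procs:
        proc.start()
    for proc in procs:
        proc.join()
```

Compared to a static split of the file list into N chunks, this keeps all workers busy until the queue is drained, so no single child is left grinding through a disproportionately large chunk at the end.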
pmoravec force-pushed from c73ddb5 to 4bba85f
Some tests are failing in a weird way, moving the PR to draft to investigate that.
pmoravec force-pushed from 4bba85f to 2734824
Any idea why "Report Stage One - debian-13" repeatedly fails on …
pmoravec force-pushed from 61d5620 to 0ab49f9
Kindly asking for a review from anybody on the list of people I marked as reviewers. This PR prevents cases where the whole cleaner waits a long time for one child process that is still busy with its remaining work while the other child processes are already done. Two improvements were made:
This requires using …
    try:
        rel_name = os.path.relpath(filename, start=self.extracted_path)
        if self.should_skip_file(rel_name):
            # TODO: add to some/all returns a debug?
+1 to adding debug info, it could be very useful. But perhaps, at least in this case, it could be added directly inside should_skip_file(), where it looks like we don't log the skip either.
Hmm.. that method is an auxiliary one and those usually don't have logs.. but since it is called just from here, and it can sometimes be beneficial to know the reason for the skip ("not a file or symlink, skipping" != "filename in skip list, skipping"), it makes sense.
Honestly, I don't have a strong preference either way, inside the function or in the 'if's you have here. But these messages would for sure be very useful.
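The thread above discusses logging the reason for a skip inside should_skip_file(). A minimal sketch of that idea, written as a free function for brevity (in sos it is a method, and the `skip_list` parameter here is an assumption, not the real signature); the two debug messages mirror the distinction quoted in the review:

```python
import logging
import os

log = logging.getLogger(__name__)


def should_skip_file(rel_name, skip_list=()):
    # Log the specific reason for each skip, so "not a file or symlink"
    # can be told apart from "filename in skip list" when debugging
    if not (os.path.isfile(rel_name) or os.path.islink(rel_name)):
        log.debug("%s: not a file or symlink, skipping", rel_name)
        return True
    if rel_name in skip_list:
        log.debug("%s: filename in skip list, skipping", rel_name)
        return True
    return False
```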
pmoravec force-pushed from 0ab49f9 to 22e2408
pmoravec force-pushed from 22e2408 to 0cb1f6f
pmoravec force-pushed from 0cb1f6f to e1aae8c
Kindly asking for a PR review here. The feedback about exceptions has been incorporated; this alone improves performance by several percent, and notably it makes the cleaner runtime more deterministic.
Kindly asking @TurboTurtle and/or @arif-ali for a review.
Static split of input files often meant uneven distribution of load among the child processes. That caused just one process being busy at the end for many minutes, which ruins the concurrency benefits.
Using multiprocessing.[Process|Queue] allows us to:
1) order files by size, starting from the biggest (with the assumption it takes the longest cleanup time)
2) have a child process dynamically fetch a new file only when it gets idle
Relevant: #4227
Closes: #4228
Please place an 'X' inside each '[]' to confirm you adhere to our Contributor Guidelines