-
Notifications
You must be signed in to change notification settings - Fork 198
Improve performance by reducing redundant dictionary fetches. #336
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -285,6 +285,7 @@ def __init__(self, *args, **kwargs): | |
|
|
||
| # TODO: insert takes 30% of the overall CPU time, mostly in def key() | ||
| # with about 15% of the overall CPU time | ||
| # NOTE: Likely improved by reducing calls to def key() on Jan 2026 (PR #336). | ||
| def insert(self, key, geno=None, iteration=None, fitness=None, | ||
| value=None, cma_norm=None): | ||
| """insert an entry with key ``key`` and value | ||
|
|
@@ -308,29 +309,26 @@ def insert(self, key, geno=None, iteration=None, fitness=None, | |
|
|
||
| self.last_solution_index += 1 | ||
| if value is not None: | ||
| try: | ||
| iteration = value['iteration'] | ||
| except: | ||
| pass | ||
| iteration = value.get('iteration', iteration) | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. What is wrong with the original code here?
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Throwing an exception is quiet expensive, and there is no need for it here. Even if this is not a problem right now it can unexpectedly become one if we run into this path more often. Also we don't check which exception was thrown, so it can be anything. If the user pressed ctrl+c or any other interrupt comes in at this line the program would silently ignore it, and in addition start to show undefined / unexpected behavior from this point on because the iteration variable was not updated, even though it should have been. Finally I think the second version is much more expressive, as it basically says, take 'iteration' from value with the old iteration-variable as default.
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. True, the exception should be caught more tightly and not in this a lazy manner. Otherwise I am not sure about your arguments: "Easier to Ask Forgiveness than Permission (EAFP) is the recommended Python practice over checking conditions in advance (LBYL – Look Before You Leap)." Doesn't really matter here.
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Hmm, Look Before you leap would be something like and yes, I agree, there are good reasons to avoid this type of code (for example because concurrency can create unexpected behavior). But the get-function is exactly there so that we don't have to surround every fetch of a dictionary with a try-except block. I also agree with preferring EAFP over LBYL, but this isn’t really a choice between those two idioms, and preferring EAFP over LBYL doesn't mean that we should use EAFP for its own sake.
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think the main reason and use case for the |
||
| if iteration is not None: | ||
| if iteration > self.last_iteration: | ||
| self.last_solution_index = 0 | ||
| self.last_iteration = iteration | ||
| else: | ||
| iteration = self.last_iteration + 0.5 # a hack to get a somewhat reasonable value | ||
| if value is not None: | ||
| self[key] = value | ||
| entry = value | ||
| else: | ||
| self[key] = {'pheno': key} | ||
| entry = {'pheno': key} | ||
| if geno is not None: | ||
| self[key]['geno'] = geno | ||
| if iteration is not None: | ||
| self[key]['iteration'] = iteration | ||
| entry['geno'] = geno | ||
| entry['iteration'] = iteration | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This doesn't look correct to me, as it doesn't check for
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Shortly before we already perform this check and define it as last_iteration + 0.5 if necessary, so to my understanding it is impossible that it will be None. Maybe the right way to go would be to place an assert + comment to point this out? But removing the if statement indeed was just a cleanup because I thought it makes things more clearly if we know that we will always have the iteration variable and doesn't have any significant performance benefits. |
||
| if fitness is not None: | ||
| self[key]['fitness'] = fitness | ||
| entry['fitness'] = fitness | ||
| if cma_norm is not None: | ||
| self[key]['cma_norm'] = cma_norm | ||
| return self[key] | ||
| entry['cma_norm'] = cma_norm | ||
| self[key] = entry | ||
| return entry | ||
|
|
||
| class _CMASolutionDict_empty(dict): | ||
| """a hack to get most code examples running""" | ||
|
|
||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The comment was removed, but why? How do we know insert is not expensive anymore?
Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So before the change the key-function was called every time we fetched the element from the map, which happened three times instead of only once. So the expensive part should go down to 1/3 of what it was before. So much the theory.
I think for my use case the two changes that I made reduced the running time of my whole program by around 30 %. Before the change the insert function took a big part of the time of the overall runtime which is not the case anymore, and it doesn't seem to me that improving it further is worth the effort.
I tried to make a specific test, but the numbers in the end seemed to vary with the type of key etc, and I'm not sure how to reproduce the exact numbers of the comment.
Sorry that I can't give a better answer. If there is any test that I should do I'm happy to run it and write the numbers here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are assuming that fetching is the expensive part, which, I guess, you only did in the first place because you saw the comment you deleted?
Maybe make a comment on the todo instead of removing it, as you seem not to know for sure whether your changes actually addressed it.
Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, not really. I plotted a callgraph which guided me to the key function and from there back to the insert function. I only realized afterwards, that the comment was about this problem. I don't have the original files anymore, but I just ran a small benchmark with both versions again.
So here the original version:

And here the modified version:

Three points to consider about the plot:
I don't now how much overhead the profiling itself added.
I pruned everything that is smaller then 2% of overall runtime, so not all Vertices and Edges are shown, but the numbers itself are unaffected.
The original version had a total runtime of 1.477 s the modified version 1.313 s. For correctness we have to multiply all numbers in the graph of the modified run with a factor of 0.89, because all numbers are relative to the total runtime.
In the first version getitem was called three times for each insert, each time resulting in a call to the key-function. With the modification the calls of getItem reduced to zero. There is still a call of the key function through setItem in both versions (even though the edge is not displayed in the original graph because the connection was below 2%).
The time that the insert function took reduced to 1/3 of what it used to be ((3.47 % / 9.23 %) * 0.89).
The total speedup seems to be much smaller than I originally thought (only 10% in this test). I have to confess that I didn't do extensive tests on the whole program, I just ran it again, so maybe there was some background task in the first run or anything else making me believe that the change improved more then it actually did. Still, I would see this comment as addressed.
Does this all make sense or am I overlooking something? I think you can also just edit my pull request if you have a specific comment in mind that you would like to have placed there.
Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cool, I how do you do these graphs?
OK, true, on the other hand, first case:
keyinutilstakes cumulated 12.84%. Second case: 8.47%. This matches with the overall reduction of number of calls tokey. Doesn't that mean that reducing the number of calls tokeywas just about moderately effective overall? I would say so, which means, the part of the comment that sayskeyis expensive remains valid?I don't know whether it is addressed, in particular because we don't know the setup under which the original comment was made (I suspect with large population size and moderate to large dimension). Also, the comment suggests that (only) half of the time is spent in
keywhich suggests that the modification should speed up the function by less than a factor of two? In such a case, where I don't quite know, I won't see it as addressed until I have better confirmation.Don't get me wrong, I think the code modification is solid and justified, there nothing wrong with that.
Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I used cProfile with gprof2dot.
python -m cProfile -o output.prof benchmark.py
python -m gprof2dot -f pstats --root "benchmark:17:run_benchmark" -n 2 -e 2 output.prof > output.dot
Let me know if you have any questions.
Everything you write sounds correct to me. I just not sure what we could really write instead. I think that 3.5% for the insert function of overall runtime is probably just fine. Leaving the comment as it is might send someone else to search for something that they just can't find. I don't really know what I can do to find out if it is resolved on top of what I did so far as I can't reproduce original the numbers (of course I'm more a user of this library than a developer :D). But the question is a bit will someone else in the future be able to do that? If not than the comment might stay forever.
As said if you have anything for that comment in mind I'm happy to edit it in that way.
The key function is potentially something to work on, but I didn't see any easy fix that doesn't cause bigger structural changes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just leave the original comment and add a comment on it below that the insert function was change to (try to) address this by removing redundant dictionary accesses with a date when this was done (like Jan 2026).