fixed mem allocation in main#3
Open
vkolodie wants to merge 1 commit intopavanbalaji:wip/pmifrom
Open
Conversation
|
Applied to the wip/pmi branch. This can be closed. |
pavanbalaji
pushed a commit
that referenced
this pull request
Oct 22, 2017
Calculate and allocate the number of iovecs that will be used
by the iovec state machine to generate the put/get/accumulate lists.
Currently, we are setting the FI_ASYNC_IOV mode bit, which tells libfabric
that we (netmod) will provide the storage for all the iovec operations.
However, we currently do not provide that storage, and some providers
appear to be providing the storage for iovecs anyways, so we have a
"double bug" that makes this problem rarely manifest.
There are a couple ways to fix this:
1) Disable FI_ASYNC_IOV. Probably not an optimal option. We want to avoid
internal allocations if possible. If allocations are required anywhere
in the stack, probably the best place is in the netmod because
we have the highest level of datatype metadata. I think what we really
want to ask libfabric is if a memory copy of the iovec is required.
In this case, we can allocate it. If the hardware is going to copy the
iovec as part of the command, we can use a "per-op" allocation and bypass
this calculation entirely.
2) Use a "per-op" allocation of the iovec, and use fi_context to complete.
This will probably be slow, as injection should be message rate bound
on fast networks. By using completions instead of counters, we avoid
allocation/free of per element storage and the associated overhead.
The current scheme only uses 1 allocation.
3) Allocate the required iovecs up front.
This implements #3, but with some optimizations. To count the total iovecs
that will be required, some estimates are used:
* We scan two elements from each source datatype and
extraplate the total iovec count from that.
* We sum the iovec lists as an upper bound, rather than calculate the
exact use. We'll calculate this in an exact manner when we replace
the existing iovec expansion with direct processing of the datatype.
We should only be using iovecs we actually touch, so in practice
this shouldn't be a problem.
Fixes csr/mpich-opa#409
Signed-off-by: Ken Raffenetti <raffenet@mcs.anl.gov>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Fixed place for mallocs, where proxy.num_children is known and changed the number of elements to proxy_params.immediate.proxy.num_children + 1 since "0" element is used for calling proxy itself.