fixed mem allocation in main#3

Open
vkolodie wants to merge 1 commit into pavanbalaji:wip/pmi from vkolodie:mem_alloc_in_proxy

Conversation

@vkolodie vkolodie commented May 5, 2017

Moved the mallocs to the point where proxy.num_children is known, and changed the number of elements to proxy_params.immediate.proxy.num_children + 1, since element "0" is used for the calling proxy itself.
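The change described above can be sketched as follows. This is a minimal illustration, not the actual hydra code: the struct layout and the `alloc_child_slots` helper are hypothetical, and only the "+1 for element 0" sizing matches the PR description.

```c
#include <stdlib.h>

/* Illustrative stand-in for the relevant part of proxy_params. */
struct proxy_params {
    int num_children;
};

/* Allocate one slot per child plus slot 0 for the calling proxy itself.
 * This must run only after num_children is known, which is the point of
 * the fix. */
int *alloc_child_slots(const struct proxy_params *p)
{
    /* num_children + 1: element "0" is used by the proxy itself. */
    return calloc((size_t)p->num_children + 1, sizeof(int));
}
```

Sizing the array as `num_children` alone would leave no slot for the proxy's own entry and overrun the buffer when index `num_children` is touched.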


raffenet commented Aug 8, 2017

Applied to the wip/pmi branch. This can be closed.

pavanbalaji pushed a commit that referenced this pull request Oct 22, 2017
 Calculate and allocate the number of iovecs that will be used
 by the iovec state machine to generate the put/get/accumulate lists.

 Currently, we are setting the FI_ASYNC_IOV mode bit, which tells libfabric
 that we (the netmod) will provide the storage for all iovec operations.
 However, we do not actually provide that storage, and some providers
 appear to provide storage for iovecs anyway, so we have a
 "double bug" that makes this problem rarely manifest.

 There are a couple ways to fix this:
 1) Disable FI_ASYNC_IOV.  Probably not an optimal option.  We want to avoid
    internal allocations if possible.  If allocations are required anywhere
    in the stack, probably the best place is in the netmod because
    we have the highest level of datatype metadata.  I think what we really
    want to ask libfabric is if a memory copy of the iovec is required.
    In this case, we can allocate it.  If the hardware is going to copy the
    iovec as part of the command, we can use a "per-op" allocation and bypass
    this calculation entirely.
 2) Use a "per-op" allocation of the iovec, and use fi_context to complete.
    This will probably be slow, as injection should be message rate bound
    on fast networks.  By using completions instead of counters, we avoid
    allocation/free of per element storage and the associated overhead.
    The current scheme only uses 1 allocation.
 3) Allocate the required iovecs up front.

 This implements #3, but with some optimizations.  To count the total iovecs
 that will be required, some estimates are used:
 * We scan two elements from each source datatype and
   extrapolate the total iovec count from that.
 * We sum the iovec lists as an upper bound, rather than calculate the
   exact use.  We'll calculate this in an exact manner when we replace
   the existing iovec expansion with direct processing of the datatype.
   We should only be using iovecs we actually touch, so in practice
   this shouldn't be a problem.

Fixes csr/mpich-opa#409

Signed-off-by: Ken Raffenetti <raffenet@mcs.anl.gov>