Yet Another eXchange Tool 0.11.2
YAXT relies on some defaults that usually provide an adequate compromise between space and time efficiency. Where these defaults fall short, one can either override the global default with an environment variable or use a custom constructor that takes an extra argument of type Xt_config, which changes the behavior only at the place where the usual default fails.
The internal exchanger class handles the message passing part of the redist.
The default exchanger, mix_irecv_isend, is suited for MPI implementations with strong support for derived MPI datatypes and efficient handling of nonblocking point-to-point communication.
By setting the environment variable XT_CONFIG_DEFAULT_EXCHANGE_METHOD, one of the other available exchangers can be selected:
irecv_send uses non-blocking receives but then uses blocking sends. This may be beneficial for platforms that can elide some preparations for non-blocking transfers in this case but is expected to have downsides as soon as actual network latencies become relevant.
irecv_isend differs from the default in that it doesn't bother to mix initiating sends and receives. Rather, all receives are initiated before any send is initiated.
irecv_isend_packed Separates the network transfer from the iteration of the MPI datatype by creating an MPI_PACKED buffer filled via MPI_Pack internally.
irecv_isend_ddt_packed Builds on irecv_isend_packed but uses OpenACC kernels instead of MPI_Pack. This usually results in significant performance gains on GPUs.
mix_irecv_isend The default exchanger described above. It mixes the initiation of sends and receives.
neigh_alltoall Available when a robust implementation of MPI 3 neighbor collectives is provided. Uses MPI_Neighbor_alltoallw instead of point-to-point communication. This may benefit from pre-created data paths but creates additional MPI communicators which may be too costly in highly dynamic use cases.
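As a minimal sketch of selecting one of the exchangers listed above from within a program, the following C helper validates a name against that list and then exports XT_CONFIG_DEFAULT_EXCHANGE_METHOD. The helper name select_default_exchanger is hypothetical and not part of YAXT. The sketch assumes that the variable has to be in the environment before YAXT reads its defaults, so it would have to be called early, before the library is initialized. In most setups the variable is simply exported in the job script instead.

```c
#define _POSIX_C_SOURCE 200112L /* makes setenv visible in strict C modes */
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Exchanger names as listed in this section. */
static const char *const exchanger_names[] = {
  "irecv_send", "irecv_isend", "irecv_isend_packed",
  "irecv_isend_ddt_packed", "mix_irecv_isend", "neigh_alltoall",
};

/* Hypothetical helper, not part of YAXT: exports the requested exchanger
 * name through the environment. Returns 0 on success and -1 if the name
 * is not in the list above or the environment could not be changed. */
int select_default_exchanger(const char *name)
{
  assert(name != NULL);
  size_t n = sizeof exchanger_names / sizeof exchanger_names[0];
  for (size_t i = 0; i < n; ++i)
    if (strcmp(name, exchanger_names[i]) == 0)
      return setenv("XT_CONFIG_DEFAULT_EXCHANGE_METHOD", name, 1) == 0
        ? 0 : -1;
  return -1; /* unknown name: leave the environment untouched */
}
```

Calling select_default_exchanger("irecv_isend_packed") at the very start of main would then request the packed exchanger as the global default, while an unknown name is rejected without changing anything.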
Automatic conversions of index vectors to stripes
When constructing an Xmap via xt_xmap_dist_dir_new or xt_xmap_all2all_new, index lists passed in as index vectors are automatically converted to index stripes if their size exceeds a limit, which defaults to 128 indices.
The XT_CONFIG_DEFAULT_IDXVEC_AUTOCONVERT_SIZE environment variable can be used to override this value.
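To illustrate what converting an index vector to stripes means, here is a small self-contained C sketch. It is not YAXT's conversion routine: the type stripe_sketch and the function idxvec_to_stripes are hypothetical names chosen for this example, and the greedy grouping strategy is an assumption. The idea is that a run of indices with a constant difference can be stored as a start value, a stride and a count instead of one entry per index.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a stripe: nstrides indices beginning at
 * start, each one stride apart. */
struct stripe_sketch {
  long start;
  long stride;
  int nstrides;
};

/* Greedily groups idx[0..n-1] into runs of constant stride.
 * out must provide room for n entries. Returns the number of stripes. */
size_t idxvec_to_stripes(const long *idx, size_t n,
                         struct stripe_sketch *out)
{
  assert(n == 0 || (idx != NULL && out != NULL));
  size_t nstripes = 0, i = 0;
  while (i < n) {
    struct stripe_sketch s = { idx[i], 1, 1 };
    if (i + 1 < n) {
      /* the first difference fixes the stride of this run */
      long stride = idx[i + 1] - idx[i];
      size_t j = i + 1;
      while (j < n && idx[j] - idx[j - 1] == stride)
        ++j;
      s.stride = stride;
      s.nstrides = (int)(j - i);
      i = j;
    } else {
      ++i; /* a single trailing index forms its own stripe */
    }
    out[nstripes++] = s;
  }
  return nstripes;
}
```

For the index vector 0, 1, 2, 3, 10, 12, 14 this yields two stripes: four indices starting at 0 with stride 1, and three indices starting at 10 with stride 2. The more regular the index vector, the larger the saving, which is why the conversion only pays off above a certain size.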
For an MPI that supports multiple threads calling into it (see MPI_Init_thread and MPI_THREAD_MULTIPLE), it can be more efficient to issue multiple send and/or receive operations in parallel.
Set the XT_CONFIG_DEFAULT_MULTI_THREAD_MODE environment variable to "XT_MT_OPENMP" to enable this globally. This is currently equivalent to passing an Xt_config object on which xt_config_set_redist_mthread_mode has been called with the parameter XT_MT_OPENMP.
The default sort algorithm in YAXT is currently quicksort because it normally needs only little additional memory. It is used to speed up the computation of index list intersections when the inputs are index vectors. Since almost sorted inputs can lead to excessive run times, it can be beneficial to use the mergesort algorithm on such index lists. To change the sort algorithm, call xt_config_set_sort_algorithm_by_id on an object of type Xt_config and pass that object to the xmap constructors.
By setting the environment variable XT_CONFIG_DEFAULT_SORT_ALGORITHM to "mergesort" (case-insensitive), this can be made the global default.
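The effect of already sorted input on quicksort can be made concrete by counting comparisons. The sketch below uses a textbook quicksort that always picks the first element as pivot, which is an assumption made purely for illustration; YAXT's own quicksort may select pivots differently and degrade less sharply. All function names are hypothetical. It compares that quicksort against a plain top-down mergesort on an input that is already in ascending order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

static size_t ncmp; /* number of element comparisons performed */

static int less_than(long a, long b) { ++ncmp; return a < b; }

/* Textbook quicksort with the first element as pivot (assumption:
 * chosen to show the worst case, not taken from YAXT). */
static void naive_quicksort(long *a, size_t n)
{
  if (n < 2) return;
  long pivot = a[0];
  size_t store = 0; /* a[1..store] holds elements smaller than pivot */
  for (size_t i = 1; i < n; ++i)
    if (less_than(a[i], pivot)) {
      ++store;
      long t = a[i]; a[i] = a[store]; a[store] = t;
    }
  a[0] = a[store]; a[store] = pivot;
  naive_quicksort(a, store);
  naive_quicksort(a + store + 1, n - store - 1);
}

/* Top-down mergesort; tmp must provide room for n elements. */
static void merge_sort(long *a, long *tmp, size_t n)
{
  if (n < 2) return;
  size_t mid = n / 2;
  merge_sort(a, tmp, mid);
  merge_sort(a + mid, tmp, n - mid);
  size_t i = 0, j = mid, k = 0;
  while (i < mid && j < n)
    tmp[k++] = less_than(a[j], a[i]) ? a[j++] : a[i++];
  while (i < mid) tmp[k++] = a[i++];
  while (j < n) tmp[k++] = a[j++];
  memcpy(a, tmp, n * sizeof *a);
}

/* Sorts the ascending sequence 0..n-1 and returns the comparison count. */
size_t count_comparisons_on_sorted(int use_mergesort, size_t n)
{
  assert(n > 0);
  long *a = malloc(n * sizeof *a), *tmp = malloc(n * sizeof *tmp);
  assert(a != NULL && tmp != NULL);
  for (size_t i = 0; i < n; ++i) a[i] = (long)i;
  ncmp = 0;
  if (use_mergesort) merge_sort(a, tmp, n);
  else naive_quicksort(a, n);
  size_t result = ncmp;
  free(tmp);
  free(a);
  return result;
}
```

On sorted input of length n, this quicksort variant performs n(n-1)/2 comparisons because every partition step splits off only the pivot, whereas the mergesort stays within n log2(n). Mergesort pays for this with the temporary buffer, which matches the memory argument given above for keeping quicksort as the default.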
When computing the positions corresponding to an intersection during the construction of an Xmap, the usual strategy for index lists above a certain size (the limit described under Automatic conversions of index vectors) is to map multiple positions at the same time by describing them as position extents (Xt_pos_ext).
If only very few adjacent positions, or even just one, are ever mapped together, this is very inefficient. In such situations the alternative strategy of mapping each index individually is faster. Mapping individually can be set with xt_config_set_xmap_stripe_align and an object of type Xt_config. It can also be requested globally by setting the environment variable XT_CONFIG_DEFAULT_XMAP_STRIPE_ALIGN to "one_by_one" or the value 0. The default is "auto" or 2, and stripe alignment can be enforced with the values "always" or 1.
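Why position extents stop paying off for scattered positions can be seen in a short sketch. The type pos_ext_sketch and the function positions_to_extents are hypothetical and only model the idea of an extent as a start position plus a size; they are not YAXT's Xt_pos_ext handling. The sketch merges ascending, directly adjacent positions into one extent each.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a position extent: size consecutive
 * positions beginning at start. */
struct pos_ext_sketch {
  int start;
  int size;
};

/* Merges directly adjacent ascending positions into extents.
 * out must provide room for n entries. Returns the number of extents. */
size_t positions_to_extents(const int *pos, size_t n,
                            struct pos_ext_sketch *out)
{
  assert(n == 0 || (pos != NULL && out != NULL));
  size_t next = 0;
  for (size_t i = 0; i < n; ++i) {
    if (next > 0 && out[next - 1].start + out[next - 1].size == pos[i]) {
      ++out[next - 1].size; /* position extends the current extent */
    } else {
      out[next].start = pos[i]; /* gap: open a new extent */
      out[next].size = 1;
      ++next;
    }
  }
  return next;
}
```

The positions 4, 5, 6, 7, 20, 21 collapse into two extents, so handling them as extents saves work. The positions 1, 3, 5, 7 produce four extents of size 1, so the extent bookkeeping is pure overhead, which is the situation where mapping one by one is the better choice.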
In some situations, the default of using extra data structures to speed up operations such as searching can be counterproductive. In those cases the default can be changed to prefer computationally more expensive but memory-conserving algorithms.
This can be changed globally by setting the environment variable XT_CONFIG_DEFAULT_MEM_SAVING to 1.