From 9f1e8d1cb732f6f21c787484e32df6267d832dad Mon Sep 17 00:00:00 2001 From: John Alsop Date: Mon, 26 Jan 2026 14:41:18 -0600 Subject: [PATCH 1/9] Add formal API-level memory consistency model Add a section formally defining memory ordering guaranteed by the OpenSHMEM API. This formalization section describes how a formal PL-level memory model like C++ can be extended to define the API-level memory semantics provided by OpenSHMEM. This adds clarity and reduces ambiguity about how OpenSHMEM interacts with the memory system, which is increasingly important as the API is implemented on accelerators with SW managed coherence --- content/api_mem_model.tex | 200 +++++++++++++++++ content/api_observable.tex | 436 +++++++++++++++++++++++++++++++++++++ figures/api_relations.png | Bin 0 -> 53238 bytes main_spec.tex | 3 + 4 files changed, 639 insertions(+) create mode 100644 content/api_mem_model.tex create mode 100644 content/api_observable.tex create mode 100644 figures/api_relations.png diff --git a/content/api_mem_model.tex b/content/api_mem_model.tex new file mode 100644 index 000000000..3c79acdae --- /dev/null +++ b/content/api_mem_model.tex @@ -0,0 +1,200 @@ + +To precisely define what behavior the \openshmem interface guarantees for both users and implementers, we formally define an API-level memory model. An API-level model builds upon a programming language-level (PL-level) memory model, and it defines how API operations impact memory. This is useful for determining the ordering and visibility requirements for \openshmem memory accesses when implemented on any device, but it is of particular importance for devices like GPUs that have software managed caches requiring expensive bulk flush or invalidate actions to enforce memory ordering guarantees. Without a precisely specified memory model, API implementers and users may disagree on what behavior is guaranteed by the API, and this can lead to poor performance (too many bulk cache actions) or buggy code (too few bulk cache actions). + +Although \openshmem may be implemented in multiple programming languages, this section describes how \openshmem extends the C++ memory consistency model. C++ is chosen as an example because of its generality, and because it forms the basis for common GPU programming languages (CUDA, hip). We describe how the guarantees described here can be adapted for GPU memory models in Section~\ref{subsubsec:api_scoped}. + +\subsubsection{Baseline PL-level Axiomatic Memory Model (C++)}\label{subsubsec:api_pl_baseline} + +A memory consistency model (MCM) defines what values loads are allowed to return in a multi-threaded program. One common way to formally specify a MCM is via axioms over relations between events in a dynamic execution. Axiomatic formalization begins by considering all possible “candidate” executions of a parallel program. Loads may return the value of a conflicting store anywhere else in the execution, or the initial memory value at that address (i.e., allowing all possible reorderings). The model then defines relations between accesses in the candidate executions, and using axioms over these relations, defines which executions are legal (hence restricting which reorderings are allowed by the compiler or hardware). + + + +The C++ memory model is defined axiomatically in the C++ specification. Although many prior works aim to mathematize or improve this formalism, here we simply try to provide sufficient understanding to build upon it with the \openshmem API model. + +Under the PL-level C++ memory model, events consist of reads (\textit{R}), writes (\textit{W}), atomic read-modify-writes (\textit{RMW}) perform both a read and write to memory, and fences (\textit{F}). Events that access memory may be atomic or non-atomic. Atomic events are associated with a memory order which can be acquire, release, acquire-release, or relaxed (we leave out consume and sequential consistent memory orders here for simplicity, but this does not impact how the PL-level semantics are extended for the API-level model). The following relations are defined over the events in a candidate execution, and will be used to derive API-level relations in Section~\ref{subsubsec:api_relations}: + +\begin{itemize} + \item \textbf{Sequenced before (\textit{sb}):} Relates memory events from the same thread based on the order they are evaluated (also called program order in some contexts) + \item \textbf{Modification order (\textit{mo}):} A total ordering between all writes to the same address. + \item \textbf{Reads-from (\textit{rf}):} Relates a write \textit{W} and any read \textit{R} to the same address such that \textit{R} returns the value written by \textit{W}. + \item \textbf{Synchronizes-with (\textit{sw}):} Same as \textit{rf}, but also requires that \textit{W} and \textit{R} are atomic, \textit{W} has release or acquire-release memory order, and \textit{R} has acquire or acquire-release memory order (this relation is also called synchronization order in some contexts). + \item \textbf{Happens-before (\textit{hb}):} Transitive closure of \textit{sw} and \textit{sb}. +\end{itemize} + +Furthermore, a \textbf{data race} is defined as a relation between two accesses $A$ and $B$ where all of the following are true: +\begin{itemize} + \item $A$ and $B$ access the same address. + \item At least one access is a write. + \item At least one access is not atomic. + \item $A$ and $B$ are not related by \textit{hb}. +\end{itemize} + +Although the axiomatic model places no requirements on when each event must physically update or access physical memory, it prohibits executions that don't adhere to a set of constraints, and by extension it disallows any architecture or compilation scheme that can result in such an execution. Specifically, C++ constrains candidate execution with the following axioms\footnote{When using SC memory order, there is an additional total order over all SC atomics that must similarly be consistent with happens-before}: + +\textbf{Axioms} + +\begin{itemize} + \item $acyclic(hb)$: \textit{hb} cannot have any cycles. + \item $irreflexive(rf;hb)$: \textit{hb} must be consistent with the \textbf{rf} relation. + \item $irreflexive((rf^{-1})^{?};mo;rf^?;hb)$: \textit{hb} must be consistent with the \textit{mo} order of writes, as well as any reads that read from writes in mo order (often called coherence requirement). + \item $irreflexive(rf \cup (mo;mo;rf^-1) \cup (mo;rf))$: The read of an atomic read-modify-write access must always read the last value (in the modification order) written before the write of the read-modify-write access. +\end{itemize} + + +Finally, the C++ specification states that, if a data race exists in a candidate execution that satisfies the above axioms, then the program's behavior is undefined. If a data race does not exist, then the program's behavior must conform to one of the conforming candidate executions. + +\subsubsection{API-Level Memory Model Overview}\label{subsubsec:api_model_overview} + +The \openshmem API-level memory model extends PL-level memory models like C++ to define how \openshmem operations interact with memory. While PL-level models operate on fine-grain loads, stores, and fences, the API-level memory model must additionally operate on coarse-grain API operations like get, put, barrier, etc. that can execute asynchronously and involve multiple PEs. + +To describe the desired \openshmem behavior, we make the following additions to the existing PL-level model: + +\begin{enumerate} + \item\label{mcmsteps:one} \textbf{PEs and operation events}: Introduce a new event type that corresponds to an API operation and a new address property that specifies the target PE. + \item\label{mcmsteps:two} \textbf{Per-operation observable accesses}: For each operation type, define a set of observable memory accesses (these are not related to the initiating thread by \textit{sb}). + \item\label{mcmsteps:three} \textbf{API-level relations}: Define new API-level ordering relations over operation events and their observable accesses that will constrain \openshmem memory ordering. + \item\label{mcmsteps:four} \textbf{Fused collectives}: Fuse matched collective operation calls into a single event. + \item\label{mcmsteps:five} \textbf{Updated relations, axioms}: Update PL-level axioms to incorporate new events and relations. +\end{enumerate} + +Each of the above steps adds a piece of required support. \ref{mcmsteps:one} provides a basis for operation-level semantics across multiple PE memories. \ref{mcmsteps:two} enables asynchrony since a thread can continue execution while observable accesses execute in the background. \ref{mcmsteps:three} allows the programmer to explicitly enforce ordering of asynchronous operations using the ordering and synchronization operations provided by \openshmem. \ref{mcmsteps:four} Supports multi-PE collective operations and establishes ordering across PEs even when there are no observable accesses. \ref{mcmsteps:five} formally updates the PL-level axioms to establish a complete API-level memory model. + +After defining how \openshmem extends the C++ memory model, we describe how the addition of scoped synchronization in GPU memory models impacts the API-level model. + +\subsubsection{PEs and Operation Events}\label{subsubsec:api_pes_operations} + +When defining behavior at the \openshmem API level, we must first consider two key differences in program execution relative to a PL-level model. First, \openshmem applications span multiple PEs, each of which has its own local memory space. Therefore, the PL-level definition of memory address is extended to include both the PE-local address and the PE ID. Normal memory accesses can only access addresses on the PE local to the thread that issued it, which is constant for the life of a thread. \openshmem operations may trigger accesses to the local PE’s memory and/or the memory of remote PEs. Dynamic memory accesses in an execution are only considered to access the same address if both the address and PE ID attributes match (impacting the definitions of of \textit{mo} and \textit{rf}). + +Second, since the \openshmem API does not define the underlying PL-level implementation, API operations do not resolve to a function consisting of PL-level events. Thus, these operations must be treated as a new type of primitive in the API-level memory model: the operation event. An operation event specifies its type, and each type defines the set of minimum observable accesses triggered by the operation (described next). Operation events participate in \textit{sb} and derived relations, similar to normal thread access events. Although operation events may be associated with separate memory access events (see Section~\ref{subsubsec:api_observable}), the operation events themselves do not participate directly in access-based relations like \textit{mo}, \textit{rf}, and \textit{sw}. + +\subsubsection{Per-Operation Observable Accesses}\label{subsubsec:api_observable} + +For the purposes of the API memory model, each \openshmem operation type defines the minimum set of observable memory accesses performed by the operation. These accesses are considered to be "issued" by the associated. These accesses must occur as part of the operation, however there is no guarantee that they will be executed by the initiating thread, or that they will be the only accesses performed by the API operation implementation. Observable accesses have the same semantics as normal accesses and take part in the same relations, with one exception: there is no \textit{sb} relationship between observable accesses and the initiating thread of an API operation. Accesses performed by an implementation that are not included in the observable set must not modify data that may be observed from application code external to \openshmem operation implementations. + +Table~\ref{table:api_observable} defines the observable accesses performed for each \openshmem operation. Each access denotes the address as well as the PE it targets ($my\_pe$ refers to the PE of the initiating thread). In many cases, we also show control and compute pseudocode to define which addresses are accessed, which values are stored, and which values must eventually be loaded\footnote{We use pseudocode rather than C++ here because it allows to clearly specify where memory accesses are required.}. However, this single-threaded control flow does not constrain the actual implementation and does not necessarily imply a total \textit{sb} ordering between observable accesses. Within an operation, \textit{sb} is defined to exist only between observable accesses when there is a true control or data dependency between two accesses. Specifically, $(A, B) \in sb$ only if $A$ and $B$ are observable accesses of the same operation and one of the following is true: + \begin{itemize} + \item $B$ takes a value derived from $A$ as an operand. + \item $B$ has a control dependency on the result of $A$ such that $A$'s evaluation determines whether $B$ will occur. + \item $B$ has an ordering dependency $A$. This type of dependency currently exists only in put\_signal[\_nbi] where the final store to $sig_addr$ is dependent on all prior observable accesses. + \end{itemize} +Many operations have no visible accesses (e.g., memory ordering routines, whose semantics are defined by the API-level relations described in Section~\ref{subsubsec:api_relations}). + +\input{content/api_observable} + +\subsubsection{Fused Collectives}\label{subsubsec:api_integrate} + +\openshmem defines multiple collective operations that synchronize across multiple PEs in a team, in some cases without any observable accesses to memory (e.g., as is the case with barrier). +There are multiple requirements on the use of these operations. In particular, each dynamic call to a collective operation must be paired with a matching call from every other PE in the team. +The API-level memory model fuses these matching calls into a single event in the candidate execution. +This fused operation and the associated observable accesses (if any) are considered to be executed collectively by all nodes in the synchronizing set of PEs, allowing us to establish ordering relations across these PEs (defined in Section~\ref{subsubsec:api_relations}). +This does not mean the observable accesses must occur atomically, and it does not constrain how the operation is implemented (i.e., any thread or PE in the team may perform each observable access). + +\subsubsection{API-Level Relations}\label{subsubsec:api_relations} + +Section~\ref{subsubsec:api_observable} details how operations trigger memory accesses, but it does not define how the ordering of these accesses are constrained. For this, we define a set of API-level relations between normal memory accesses, API operations, and observable accesses issued by these operations. These new relations will be used to extend the relations and axioms that constrain program behavior in Section~\ref{subsubsec:api_integrate}. + + + +\subsubsubsection{Implicit Local Visibility (ilv)} +Implicit local visibility relates local observable accesses from any \openshmem operation and prior hb-ordered memory accesses from the initiating PE. This ensures, for example, that a put operation will load an up-to-date version of data from the source buffer. Specifically, $(A, B) \in ilv$ only if all of the following are true: +\begin{itemize} + \item $A$ is \textit{hb} ordered before the operation event that issues $B$. + \item $A$ and $B$ target the same PE (the PE of the caller thread). +\end{itemize} + +\subsubsubsection{Local Completion Order (lco)} +Local completion order relates local observable accesses from a blocking \openshmem operation and subsequent \textit{hb} ordered accesses and operations from the initiating PE. This ensures, for example, that a blocking put completes its accesses to the source buffer before a subsequent store from the initiating thread overwrites the data. Specifically, $(A, B) \in lco$ only if all of the following are true: +\begin{itemize} + \item $A$ is an observable access issued from a blocking operation event (i.e., an operation that does not have nbi in its name). + \item $A$ targets the PE of the operation event that issues it. + \item The operation event that issues $A$ is \textit{hb} ordered before $B$, or is \textit{hb} ordered before an operation event that issues $B$. +\end{itemize} + +\subsubsubsection{Remote Delivery Order (rdo)} + +Remote delivery order relates accesses separated by an \openshmem fence operation. However, this relation does not establish full connectivity; it only forms a connection between observable accesses to the same PE from operations \textit{hb} ordered before and after the fence, and between \textit{hb} ordered accesses before a fence and observable accesses from operations \textit{hb} ordered after a fence. Specifically, $(A, B) \in rdo$ only if there is an \openshmem fence operation $Fop$ and any of the following conditions are true: +\begin{itemize} + \item $A$ is a memory access that is \textit{hb} ordered before $Fop$, $B$ is an observable access issued by an operation that is \textit{hb} ordered after $Fop$, and the context associated with $Fop$ does NOT have the $OPENSHMEM_CTX_NOSTORE$ option enabled. + \item $A$ is an observable access issued by a fence-ordered operation (defined below) that is \textit{hb} ordered before $Fop$, $B$ is an observable access issued by an operation that is \textit{hb} ordered after $Fop$, and $A$ and $B$ target the same PE. +\end{itemize} + +\textbf{fence-ordered operations}: blocking or nonblocking put, blocking or nonblocking atomic memory operation, blocking or nonblocking put\_with\_signal. + +\subsubsubsection{Remote Completion Order (rco)} + +Remote completion order establishes full connectivity between accesses separated by an \openshmem quiet or barrier operation. All thread accesses or observable accesses that are \textit{hb} ordered before the quiet or barrier operation are ordered before all thread accesses or observable accesses that are \textit{hb} ordered after the quiet or barrier operation. Specifically, $(A, B) \in rdo$ only if there is an \openshmem quiet or barrier operation $QBop$ and any of the following conditions are true: +\begin{itemize} + \item $A$ is a memory access that is \textit{hb} ordered before $QBop$, $B$ is an access that is \textit{hb} ordered after $QBop$, and the context associated with $QBop$ does NOT have the $OPENSHMEM_CTX_NOSTORE$ option enabled. + \item $A$ is a memory access that is \textit{hb} ordered before $QBop$, $B$ is an observable access issued by an operation that is \textit{hb} ordered after $QBop$, and the context associated with $QBop$ does NOT have the $OPENSHMEM_CTX_NOSTORE$ option enabled. + \item $A$ is an observable access issued by an operation that is \textit{hb} ordered before $QBop$ and $B$ is an access that is \textit{hb} ordered after $QBop$. + \item $A$ is an observable memory access issued by a qb-ordered operation (defined below) that is \textit{hb} ordered before $QBop$ and $B$ is an observable access issued by an operation that is \textit{hb} ordered after $QBop$. +\end{itemize} + +\textbf{qb-ordered operations}: blocking or nonblocking put, nonblocking get, blocking or nonblocking atomic memory operation, blocking or nonblocking put\_with\_signal. + +\subsubsubsection{API Synchronizes-With (asw)} + +API synchronizes-with is the API-level analog of the PL-level \textit{sw} relation. It relates observable accesses from legally synchronizing \openshmem operations that form a \textit{rf} relationship. Specifically, $(A, B) \in asw$ only if $(A, B) \in rf$ and any of the following are true: +\begin{itemize} + \item $A$ is an observable read or read-modify-write access issued by an \openshmem operation from any of the following categories: + atomic memory operations, signaling operations, point-to-point synchronization routines, or distributed locking routines. + \item $B$ is an observable write or read-modify-write access issued by an \openshmem operation from any of the following categories: + atomic memory operations, signaling operations, point-to-point synchronization routines, or distributed locking routines. +\end{itemize} + +\subsubsection{Example API Execution}\label{subsubsubsec:api_relations_example} + +\begin{figure}[h] +\includegraphics[width=0.75\textwidth]{figures/api_relations.png} +\centering +\caption{Example \openshmem code execution depicting observable accesses, API-level relations, and fused collective.} +\label{fig:api_execution} +\end{figure} + +Figure~\ref{fig:api_execution} illustrates how the above concepts are applied in an example \openshmem code execution. \openshmem operations are denoted by rounded boxes, and their observable accesses are shown in dotted square boxes. +Matched barrier calls from each PE are fused into a single operation event. +Operations are ordered within the calling thread by \textit{sb}, but their observable accesses are constrained only by the added API relations (in addition to \textit{sb} orderings within the operation): +\begin{itemize} +\item \textit{ilv} orders the store of $a$ in PE0 (and all prior hb-ordered accesses) with the load of $a$ from the put operation. +\item \textit{lco} orders the load of $a$ in PE1 (and all subsequent hb-ordered accesses) with the load of $b$ from the (blocking) wait\_until operation. +\item \textit{rdo} orders the bulk store of a from the put operation in PE0 with the store of $b$ from the fence-separated atomic\_set operation. +\item \textit{rco} orders all \textit{hb} ordered accesses and the accesses of \textit{hb} ordered \openshmem operations before and after the fused barrier event spanning PE0 and PE1. +\item \textit{asw} orders the store of $b$ from the atomic\_set operation in PE0 with the consuming (\textit{rf} related) load of $b$ from the wait\_until operation in PE1. +\end{itemize} + + +\subsubsection{Updating the PL-Level Model to an API-Level Model}\label{subsubsec:api_integrate} + +We leverage the newly defined observable accesses and ordering relations to define the API-level memory model, which is based on the C++ model described in Section~\ref{subsubsec:api_pl_baseline}. +We first define a new relation, API happens-before (\textit{api\_hb}), to be the transitive closure of the C++ \textit{hb} and all API-level ordering relations introduced in Section~\ref{subsubsec:api_relations}: + +\begin{itemize} + \item $api\_hb := (hb \cup ilv \cup lco \cup rdo \cup rco \cup asw)^+$ +\end{itemize} + +In addition, an \textbf{API data race} is defined to relate any two accesses $A$ and $B$ where all of the following are true: +\begin{itemize} + \item $A$ and $B$ access the same address on the same PE. + \item At least one access is a write. + \item At least one access is not atomic. + \item $A$ and $B$ are not related by \textit{api\_hb}. +\end{itemize} + +The following axioms are similarly updated to use \textit{api\_hb} rather than \textit{hb}, and to extend RMW atomicity to \openshmem atomic memory operations: +\begin{itemize} + \item $acyclic(api\_hb)$: \textit{hb} cannot have any cycles. + \item $irreflexive(rf;api\_hb)$: \textit{hb} must be consistent with the \textbf{rf} relation. + \item $irreflexive((rf^{-1})^{?};mo;rf^?;api\_hb)$: \textit{hb} must be consistent with the \textit{mo} order of writes, as well as any reads that read from writes in mo order (often called coherence requirement). + \item $irreflexive(rf;(sb^amo\_op)^? \cup mo;mo;(;sb^{amo\_op})^?;rf^-1 \cup mo;rf;(sb^{amo\_op})^?$ where $sb^{amo\_op}$ relates $sb$ ordered observable accesses issued by the same \openshmem atomic memory operation: The read of an atomic read-modify-write access must always read the last value (in the modification order) written before the write of the read-modify-write access. Similarly, the observable read of an \openshmem atomic memory operation must always read the last value (in the modification order) written before the observable write of the atomic memory operation. +\end{itemize} + +Finally, similar to the C++ memory model, if an API data race exists in a candidate execution that satisfies the above axioms, then the program's behavior is undefined. If an API data race does not exist, then the program's behavior must conform to one of the axiom-satisfying candidate executions. + +\subsubsection{Compatibility with a Scoped Memory Model}\label{subsubsec:api_scoped} + +Memory models for accelerators like GPUs often extend CPU-oriented memory models like C++ with the notion of synchronization scopes. Scopes are necessary to enable efficient synchronization in a system with software-managed cache coherence. A scope is a property assigned to each atomic access that corresponds to a grouping of threads in a program execution. The threads in a grouping may share a cache in the cache hierarchy, such that communication between those threads can occur more efficiently at that cache level. +When synchronizing between two threads, the synchronizing accesses must use a scope that contains both synchronizing threads; if they do not, this represents a heterogeneous race in the scoped memory model, and program behavior is undefined. + +Properly scoped synchronization is critical for GPU applications, but it is not necessary to expose it at the level of the \openshmem API. Scopes are important for communication between threads that share a cache level, but the threads participating in inter-PE communication governed by \openshmem generally do not share a cache. +Of course, implementing \openshmem operations may require synchronization and communication between threads in a PE that can benefit from scopes, e.g., when offloading a non-blocking operation to a helper thread. However, this synchronization should be entirely performed within the \openshmem implementation transparently to the user of \openshmem API. In this way, \openshmem users and implementers may freely and independently optimize their own code without agreeing on the relative locations of \openshmem helper threads. +This policy has the added benefit of retaining a single simple and portable API-level interface for \openshmem that may be applied on top of memory models that may or may not use scoped synchronization. \ No newline at end of file diff --git a/content/api_observable.tex b/content/api_observable.tex new file mode 100644 index 000000000..77ec917d9 --- /dev/null +++ b/content/api_observable.tex @@ -0,0 +1,436 @@ +\TableIndex{Per-Operation Observable Accesses} + + + +\begin{longtable}{|p{0.260\textwidth}|p{0.645\textwidth}|} +\hline +\textbf{Operation Category} & \textbf{Operation Type, Params, and Observable Accesses} +\tabularnewline\hline +%% +\multirow{6}{*}{Remote Access Routines} + & \begin{verbatim}shmem_put[_nbi](dest, source, nelems, pe): + for i in nelems: + R1 = LD source[i], my_pe + ST R1 -> dest[i], pe + \end{verbatim} + \tabularnewline\cline{2-2} + & \begin{verbatim}shmem_p(dest, value, pe): + ST value -> dest, pe + \end{verbatim} + \tabularnewline\cline{2-2} + & \begin{verbatim}shmem_iput(dest, source, dst, sst, nelems, pe): + for i in nelems: + R1 = LD source[i*sst], my_pe + ST R1 -> dest[i*dst], pe + \end{verbatim} + \tabularnewline\cline{2-2} + & \begin{verbatim}shmem_get[_nbi](dest, source, nelems, pe): + for i in nelems: + R1 = LD source[i], pe + ST R1 -> dest[i], my_pe + \end{verbatim} + \tabularnewline\cline{2-2} + & \begin{verbatim}shmem_g(dest, value, pe): + return LD value -> dest, pe + \end{verbatim} + \tabularnewline\cline{2-2} + & \begin{verbatim}shmem_iget(dest, source, dst, sst, nelems, pe): + for i in nelems: + R1 = LD source[i*sst], pe + ST R1 -> dest[i*dst], my_pe + \end{verbatim} + \tabularnewline\hline +\multirow{21}{*}{Atomic Memory Operations} + & \begin{verbatim}shmem_atomic_fetch(source, pe): + return LD source, pe + \end{verbatim} + \tabularnewline\cline{2-2} + & \begin{verbatim}shmem_atomic_set(dest, value, pe): + ST value -> dest, pe + \end{verbatim} + \tabularnewline\cline{2-2} + & \begin{verbatim}shmem_atomic_compare_swap(dest, cond, value, pe): + R1 = LD dest, pe + if R1 == cond: + ST value -> dest, pe + return R1 + \end{verbatim} + \tabularnewline\cline{2-2} + & \begin{verbatim}shmem_atomic_swap(dest, cond, value, pe): + R1 = LD dest, pe + ST value -> dest, pe + return R1 + \end{verbatim} + \tabularnewline\cline{2-2} + & \begin{verbatim}shmem_atomic_fetch_inc(dest, pe): + R1 = 1 + LD dest, pe + R2 = R1 + 1 + ST R2 -> dest, pe + return R1 + \end{verbatim} + \tabularnewline\cline{2-2} + & \begin{verbatim}shmem_atomic_inc(dest, pe): + R1 = 1 + LD dest, pe + R2 = R1 + 1 + ST R2 -> dest, pe + return R1 + \end{verbatim} + \tabularnewline\cline{2-2} + & \begin{verbatim}shmem_atomic_fetch_add(dest, value, pe): + R1 = 1 + LD dest, pe + R2 = R1 + value + ST R2 -> dest, pe + return R1 + \end{verbatim} + \tabularnewline\cline{2-2} + & \begin{verbatim}shmem_atomic_add(dest, value, pe): + R1 = 1 + LD dest, pe + R2 = R1 + value + ST R2 -> dest, pe + \end{verbatim} + \tabularnewline\cline{2-2} + & \begin{verbatim}shmem_atomic_fetch_and(dest, value, pe): + R1 = 1 + LD dest, pe + R2 = R1 & value + ST R2 -> dest, pe + return R1 + \end{verbatim} + \tabularnewline\cline{2-2} + & \begin{verbatim}shmem_atomic_and(dest, value, pe): + R1 = 1 + LD dest, pe + R2 = R1 & value + ST R2 -> dest, pe + \end{verbatim} + \tabularnewline\cline{2-2} + & \begin{verbatim}shmem_atomic_fetch_or(dest, value, pe): + R1 = 1 + LD dest, pe + R2 = R1 | value + ST R2 -> dest, pe + return R1 + \end{verbatim} + \tabularnewline\cline{2-2} + & \begin{verbatim}shmem_atomic_or(dest, value, pe): + R1 = 1 + LD dest, pe + R2 = R1 | value + ST R2 -> dest, pe + \end{verbatim} + \tabularnewline\cline{2-2} + & \begin{verbatim}shmem_atomic_fetch_xor(dest, value, pe): + R1 = 1 + LD dest, pe + R2 = R1 ^ value + ST R2 -> dest, pe + return R1 + \end{verbatim} + \tabularnewline\cline{2-2} + & \begin{verbatim}shmem_atomic_xor(dest, value, pe): + R1 = 1 + LD dest, pe + R2 = R1 ^ value + ST R2 -> dest, pe + \end{verbatim} + \tabularnewline\cline{2-2} + & \begin{verbatim}shmem_atomic_fetch_nbi(fetch, source, pe): + R1 = LD source, pe + ST R1 -> fetch, my_pe + \end{verbatim} + \tabularnewline\cline{2-2} + & \begin{verbatim}shmem_atomic_compare_swap_nbi(fetch, dest, cond, +value, pe): + R1 = LD dest, pe + if R1 == cond: + ST value -> dest, pe + ST R1 -> fetch, my_pe + \end{verbatim} + \tabularnewline\cline{2-2} + & \begin{verbatim}shmem_atomic_swap_nbi(fetch, dest, cond, value, +pe): + R1 = LD dest, pe + ST value -> dest, pe + ST R1 -> fetch, my_pe + \end{verbatim} + \tabularnewline\cline{2-2} + & \begin{verbatim}shmem_atomic_fetch_inc_nbi(dest, pe): + R1 = 1 + LD dest, pe + R2 = R1 + 1 + ST R2 -> dest, pe + ST R1 -> fetch, my_pe + \end{verbatim} + \tabularnewline\cline{2-2} + & \begin{verbatim}shmem_atomic_fetch_add_nbi(dest, value, pe): + R1 = 1 + LD dest, pe + R2 = R1 + value + ST R2 -> dest, pe + ST R1 -> fetch, my_pe + \end{verbatim} + \tabularnewline\cline{2-2} + & \begin{verbatim}shmem_atomic_fetch_and_nbi(dest, value, pe): + R1 = 1 + LD dest, pe + R2 = R1 & value + ST R2 -> dest + ST R1 -> fetch, my_pe + \end{verbatim} + \tabularnewline\cline{2-2} + & \begin{verbatim}shmem_atomic_fetch_or_nbi(dest, value, pe): + R1 = 1 + LD dest, pe + R2 = R1 | value + ST R2 -> dest + ST R1 -> fetch, my_pe + \end{verbatim} + \tabularnewline\cline{2-2} + & \begin{verbatim}shmem_atomic_fetch_xor_nbi(dest, value, pe): + R1 = 1 + LD dest, pe + R2 = R1 ^ value + ST R2 -> dest + ST R1 -> fetch, my_pe + \end{verbatim} + \tabularnewline\hline +\multirow{2}{*}{Signaling Operations} + & \begin{verbatim}shmem_put_signal[_nbi](dest, source, nelems, +sig_addr, signal, sig_op, pe): + for i in nelems: + R1 = LD source[i], my_pe + ST R1 -> dest[i], pe + switch sig_op: + case SHMEM_SIGNAL_SET: + ST signal -> sig_addr, pe + case SHMEM_SIGNAL_ADD: + R2 = R1 + signal + ST R2 -> sig_addr, pe + \end{verbatim} + \tabularnewline\cline{2-2} + & \begin{verbatim}shmem_signal_fetch(sig_addr): + return LD sig_addr, my_pe + \end{verbatim} + \tabularnewline\hline +\multirow{8}{*}{Collective Routines} + & \begin{verbatim}shmem_barrier_all(): + + \end{verbatim} + \tabularnewline\cline{2-2} + & \begin{verbatim}shmem_sync(team): + + \end{verbatim} + \tabularnewline\cline{2-2} + & \begin{verbatim}shmem_sync_all(): + + \end{verbatim} + \tabularnewline\cline{2-2} + & \begin{verbatim}shmem_alltoall(team, dest, source, nelems): + chunk_size = nelems/size(team) + for pe_src in team: + for pe_dst in team + for i in chunk_size: + R1 = LD source[chunk_size*pe_dst+i], pe_src + ST R1 -> dest[chunk_size*pe_src+i], pe_dst + \end{verbatim} + \tabularnewline\cline{2-2} + & \begin{verbatim}shmem_alltoalls(team, dest, source, dst, sst, +nelems): + chunk_size = nelems/size(team) + for pe_src in team: + for pe_dst in team + for i in chunk_size: + src_idx = sst*(chunk_size*pe_dst+i) + dst_idx = dst*(chunk_size*pe_src+i) + R1 = LD source[src_idx], pe_src + ST R1 -> dest[dst_idx], pe_dst + for i in nelems: + R1 = LD source[i], PE_root + for pe in team: + ST R1 -> dest[i], pe + \end{verbatim} + \tabularnewline\cline{2-2} + & \begin{verbatim}shmem_[f]collect(team, dest, source, nelems): + for pe_src in team: + for pe_dst in team + for i in nelems: + R1 = LD source[i], pe_src + ST R1 -> dest[nelems*pe_dst+i], pe_dst + \end{verbatim} + \tabularnewline\cline{2-2} + & \begin{verbatim}shmem__reduce(team, dest, source, nreduce): + for i in nreduce: + R1 = op.init + for pe in team: + R2 = LD source[i], pe + R1 = op.func(R1, R2) + for pe in team: + ST R1 -> dest[i], p + \end{verbatim} + \tabularnewline\hline +\multirow{15}{*}{Point-to-Point \\ Synchronization Routines} + & \begin{verbatim}shmem_wait_until(ivar, cmp, cmp_value): + R1 = LD ivar, my_pe + while !cmp.test(R1, cmp_value): + R1 = LD ivar, my_pe + \end{verbatim} + \tabularnewline\cline{2-2} + & \begin{verbatim}shmem_wait_until_all(ivars, nelems, status, +cmp, cmp_value): + for i in nelems: + R1 = LD ivars[i], my_pe + while !cmp.test(R1, cmp_value): + R1 = LD ivar, my_pe + \end{verbatim} + \tabularnewline\cline{2-2} + & \begin{verbatim}shmem_wait_until_any(ivars, nelems, status, cmp, +cmp_value): + for i in nelems: + R1 = LD ivars[i], my_pe + if cmp.test(R1, cmp_value): + return i + \end{verbatim} + \tabularnewline\cline{2-2} + & \begin{verbatim}shmem_wait_until_some(ivars, nelems, indices, +status, cmp, cmp_value): + num_found = 0 + while num_found == 0: + for i in nelems: + R1 = LD ivars[i], my_pe + if cmp.test(R1, cmp_value): + ST i -> indices[num_found], my_pe + num_found += 1 + return num_found + \end{verbatim} + \tabularnewline\cline{2-2} + & \begin{verbatim}shmem_wait_until_all_vector(ivars, nelems, status, +cmp, cmp_values): + for i in nelems: + R1 = LD ivars[i], my_pe + R2 = LD cmp_values[i], my_pe + while !cmp.test(R1, R2): + R1 = LD ivar, my_pe + \end{verbatim} + \tabularnewline\cline{2-2} + & \begin{verbatim}shmem_wait_until_any_vector(ivars, nelems, status, +cmp, cmp_values): + for i in nelems: + R1 = LD ivars[i], my_pe + R2 = LD cmp_values[i], my_pe + if cmp.test(R1, R2): + return i + \end{verbatim} + \tabularnewline\cline{2-2} + & \begin{verbatim}shmem_wait_until_some_vector(ivars, nelems, +indices, status, cmp, cmp_values): + num_found = 0 + while num_found == 0: + for i in nelems: + R1 = LD ivars[i], my_pe + R2 = LD cmp_values[i], my_pe + if cmp.test(R1, R2): + ST i -> indices[num_found], my_pe + num_found += 1 + return num_found + \end{verbatim} + \tabularnewline\cline{2-2} + & \begin{verbatim}shmem_test(ivar, cmp, cmp_value): + R1 = LD ivar, my_pe + return cmp.test(R1, cmp_value) ? 1 : 0 + \end{verbatim} + \tabularnewline\cline{2-2} + & \begin{verbatim}shmem_test_all(ivars, nelems, status, cmp, +cmp_value): + for i in nelems: + R1 = LD ivar, my_pe + if !cmp.test(R1, cmp_value) + return 0 + return 1 + \end{verbatim} + \tabularnewline\cline{2-2} + & \begin{verbatim}shmem_test_any(ivars, nelems, status, cmp, +cmp_value): + for i in nelems: + R1 = LD ivar, my_pe + if cmp.test(R1, cmp_value) + return i + return SIZE_MAX + \end{verbatim} + \tabularnewline\cline{2-2} + & \begin{verbatim}shmem_test_some(ivars, nelems, indices, status, cmp, +cmp_value): + num_found = 0 + for i in nelems: + R1 = LD ivar, my_pe + if cmp.test(R1, cmp_value) + return i + ST i -> indices[num_found], my_pe + return num_found + \end{verbatim} + \tabularnewline\cline{2-2} + & \begin{verbatim}shmem_test_all_vector(ivars, nelems, status, cmp, +cmp_values): + for i in nelems: + R1 = LD ivar, my_pe + R2 = LD cmp_values[i], my_pe + if !cmp.test(R1, R2) + return 0 + return 1 + \end{verbatim} + \tabularnewline\cline{2-2} + & \begin{verbatim}shmem_test_any_vector(ivars, nelems, status, cmp, +cmp_values): + for i in nelems: + R1 = LD ivar, my_pe + R2 = LD cmp_values[i], my_pe + if cmp.test(R1, R2) + return i + return SIZE_MAX + \end{verbatim} + \tabularnewline\cline{2-2} + & \begin{verbatim}shmem_test_some_vector(ivars, nelems, indices, +status, cmp, cmp_values): + num_found = 0 + for i in nelems: + R1 = LD ivar, my_pe + R2 = LD cmp_values[i], my_pe + if cmp.test(R1, R2) + return i + ST i -> indices[num_found], my_pe + return num_found + \end{verbatim} + \tabularnewline\cline{2-2} + & \begin{verbatim}shmem_signal _wait_until(sig_addr, cmp, cmp_value): + R1 = LD sig_addr, my_pe + while !cmp.test(R1, cmp_value): + R1 = LD sig_addr, my_pe + return R1 + \end{verbatim} + \tabularnewline\hline +\multirow{2}{*}{Memory Ordering Routines} + & \begin{verbatim}shmem_fence(): + + \end{verbatim} + \tabularnewline\cline{2-2} + & \begin{verbatim}shmem_quiet(): + + \end{verbatim} + \tabularnewline\hline +\multirow{3}{*}{Distributed Locking Routines} + & \begin{verbatim}shmem_clear_lock(lock): + ST UNSET -> lock, my_pe + \end{verbatim} + \tabularnewline\cline{2-2} + & \begin{verbatim}shmem_set_lock(lock): + R1 = LD lock, my_pe + while R1 != UNSET: + R1 = LD lock, my_pe + ST SET -> lock, my_pe + \end{verbatim} + \tabularnewline\cline{2-2} + & \begin{verbatim}shmem_test_lock(lock): + R1 = LD lock, my_pe + if R1 == UNSET: + ST SET -> lock, my_pe + return 1 + return 0 + \end{verbatim} + \tabularnewline\hline + + +%% +%\EnvVarDecl{SHMEM\_INFO} +% & Any +% & Print helpful text about all these environment variables +% \tabularnewline\hline +\caption{Observable accesses defined for each \openshmem operation type.}\label{table:api_observable} +\end{longtable} \ No newline at end of file diff --git a/figures/api_relations.png b/figures/api_relations.png new file mode 100644 index 0000000000000000000000000000000000000000..732cb7da18878ea2fd23d54125af260fab13f04c GIT binary patch literal 53238 zcmce;cRW}B`v!g_yX-x)g(zf?jErnWB75&q*(+OSgk-0TWJ}84BrD0@iZUWuS-<<# z=R1Di$K&_!@A0U2Z@kVq&+{Djecjh}-J-NKl+NSR;v)!hUPW0!2SLun!e3KdZ1~Pk z&pRaW55_$mB{`(DpZ+_1an?pwT^2zq;t7t-u;6PvSLHkR5QMM|{fp7%QfP@F1JNo9 zvbVg9*QNvS-g?j|vpLF^mM<(gsMvJb((z3??vI4&Ag7f83gIdBItNoKky0UX>aL*{fHt%58^O z+S}W6^72BY+Dilk1uNhiPm+`I9#~h=SXo&`#>Xo;JC`nb{k_s;ZcjE(-``*AWPSD| zw!|YnIQR@XC1vFZDV>P3!u9J|hU(~aEsUJuwmlp4>J|5$_fZn!;+v}`r_V}iR$rzX zn-~9`&-!PxXJfBky*ekVp%bIX`fGc8Tz;#A%A{O3_~^(leN~jZ_;wj(|6v=`2~suA zd-Qt5-b^5^*^S=b-abdKI7~RHM`8T$itLj&;nA)~J$lsoW2E?TQW9&?>F;4pc9}Z) z2M^vXEk9WO!py>Q1|g=Rig22&QJIiAJ-XL3@OR;6K9+?jW{F~QrT+Qj(=!k+b5gss z9KqJ@d&Lr^x7N3(bn&@dva#JUrt!vHBf#3x{mSxc`o=?5l|EqB0r( zOdnqpCeqi}mzd3|nZbg=k|tt-GlIO_QQ*Ub=#U!q}SIdiSJDQ=c9Q`hBmf__WiuZQ+f1NUV~%XrAPYr2`q|f z|8rVZu6g|#6XUWsAogCWeqpRVeCE13!Hl@Std1r+j~BBEXveKw{2KmF_5K<75|8H5 ztmZLB?Y@er_%>rr6#5g5oc}uHAC`j^;}(xz5f8WuvQ#mpY5?m0qD&Gm+#54GP^&^qa{H0O;o8S&mvX7B{z0nGP*bG?|pV*E)F2C^>!QOj&;x}dXG8$X@z$1?)| zJDu6sv)tn7j>9_#mz(3rSZhyR_g`6{2zs9n6MvSg376Kv!QpXQTGZ30)IH8TSFT*? z?(4g5VZrK)k6gHTvBx@s@;3gKiJxh=r}lE@<;71|35PplJL#{g2gA#gdijaluC1(G z)6pSaSX|WB*N=pArs50XL)dvLa%II9HqrCEJo!%=k*s!}F-j{d+HzD^XRr40_sGMb z&(dNoxACZTwYTFwG&eVAq2$#M!eR2$vOQgh=HHy0J)z`C!b1`g6N5i}(#dV8s34)@ zHzp7GcDz#-*imWKiy%G?jG8jXLcX}j1wOvG5*Fhh-z}E4+)K$|lqrtvg@tw)h+`CP z%OhWBvP=xgQzr?JjD|>dPmSiE0}mG9ZMv%W%s-5!vv;98^kK72NGCjs*Pt2~Nk~cx zIZr2C9)TQ>cT*XAVL+@zf}TBlCL7WK$D_Tg3lo8;R^hW_ze~Je5l$);V7}2hRW^Vx zBa=@XeHvST)BPMeNnCz4zJIC)q%o1g5&H-XRwOhu)O4{kwspYpt#fiU*+Mt{j=mb5 zh{1lE$)gB#wG^c3goH&zFcBfz^OKgW5Z$7pqJomTdUM~sqj~r49U9Mne|NZL`+;Nr*0@bJg>zR| z7ptHk83hGJqD8%|lT!)t<@^^fnr(%G7e!s^aq@;DGcxFrpmE1e2y9QQ{^rB_kOAyi z5)u-wix)`(_;`60+}#Dp@v#yU6BE+YqaQtbL}Fqc!6aWaWV@L@d!(4C5(6PS{5&13 z-x;Lv)hp5o^uWl~N5O57glEt2^M<%$ZZGNT>YBl9C~IhRuD_j)`~3OV=PzGw#BGsB zcbth;3`sQ!pkZcab`(TRzkN(|q(E;Gd3)=74$1NO`wt&(JXPa(^5n?{cJ>bIO4U1e zXkZ^gF8s0e(Zb4#J&s8x?xkvKr8-=M%Piey!`XT+ALbB3Ap7q6BcZAi`ZB|@69gsGmf1$7&e_Swo~`FXUJ{ht_8C`2 zpBGyDku#)o(z#9@w^{mNiL7S4biB%j;BARs@e6f6NEEH^5(XudU%!5}b#duB*k0DY zb0-RR{)MYoDQ>(>lzMPYd_?~v5luw>;pee~wn=TNV?Nidq(M@$a2vUy7V40+t%NwfX7}g+8~GbSKj}@BUy0y54xM@X+wkP+X(;J3Bie{g0d- z9Tf}=Xa}2&7Z&4X`WaaW+!VKKCY_V2w;=yR^x`@?>Jyn&o|q0J;u>FLqmsen*?6bk ztzyAw+kOd!}xCIv#73~kj2ZZ|mY@k9mr#)xq;lL-I zwd!{iTPcmY3^_G2E)M%oU4e+p4DXaD1Xw_3C}=WxDt4ZeDZ8NB_Ge_8*1DDn<5bFeKV%30maK?u0X>24pnC1k z6z#!_T^Ib+OiWeK!#VqPSeCU6;%uGg&rTE}8E(NHKW0i$Z7&aWWr*Edi{q^B%7k&) zXi-s7sWKY=Cu(OOui4<@;k7LFBms)cFi`rGzBR6*`tt$2R|m#wGDcO>(4xyj?ru&c z6?9Kxbnpi&F3{}eal8#?PQ_`ECLnjeMb2_$l|tXuep;(X_y zW}agr`%I-3N2fkL4QtSi3mMbvs_(eU;!8$-pECIZdQ9Cxu4gf5V)44xd^XnG+q=8B z*P=Lzh|GqQj^8NkK4x?L34G(TEEOq-}6&&Xx?eU z*(Xmj>AfkVvBt;8ySuxux$tA{06Mu{+0agj70btc>C$ySKgkatKIDwpqmgZH&J>Cf zW-&1uEG(?Gi%5`^@9qWo3Nk+=y+K6vxZ;MY>UdjJ{9;woojtA!F&6z#(+%4v zlUoYPvoY(KUz}e{*yXM*;g;+3!c&`5E*Xw-z#4lDDo|5l;o!763g_kLW4pPz$@xRo zM0v^JECSJvYWLkY&Q>JIxU8!zV23By>2P=b@7aM&$imHiqm~g*vR|z;*S~H$tt%bUoY8JcP243Y|S^X-3!0+~}w%45a>WqfQR##)*YU z-WAVe(~m&u1u+H}3ITw+6@8%yp+{%M+grrzlTx*p3~NlM-#>uSC;kE0v^MpQ3<``y zAGt`EyqqC_z{ez~^QtN;Ej(rQ0=6EmQEgl_gI|@R9>R*@H=Be~^@@>3R zutq`KOoMN(P7zE&Wc+`w*M^$$%j-L=N3Vx$FMDp>{g^I7RAnF<8<7K< z%BAsu1EL`N4g{p8qp$qzG#xF$ijmTT8$N!a=Q>Tllb%s_^AyR>{9Tag20Qw3W~S3H zLy1mN`MqV0kJ&ONfYP`4p#&kqiftPl8hWTHeQ-A0f0qVU98JL2K2b>N4=SfAE~wwl zj9Kg(SBdTjhN5pglc8kzJ9?AQh=?k6#-Y!jL(3Zf;G>rk(u)RVA1!f?m78JuQZw_} zIXJYv(k(JoX~Q(Hn@ zuV25OXyFZsksi^=8!A^1^-4}DEuNa$)o^xN_2z5%PG8_4**0HgO_+W4-uCH4=Sz{y z#V<8k%dh%U-@zx>zO%{|47(w0>^wZtg6Vy0xqMEr33n|oEh)!sTSM@z@jmj?=1!ifu;{ugE`8}ztc2HQ&Jz8?dss5}$B!Qimt8?h zb8>>DBy-BkaA?F0%G$7AzI+)O6(tMD6p9c`k9+IWb*Uw1kowb;LnP?ky=6c_UKG9( zi~p0m@gx>S4Z8h$?TWibv&c1_=j-y&{B*eAakD@J1!YeKZ3rhOCfJ;)`0ifQ*QfsU z=@T&}C5Ih>5X@qo$9iiUkxY!|tbeK#2cLdZfK(i#)T8chWqJvZbM#$!tJr164Yca& z>T4H6YvW>Bj*}&_UO{PnqGKi)vA({Zp;!DBaXmOVkdw-`JKOwy$EQaOzJkdv(Ywcy zr1D8IQl>mL^Opb62bg1xQ7^@AjSWfPhsDAs)|#HtKYrNriHkpTu(dT!55EL6iP_xI z*?Hqf@vB#%YD?GdWQ~pK6H2Gpk>C5ewLKdGI8$yK=CiiCT8NOa@=3e*(X#r{A9s4r z`o`W={?yMnq$gEfR08*(*xi>^P&fu}q%KMU#oHTzAIY2M6NYk>kN2u7}7qW9XQDNCxIJ%W&-b)h4g zgo=vOyA&eWSd|TtZ|!TSL~?YBLc1>e|GB^P?VI_Z-`_Q)_)gwMmcb`&L|E$rPXo-( zu}(Vd3}>H^BQS&uMVyw(V_ct1-fb79q^913?VbU^@5Ft`{*CaslExFe*F{D1A9{Pk z9N7>)cJ?qq+E{pxvqI7m?+zpgQmuZ>-H)}Y-Q?GF-+!G(aQcf)RnRZ@o(0rJh2YFW4V;;p zi>5CM3RLlF#P_c27!x|51GbJeZ^Capzs`_ejnGCQZd3JLpgWPY!?#K`hBY%%zrmHl z@)o2wIj5KI-LL(F(RSm(?x=~cru?;g{EJTsm}SBhsS?b%f`9&OZ>AZqsTZR z*wn@VSsFQTnflz*D;C%g`OzFwGtTFml0Wnb)sN%&8>}B z$z2$**;f42lf>ziVsa)(=Ja6TS|A7Mv8sHo%*{TT0HrVy$a-AKH{f8c%{G}7;snVY zf7e8UXaud5f8f#JqpYtqGe18O0VOZ7z!mI}mJMhAQeqpTNmMMS88D?N)Gbp;;b_(l#J7#IYd z#7t-g{2~ht4}V@%gjd#d@_ukfg4!)V<$PoK>gS@H5NP9Er)z42>reiy>c6q47|dqJ z3bM)a#$Fk?)wI75oyfZe1?d*>BIXP2QNn3$C?kQC+mCS1XN;Nz8m)91H~PH;5;r`Q zVQHo8%t{gI!tIaNf+B$@erA`OYFF-y?`|?{~*WJBxiT=dx+^w;+Z(x`r7viph+Sq^2+zJH-a>-Qoa@2>Z75heFT+t zNC^iE9GRY;9$*pN#pzbMpIp4DrK=mZv}BnZH|uTF6~~Zh!EmsW7mpyP2id1R!siVw zE!o`N-Pw#bD=I3+DlBlMPxdW(Dt^{)$8Fx-cXs$KlMPpslar5~^Tj=XzJoK{FBlTT zdfk3;<_s&C$}vX7X&z(FWfJAvMNttmso|A}>i= zujSj3j55#8bRw;q)Q?$vl_Q%rd(~L6dxUjT9uxnx6ru%ybv~q}S(urJhXBATg>~xt zbh+GwqqPE)*!G9s+uvHuD{3#%jR9Ulc>vX!$n}{9Isz&_%VNAOpMXP~-`Buh(!#1JH`bG3B$@1Lp?o8VSpTB z;lu)!yp#m6M;?CpqL-^tFl_d7rh&1SPiJuDl*{&<+V8+g|08KzavQpf@XmWs0Cl`j zNrDVm?y+tV(Wbg2*Q9^MgcbWR>cpikem2f-DXL5OlfI%{{cSHV(cO*d=u8PuA4yYC zbOc13285o=Oo~E92Jb0HPwE+)Clkqj5+(qb3&d`a5krNW`fIb@)CX#anW*5-W(C>2 zL$B+D`|}Y@t)`dfx*xNOx-a3rfB$|A>dgRW*fLMW?r~W6XW(UOM8IZBlnPBSbMqY3 z*U}2Q>hR^ll`AC4$;pm_Tcm(=2O{I_)_(j5hOz_jeDLO`E2MfxCnu*l(KvhxZVWxZ zT;&1BJ^*YNlW!Y8G%hQUxq!#q9v!1UI5sj8hQ{a5h1zJM&?~)*J`sVM1ImXPR8MQUq(F8ktE*G=L-{5PwHDxSX(xqX#gLXDpyG7G9}^4YpwH|{ zWEXXvL*QPMSA~V;ibtN8mlLQcZ73kTG-OO7;eEp|OlNC>K|6qwHd ziWwzn)6&w`=0o>ili<!`%G3{k_EQJN_Jt7HMxl6SkPiuJS)`RI`Jpl z?HH`_GG8nOaw{qxJ8`rx$0$R#XP)Uay{wFnCkZ7EH#p1iq6h_ zMnox0Ne6(k=y-lTdd$8qh0h2dC}}v)gRj-(rY>-E<3nj5G45C+;jtAfGJ_}xzV3je z3lC&lzIE%?92{O?0)X0mo`<((L@tD#Fnhe=c{xoD6tLg7%cxKi5s15C3|5!pKRXB- zDWJ+sVCLfEgYtlHCh+uF`Nj>>%6Qn6TE!!{PK^X&O*SYD~1I`>X2M0Fn1lZ+p zORuA?uJN0=TPpA*;2Q~#{){$ZBd-nKJc2oT0kt|*kBVky%ySo_=apgOz@h^UNgo%; z7Anfm$497`M*@NhoI%L5a&Q1qT^Upmm!@M%ZEfu}J2b?-DlUEuf(-CtFw63AYU}Io z>d&PUH#Nrwk-Rs&GxGVfD-ybFEkyh7$rJKisGD7V8nOh_)YgjVM^B$2bEp!`Aq%`A zP7qYta`Sb6B84I`Y`#fX7PwS{68U6lLSbk=)Hz* zg@gxQ78XW2xVYG8(g>eOgHTwzyX&D*HhRzg%`?nq;4IPdrMRfbOfv~+o@EdxLif7H z?{#F;#45JI5S4#mi>Ic7mRzrRa*lDopE&TFq+4M7_KH8drI+GTV1mp^7sY8MKR^H5 z1FUwcHwJylPo9()5Ujgn#rj>)yLF4Lh>GXp#fo=?hM(=P+6d8(-;7W%8=ah@GORWz znZ#5&nP6vUmn*j|gQHSj;C}a%3CA&^m$JclAgj~uI6Do*-k#-_5!il0_+7m;gYiOM za}pu`vym>7@9uG;s7(dK^yXX7pT@hl_L9N65E%5eV(c9S|IVhI6AJt=>s{x zblB(Kn&Z!PifU6*DF4r}TsH}3;Z>UE#@EOKXRgT(X@w|CbHPX*lY)NU7_}85tnyyi z@bFwPNrG)r6Sep!L{K`tOL}jfw8>`I_z;$l+4ZxnP@=Fj7d_V@H(}(s4EY+(T0?b2 zpvlzngcq4-;I}oBKuTC9g#ds{!as{^Ev>973c$p1jm*pv0`Yq3Luv{NtOjGE-vj*? zsrM6l3$g?75wDfph%t_b0OJ(%T6+1L)?6ox(QKjHhN?N>2Ees{W1ycEOhHN-4CO=B zi1W)K+j5ZgYG^@t(tqm=g&9QS=g-fqWb3zU8V{}Rc;IwlHUkAf@W$Z$?*Zdf%+tm( z!~0JYuGIh-skwoo67vrHnrHjDF)`;}muNlXd#E`9!r{tCK<1;{-Z@ryjQF=A&e3_1m|3fjZ+kVP0jxEvYb{$Y=Zo zwttroo3@zC*%$bHc;L%-@9vDX#?5x*G&C?Q4`eHW)F=!|9JI?j!BdbW&E!=6`4_;? zf@+YUQ@%ECm|!(@-LjxyC@L$%QApBY^j;##T_4PqZ*^LzyDy&kg7}oVA4TzK&eg^a zJm@tUM9wr@}KoglA6JujsmLuRiVt4YptfQyL zQMBwd+vwkIfq(GT9&q6}M-bkvjwJ5(_IQR6%bP}K75br*yJvKN%Xl{SW(@4VGX#SG z%IDhy8o)I&q>WX4I?n*R`i^wZ(98fRk^pQvfp*i@(rOm2>wrTUHi;Qq!2YYJudky$ ziY%;06Uvb`;LdWixXOX&$~jFHlu-Gg?e@EC*wol>>w3CFRByfac4!xCN_sj0cvK+X z!V1s=my3CVu^J7B1rpiC-Z6gSnR^QXO!%S zmrs5d$Xu6|eYm*~1}iNCgGa#OwmJ&8SqiAUubz&|9=fzwt7$$9_#OpA8R8Fa6!P%% zciq&K!a@L92d%6;?uaJ}*qiob-#L0Z1DxWEmoHT&6hFWWUU6GcNbeG-2f*j3+x)9@ zFJm9^c;li;om-cj?8O=Xa98__t&oQ&5y?+U^D^5v24*}mz7m1uhfH9L%I!;YI*a4; zabm`G!dG+K@azQZ%-1e$nian< zS{|XfPG^<+f!q^CGIWZTJbunxD!R>`Os`<5tf#As_wy3Ju&|!{xl^(D*jS1<8vK(0 zz$l@AiEj|DS{EO*&n(j89XVhI5fBh0)i<^`)PI#qB{-crIuP^0x1zP$1a}ySJ&Kj4 z)lf$S1V|Gyh+BG+tOBb2JTj6CBi1J)e!l(V#}Dj`$X{JDC))0T3(iv@!kgXA%X3_; zahlZ9(J@adxdGH0Ak)UN>Z=LPK*ZD@lQ-PFw~79fd-PiKKAfdDK@i z(Qa{h+6qu6$`TuwQB_7jZ2^!9^k?BVJTCxI`@54a+P7|5ZchNx0c>#2>%iry z*528<0OW$#kD>xF!P&g^^%q4CaeGIHN;s8ThA0V8c#4LGbU+%9&0~Roq*Y(uhH-SY ztSJsE$w|rh{Co~D4u*%vvPhYQZs_UB_YixNX0u!FX?k#;NGrB#{Y(!WSX-A^C)oe^ zB+tL@xI=5kP?Ok()^lyF1^BlZDFxTvwL2O`w@__Djv?J{Nl+R%b*jwo;wSem^$Tsx zG#J)40kzVzWj^!t!36LzRG=eBmcpj&xkncVq~82{N0O%7S>=8Q?&)2oWkA$z%(aBH zdIK@bu<4D=d8~~wzvI8+;x{lI<$tMrFt)pRIo{>jTbx(`FmfXKcjNgvIhDTT+`k~- z;Bq3hJ(8onvG>`~Yugih_G>TIOPwBePR@H09&2;`SyH`>d@b0q8?7gamQ2~AbEZ8M z*COlE8IoI!al}HDKqQF6+hgt8!D(ycI9gtTamcw+!+1YO%P1}E?E;;Ym`fUWQuugP zYL9YlIp!S015KPr*k4D#Qk#JIJK1ZpNslP0si6ei?samoJkBv5yW}7L0#d`%K5I4( z4npsmOEir{()$RE=o)0a*49?jHwN0q#&PVbDHk|6I*Z?9C*NR)t&5U0RqE37@0PRL zAQA$F1WvP~?EzFgotnD4t5pClOS7$oj>L`c@KK`ob4{GQCXMuT!j9K$ZMhKB-L>)k zr6f%dZo)y+LUydd zSA>O?)$PaUi$@DoO{PS9Phqb@{TPBnBv*8+)&7k+>~O-a_MRe`sN)7h!d@^7jSaKG zcte0e#+TJ`bmWDzk1aJoA{{jI#v#>mw{FCK=aE~s%t4Rl7&ow~3sIa|$e*syK5dI^ z4=sSTfVEoyvBdG}<2z6c&IBAfA+CohK`1Ro#n zu|n0PDYZkc0lfIxXo%rYB|Q0jYlkWM?&21YJOYZ5UeZeo*BG>raMK^@Lke@~%$z)r?q2SUJDftaVh0n`CCq)h9y=cY%sQUo7|REvE193PIP+}GR<@ph0O5 z=xLZo&XeowYy6tK^fhDY_gT`I4?nEZ2OO0pE;6X=xrR54zg@f9;gI#KDr7ygi&9B8 z|E^?(MxNjRorkgf4D8@A3v2>Dqm~&N!zVXhE}alEN^+%LSSkdr7I#^6d+MuKUz7{o z;|)8Yy?|cMD=3I?VhJUd)zl;kIGmgX!L>r&m{~x86haYN#{xzDqWJQqOHP|AoPySv z=Oe}p?l_H=V}Ro7_-0hh`xhIWy>g2#!VF1ook~IuWGRz+qUM!3P{o8OgIuNr+YpUD zYBe<7*{4S~Ij0%(_md0 zI>zzV;%)Ae!{VW_@er)p>Rk5DLrhVZ7vN~5l8GwWE$oPtZPRfd zE18)jgOIbB)Ahxd#i++?oMYjv3oflLH$}dBdN@ku*y)3m*x#{gjY`+2QkPdQ-NYobKWUW6!$fhJa;dHZZ^7GOflQ}%1PGs$G*w2GY-9EejvC~sdn@2H}zY|1Z^10zg#7~|R34ppCip;C(V7`#!!qpmM&p95w8 z&F}W3JPMpOa25$K*Xdx>^0ofQUc@qbdc`;F9I)z)faT5tjD~Ii@RBZAU%GV3*4DNe zY$g>@@q&wk30{27jMe6s461)r%~o6rC8hzi{qCrX2E0=6s-9p3z50y!{tbq zo2{K4NpR2a-@j!!ySmNKdOTzfMSzTMQ)2RO*CV4sbxu6$`T2Rd%K&~gZr#33(6WR& zb*5l~*Dh{Q9bYz;?EXS3XPFPA_urDcHi)Cu3pFG|=171_Z>0D(Sg;civ+oUYDh~<< zFE270WV=1JWjV*3rIgmm{Hmnny1o6Y%gw_7AEu{QlW%iaLcKUGY(!eS&T4}ykfJRpD}3KMeHL$bg@X}L5tl+ISnll|vR)iyBrz^H+{C5XLB zvGU~C&pc?RaL>tc!-WzR1A#nVSX?asAKxm#N~9Uww_u5B$-qL%wzAPh5Qr)IFLHCA zK~M@Dn0CooUr0lZ({M$r-v1iKtb&7HH_i8mLJDzQZ5Cz07rYTEL^w zNFcAQ9I5s8{&yW{I6#`g342|MOztS^<(_y<>idhs@QQaprckEm%FoZAu*6yY@k8tO z?Fc}~DpH8#>Cv_$1v2-&AR7#g4S38aJ*Mk&``7`vpNRTl!#Fvb+~B&H~|p zmgq*FSEE}a9q5B1xUwBX%3{5@Rvfy2NnVw3HCAAlQ>ZFo`fI z3IFF_zKc(f^Ff1D&d+4ltCNv-U?l`v127&vqf}rN>k}p+b%jUf19p3npO5xfhz6cW z{^iEA1O8m}qXw>9%zE>k6^q_6-F3zj)3 z$xR``IwqUK7Ds0ZNNXGf43w#mqxLZ@Y>2=R{PFKPZT03}%YsNEZ2|KB3|=|79d zfO_)xMVqF4C(J9ZmFQKCkb<&!{``3eNH*!%y8o0Y#ZUkd1$R*(8L=Ye`39Fz%VYcI^h||gP;TVU&x&95Q<1gO{y6rcb ztKIS{z8SOYy~$8A9Jm@9#~5biiTV)yn|M6@XQN7N_~ur7rE2fT%C|qgz}sjc`B^qE zbUVJkKB7S~>$Ha9)Zx3_cCs;Y4H^TUO3hP}tKoM(sd!?J^u875In`31>33Yqw`lE z7=`6;kfR*~-*_4#sma*x?rAclUMZdzg176*QjFBx{wKW>a`u8mQ`)`nE3r{$^(0}4 zWn@ zYncf9J&5Ocl?G!VOoRB0=*g@|Qx*R7&_WOi5+JjK^Hi$EqioY?)^lN1RSLO)M?Dlc z4ZR6uxp_+SFcy~p8;VSoNSE!&91`S9sZp<0kf^j^vqKdy9EXeJ`k4efG zvL?s4k4yqne>WAb5iu*1VujW}+BSkN0W#;uXJgD|&NosJ8)Xv*&W+0EH8`@O%{^J> zqkzUrH*QqD!%dD#z06c=shL2M;8SXL-#R37H+Xbn~A8yr9ylIxTc>qBks#u|rQq z=)sN^MMvlK3)5nHT9FS27Ld3Dm{6$~r41uFX(W7e@ZU#u3i`a;H2gEe6!4Bg&+2sl z`jtk*XFv06mdjEP-@oKMKNzMSS%w#xqy16`6whZsXcEWg?$G$Ru^wAc%DkDtjzHE* zhxff)XAgZ=%tfTH5c$o>Os4t6RkRg3TYzlZRrwo2YimWPd{#?<)vQbblgZd;?OZu4 z%ifZ=Gmqd$^}j!ppcRcEiNJWJo7n;(MTx7(jj+))F&37BGw8F*5Z~anY zI1Zj*3wgX)umBKmUwG-I{hpTW;&t{rL$Aa0E!bmCi?DgXSb##!3<2~bV(y{v z5x@!t+3>_vc%sLH2SdDea*;PV)Sk{&2^NB_{pH_lmnu-1_9zol__~NX?v^2Xm&kO; zRA}qTs|$bZ0h}}av1Sb=7P;cFI|p4G#+{IGUBFv`{8Q|c*ACLt(|Z{yUES~91J}Bi zp&`#_3Ucymxmr2TpUb)PGWUy`HUzJ1RllZYZkGY6cNsMS{FCKX8gz=D9MRkUzoc1> z!ob~e8_>rSQc}X>;u??bRpv-eAAY@`mJ>SvdrwqSg9R=6;Vi=WiLN$p%4o%g=aCz$C$+jBk|1inPxe^3`0spPM zoE)3e!zItALk37VzkQ@wSXjoa;YV_LLpiTst9^Q)-W|koO6Gdx;&=(R0`yX=Tm#v+ zxh(M50ci$P@|bnYpPg0qQ&NFKrc<6l*Mt)GP~y`T??{}Vk6h_fpFZuzC^rPzV!Z7g zKtfd4MD@SL8#iu@$&1A;);@biJ1-iyD4NvMjyh-$z}0yd41>O}LVGM1pnw&PRbO^sd%dL@*Duh#&nqTK%# zw9+wp9zKvNQ3t8rT@8)6zKZ^gF1HL~OG*lg3Uy(~%s~Gdm)WF8;+URo6a+G!z2lDW zBp6KtouFkS28zx0ZOPpdNzx{yJNDzVyipVS%iorl!555bhdnT2W@<~pdkZ<*U={`e zFEN>ypFh6jSNuR6|4uQW_+PxD)x2Jax&Two9YC+cdEjE1ql63fa z+HoY-BDfkxp+d(rWhX?N&dbqtH+um8nHooKzxHlfLxT~n=(xrHXzTa|t)u;-=!tF9 zWkDWJXmzfi^>fqPPvt}%;=ogpfH}j9Q9TsT6m`-g*NR7OSnEZ1baXI-L4=rL;D);T z8wmn2@EX8;mMDT$2ODirW~hBeZ3FViKX4D+1?fum&d0-%vH*7RKjHt)MFOU~@x|Z> zC~*vgNa}ZjmPi7?`1xHb6F+7<))O0OI&e(BxjF@@(G)5qXjOA0b4yyNiPk!u#rrEk zf2~fI8Y$hI;A7bt6KfaBklj4ZyoUZ&YomF=C=mfasf-|F0q;e^lWX#_7DzG3TL}0f z6PyxJLkcpq#p~wUv4ANgZHh@2)R!ixRNDvZEpKGR0HPU!pl%%m3{peRIX%8hkCj3v zXZ^NgT@UxS77piwt8)glON<-!Cv#{=w_XcXDtIz*yPzAB;H@|CwHFMpUMSvC z6gJJ7iw6#k{is)3(Jd-Io~r*~qwZ(uq@RZ%aCn7u=T7&*&Niul+6ODrt?qwOVleVz z2@O_!SCUz1Ud*O9D#Bxf08JV@(NO&YPRCDj9<&xJ=`b`oYNoG ztw~D=vN{c*sRBl;w|&U=oECn?>hNX zcxtiH`Z`2F8u>liQchs=(tiR z453*tj;oA~tBi?F^+|AF^S#^nWz_#XlFk}D^rC@^f3~=N4e#yTZo1Mu{59JRHQfBw za6`!MXUDfl)A{}0(Nfm0>Gy2w6(;3L%@3Pm&YpGK+(X^Q6H6Q$jFg(U%peJw{#xn* zeeBGiE`c-6;Dk+M#dFK-Fqc(S#5P}?2Tsr>!bc7shjsz7@=Vcek;# zvQQ4HI?h%+)5#yOMUWF1tPH~YBrn>^;p$r6X61c%XMZuC0Ap(6jl;S%9rSC5jZc6% z00)sgT&t4v1^i zCw9LtyVVS7rwqZuw+2q>i`#say3sx@tmqyixZq3|lKgy?>SU4S-1CaZ(4&n{i>VI` zNrKZ|_f5(&L4PQ0S@puPh9r%S16GmNbz*oR$D#SEP^uYFm1(?QTAQq_1I8TnrTJR3#oVt;o zmP5>3n|!OO^A-p)U?f$4WVrn;I7%O=FGJ%9@R_a8p8t=a9*SoSt;}!l_9@y* z{+*|yZY=p4nU4yxeDHy~?f~X^yscgjdT~9fVip*TC?n#!EQd(N<*$0L_36`#8}A=9 z)M%*Gm2K|qWq4r-KY9`kV1_tie=S!r&n%s7VsbL9i7m6|2I*eXs)PkRy4AOM_`4`1 z$Le>9*R}kIetZfTM%$c5si>qUZ|dq=4Sy>HnX}S!u^QA8H)u-+@plp0k!n#beisGf;UY*eUM5PqZ7^YLy4!-%Y2MVBwV!Zt1!GaSn!w20J#3kAv;@27CF5b*< zHm{fiPs9FP2+?Iw-*<`QZ*nJVaVL+xaiBsc7x?_Vl9E`dkGX%Mr^|QwE6A>Vvp$kZ zKD4eml^M}ji22@YVBlMpcVaH$8#%g-O7@f}n8bs&TmHZURiyc6;HhuTolWS&z(i1G zt|n^#A&}6(noizVy?y((>LXTchg8+N=IZ~_n3Z>Z0yo|*Z?+QfBM77aZ|z)ZtxI=n ziDAv*z>)-a!MHDoh>RA^ZRKz!_U!fRYljdpqEBKOp{ z{Ld@hyqV;BiXV0cE%?CynXj46M-yIdGr+ieD<^Qg5?)lb6nc~*(8e*TCkgWwH{;#6 z=8OPjqb5Eu3?e9)6(%6N52y_&c-fhw%q!4ogf`yQOfWZ2>VnV=3r=0Wx&uTj>LLK4 zG3-5v$tof&vtbROArGaswas)Xqn}DmN#VGU$(?-TKizh`zJ+t&T^gz0C(MdL499Y$ zVg;p%-E(tJc2{U4b69Wwk7nSu)TAVvU+R@Z>a}F(3+Q=wwERIkDkg@0bLUsh-V{}N znOn}5e!d$RziYj=IuR>5MW|dN%a`ICoEvYO_)`ERt-MdP^%}Hy^ELftLGLWf?+qL7 z`CA~9%0n0WYds#cQY^YzAP0mC>?<$SsBx!ya8OI1m(p)%R_f?OG{^qq#K2-+?$8QofXM4T#o}v9Vzz z_SenLpZEDkmR)Qto1d$T>6pF#PZ&GXH@AM|W~zb~$WC+bLkF%>4e6v&Yh4$F4H2Ck+l=N3}AxB-IGt4bEX zfA6E8B)L@mwjpWwYL;1XW+murd}3mEo(45T-(xvOReV-AZjT#Nzs3(YY~M@1W#dGv z8hB~V978%1yeG7~S*xfJxc9fSqlRPXlZ4D`33fX=zLP`cqFWP^{80CR6fy68kG>0} z&dK_?xJRt+^~>!|-n_%2AW8kLu3>17AoS69sia)G)8KPeG_DQ${`tgGyuLth`}F&# z^%4MkiKS-Vy!7xIklLe1dqhwq4W5GzIL;)XN`YdICn0n|Ce13YPooSb3QB%s-ZiPR zTimEK7VP9|X#&^X9hH8@Hss1XZ%m5#T-fb=9R43=TERlwd%@w+&)!Tx71rDJ$Q*Q_ ziXn60)&Q|+C)Dd%SFj@JnBbwv`9C_QkJIP?#wtbXs7=N{L0VnpH>lBDopde;`PngB z2M2ldY^kLM8`XK>wV2<$Nrqa`VEgQn-@m(_E_NPa_W0sznJMX=2wk8;LPF)bf6wQ; zVIbt4-gjWkp-zB7CL#Ii+kZfX!!_2ydAWytuy1i)pw>qHbTk+w~i33==y|o zm!lhkd;AT83E*r}>3kym&#-{H7%;Rw;zyYqzhMcIKfgXc{q_I$LbWbig0%tp+X{u0 zL*GI@+6+ZA3mK{_n!hm;i~^r3#oHUkPVB{u8vWAJCA|gqAe^~QPftT*G+WREPtSY& z^_yoz02OWgvO0_wkjTr;eK=X?$#1w$`QH^MPDk7oi|B^&_iHJhYg@aW4ewgh?$j;N zOl6gWirZrWb~QTJU4B)`{H>oJokE1apH+r240Knxz6T-)_O2ndKsNlJ*6fq=Ah2_i^Xpmc+DgQNli(jg%!sOT1?Bm@CP zQW{jcQ94HHn*aUa{{HKnb~1POI&oLNrW;JJ#*y z&^*V0I{or=IXB)Qn#F$m3-$bZY~v-&w8nVG%NqywiCa@N5ElgPB&A+gf$ z^9pmChX?hy&YnFRV^&7Oz)L79kIy2#x2gaa1jQ->z^2G)YtwlH@S`)5S%Oy3vZKB& zMeup=+~X0>^rWN^ux*pbgoK2^H}Kg`zZP7G#xwin3NwxAN};=ZLq#rI`yXn2pO1A1 z8Euc07lj;e{UJ!Vn*9bJvx+~3o7knxv><0F2Gt8xX(++ig};2~O7qp7yQ;e#IGU7b2`M-9fxkkJ8oC^hIhpx#vzEBWS+K$uSPoGdv7H z2sq}Z&>{d0;cI>;$jRyKR~9OQ;4o~wA~V8v`LSm-EaRbi`N z`TPX;2Jeo&6?cV5_-FB*d7Ss@_comYtHD5?pIi_yl;1!`c8lm}dtT5=TF{74tRgK)x!NT%Q((=|0{kms~f8yM=^Qh!2zJ^dWD zpAKeg#hWHw85{ds@=7XV*wSP)%lojSq&) zgyqv~!Y>ATI{4i;WNBRE^QIU=DszRMZ1;Wd?>X9z*=`z~@SIOw!8Q=i1pemZb)9#Frx>6%;V|@RJS%nQ9p@mlOJxH!S#yl(*uV5eg6I4MsY z*v_T&?*VT{ohWalINI`s@adnF!uCR6S+?-(ncb$e_1*t?slCCSDDbyzcuuO_<+si_ zSYYtwdH9IuzQl!B-~2-sf1NTt?ZE6t{q@!8!`Dti+rborl#QqZ4q9@eopMgIhWdI8 zX4H36AMk~E`>M;RZJ5(N4&|jb246WV=mJ357DS(oBGmHqd&$VCR_(o6d9n4i=2;FI z7p4%94QMiZCFPCuW>iGWy5}vNPxKv%!oSnkC@lRCQ(*S4L({hddwcmsUu6QSuR@jJ z-cheefA;KtJagV*ohzM!|B2N9Z=3ImROj;EmrsKUyb#W4d{%c=;{6|Tr?2GD5$5Jm`$ibD&R@Aii&27;pFf7Zv1J$~Z_yfJlnY`mlI>6U%W zBhr1cdQpt%V=WC)_HrYBc~sd4a;N_k&QH3hwmyU@`+tpjXOPF9^b_2WaWS? z;gErV7?Dv?NRN#Se@~tqgD3E#GK*IbY%3&JKjC9AC0<(#VU$!?r;pR)fydnRe&4w1 z_+#@%+pGx+T~)Kyx4YlO1SO6)$b4izCqqxZw57{mR!&r5MHt0QD1DJgPTh8vA%2uc%jDf8MA8g$uuawdKTnUfA3u}fz8)GIB2s$ zu?PrH0CGLGJA#3EJTsqa@qf}iC{ivDMrmI9V?01N^sCR3gPl*x^~k*{Z|e)n;E|Eb z31@jQ%Z9vhAX5R+dD43~SJ!a>UJ+nW8!%4f5YqO;a~jHR(7HGdUdXq1^5fo9%1@gY z=^(GOiauCrp-CK$y0iZzcxX9j3==tx0=r4a<9DOF0(Hj>fsA|ED#m4~BCmA+rU92aM1i&n2ZvK-0bMQ*WkA?NKJUl`7@*9x^0^Bg5 zbH~F#mxIZIkAy@B8aqhF@OC6>De+g*@35#{O7P;#tEwg2R=tOH$W-ZzwNP7ya@p?b zcSUOkP0-5Z$NljIxdwqR>CGKxu=U~57rwrA{@u%$!BXF?L7JooN`hPFvD?ZzfXJ-jOb0S3?J$8|c1N zR4=9Lt;MeN-yFX0;LVM&Fn8i96QP+f!LLGb8D=VD;3 zkrp0OREqiB6AWmwZYpZPIZ>_dm1&qQ4VKzIm3?`GaQc>){Kxcet9o6BYR)~|A|1!^ zPk1etmSfV>FOtx2k0*@v?F6hpSu`b+5%*_+#josH1b_NJ4kmF$C7`L~7zg6r>l<%k zo<423gO7m=ml7~|LNO`nm#^skIZeWDpqLO;EAij*b&BNs^&5M$PYJmap?i1K+%RM_Yz7se0Kx zb`gQ7x*3)?wUm<Ty;MGo31c4^OS@;4^%0le*LmY zlkiY~O-@d(te_A$R1#qV6R({Zxsl)=0>i*bwEcyaFFzQ5L0%u4Sq0nugZ(iaGipL|tb1=RKjmx{qu0pm zRn{fiiDDqjEt*^&ieb4{n@RjIw<;Ij4Z&Rq2ni=nI$Sz+c-8(&bg`ihS(*(Q=2Hs!)nHD6! z7a_SIl%Hh5-@i87tyAgxrv0~sv!MINryg(pA1&u#SNZypecZZqs5dl9vhRKiTKl8Z zCsO4xtS@jk7Dh6kc6^@GD>eKt;@!f#$$G@LCC4v;}lqnWZlM{(Ivm$_ZL3rPyN0 zmz1v)Bd@g#fY~B|>4m}k+1PSUUl%_?M7MefFcrhi^wM~@&OEeM|l!Y#nXTd z3g`=q-wwVz7o`7t4!XNp!u4BMLhA~($yM1u+gU}x%iJ9Emw{SSvj`<09W`|zsF*+B zFD<@Kg_`E7s)69EL_8!B@29w}XdEorCU5uefd|UK>nBawpc424U{!_A!N4oyp<;7V zMBaR(TVz6rOm7`58j+SSNcd~`;3NT06*vtzjJAEcJb$$@fL8)MlmVcJEBpEmxnRJr z4vVoe>;Q%qh2af>r`CLDw-h#%a3T|1_veQ$DL541c&l!nl>U8i@ClqaRp#NbfoYn< zEz0vumUqEY983@Of~W&`J(LZvbtX{BuoUc`z&(wc>gPIh16(yAuY)}_wDF7E4G?q{ z{OWjg{Ptrt%%F2sCye+Ag$z6lx<0V56}Bq~q}f}0S}iHwyWiyhB1Am^N+jsL!vJ#l z{YQZz-*#Uryf6#CJ5V)3(*(@iAii5P_REE{go=vFA#@IUps`Z*&qG6rq0|r8zPIdA zYnrNGX5LDhI?xR$54ZcjNi~+S5gD@wTp3q(wc-OByg~&%ZvXA)A68C-;y^F}gTV>e zdZ!wi3Y!1}9^YOH*2mnwh09P^W$6HnGA9?8Go}I>2_PImYag`B>$X7-0G?*bsF@W7 z=&57G!1es{#S2HJ94*)H(%^Wib3=6I3DipOya+l3aGP0@GnYZp7T)5jXUAlpfQ-!8 zuTr`a4_@3bSd9C>*exsQ)q_1d=z;fUXU$_+??hHQjiPNhjt7fn&)N^RmGI@m#L zuD5s9NM~AgUwVUo&6dmd4BLU)&==OXciw);&OW2UjPo}?4L@JEjbAMqSp66pGifl7 zEuX?5v-CzGp0M8Huk2`_raM*Nf35K7CB6Y{jC7mNI%Dom_EQj+uaNEKFDeD@AA!b7 z9G}1U6=<~kwG=M-7zZ}O2c(+zo2<}rvJN8&!y`(hWI$K`yH#87+x^n(M&g=|-AM7U zk2ek*Meo6w%P#)+O9$;&ZG;%@9{H3NMt>j3ip8m+{>{ACWS?VJXS7mI(Qbdp1u(Yv z!e?UavNpf#Z!IJxB#7;`J;%9NYj}I_2CU#M?io@eQ@ejXcIGlQG%C0y?Y(g;p7i6w zLWi4vOk4mppPXgFH~sM9BS)rH8Y53ZyCOKzNoJaeg zUiX`2u>?#_V_)q>y7({$NeMZIr;!0nB%Dm|=rP6yxyYQhFtOG~_oOiY4}+u+w(Lb} z3I`OUNSf=J;HpmdX%*JX@NPU)XsA_+@|*bckYIN(JTo;%En~~8xPaWyVa2au<*i^J zazBn4Vi#Htv`!N#ue_DO_Ia!ghRO7TS+(a^+X9>45;OA`##wAX`$6DWN@Y7hrHG95 z03UJDIB>Fx{^#*ofB>4W9B{F7@3qG3m2L|Fkz|Y!dkBL?mUMvJGrwC$N+YPpmV6J~ z0gqNtN;>QW%c}GM(W)+F$$tRch9ntx3{}vK*W6%lD$V!Sb4%_WM_=Lvg$G^_*@a=>OXaL$@-B z(+XrLQ11v5IS_`a?1W`bQ$qtuyzcaab{3yixrws3;lE3&J_5pq6c?gzytqjFV$^pz)VDoXy!+fkx9f+4xetCEy*_dd`0YVdz*@f}l4(g!g!u?%85WTY$|z>s zl?(6(xlxk26`F%*wxB@wT}*TI1L0vA!&I$RaP}+0%;YNAn-H+G2q|5;@*Mn`V9NG5 zFXl*C+g|``U_Cu1XsCyWQy)p~l-`Z7dwegy>=eb5zyuISUK@N2ZH{BJZd9vP)dMmFT?euCCHsUUAaeZt ztM?AN3Dap$+h=0G4?k3(tD?NbMT}mxr6r1_%z#6UWs>oh9}{^>mh?bTd)8&`cVuPU ziUg;fLwZdBC^{7*Na~nMKjOGU9Zq4hx?`bm3dj<)M8SF_&;Rq7-;#)Bae|MLDWA^d z&@`7pIruxT4Nk9py-+|b=K>0AL{SAC3exwBifYHbnG6$r2<53dcfXBexKU;qKRwsr zhuD=s!IcrChd+^TduX}2?Ljy~xLteHS?ISjmy3z-)-AD!Y~SVJwfT`p8CAOJZ&Fe3 z??_h09>Wfp2QY|y2FB?CS8g+N7Z7^A%_bZLoYB^Pf#qedM19`<%h7N#l0aPVP_A=w z{UwEn{;twgQB6DL$$PzC*1vw2Tkg_QjZsk^xv54U)$x(Y(o#clL&~?7{aT0Hv6L@O zfxGXprRiB(dQC^4X5QN^4)^I&31v5+7W(nOqX`6G1a#Adm3U-8MJpIKC$L(xc`Ew% z{c4On6$||`xFJJiEc$GH8xH~r6M_u-b;fN4RM-&r8de7WOA=C>8{ zmZ!SU+|V(Voc^|)!5?8&vqJamuWy)5q%V0)5Q#?D0nUHVVLIJbuBj#H;y+?41Q0(KKM3ZxoeHW9^~UW&pJGfN$7*M7h`(&mu9?Ck?%;wVN)Fr`~z zf98)IH(QivL-}0`3$DN36|RT^nD`N65H@K*e$>Aw+pj$W<{I0lO%UK<^sgZTRub^i z@ZEhqXeI!t-U0g`Z-9KTAOFwMi_Y{sf#;SFm|FV!M|kzir%LobHTNz!7rl&Z9*`Hca4rl!RF|BWM4)DHO=;kI z&`+i}?aXFfBXkbobvJeDy*%tS%W?2?_U?2#|4@1tmA~#-Ot_MkktYN-WXTXm&Sq{k zSh8-u-*gM<%s4%!Nj52?BxyCx{TA~H5gtPBl1ZmdfAuTf(*bp%HerxP1 zskzvJYG~`0Tt9Mv1#i6y*U953+u<({$yPL+9u^-k+mKEj-EQHT+}#JA5j>)GL>__W zlwSC5=}AwY;svTPVXr`2&-g9DWS>IX%NZHNJsW`=^_jxhTMJ$LO6~z>VDoISQ9=F;qjagOF(B+M zgwx7W#Ns^+$0*pPHEriBpMd2;zRz_lxh8@u(|7M5nToXTG4*EIzn5+la7hOaKXk<^ z@7jK)0X&gki4W>(_MZmetW;{#cVxfh#S9qgSthk=w;K2Y4rb=gh9mDi{=WX!mvkv9 zIFmyin5=@%QCT@;b>~mXeNgfOL+0VZ-M*+tzo8NUelAlGoxzdHD3y7#=^wJhE7RB& zHr2fWhWQ{8ho7^8XAaa#GAR%KmKzSV^iJ?p6Q-R~I_EgY7!v1_F7j{+De1tr0n5Wli)+do9=ggTQ;fUC31}z{pADw4 zr4optBxj`K(K$V=e*OAWv(wP`LUmg>vyr&*I5jSSRwVqzMGg)Ec((C({6BMXa;gaM z!;#?-KhX_r>dVLVq^^8?HxAG%O-;?y1XV!fKt#97p6zibF%UQhjXMAw7?MJsek9tn z2|CEy0rgBuR~gVK+0NqTH2j~S=KL@as1gzrO_qm0z%B-KbaA|P{3YH40R}#KIRUpK z{Qb8Fy5X(Sp;cM}39#@q0e8W#+n=)Q2%8*^iJB18?(8f5HNI7=I{-}YY#czjO6aU!`=|Xj&G<;D*jU-)Bz_;rG{H) z@$$Qb&j>jZPe*gCA;3R$s|pl>{m?*499=*#mG+f9a5aQ?b>}#GHcBClBy!L+S7|J* z>D0me)c7jY2u;SbJ{)?pH)#2C@iEcEKSX)5Cw-FmZ$6XRXxdku_woIptkdaq&tF!h zIjjGB&Cki%clA};Vjx9JYz0w`pM}SP(WbOTxy-5M(cVS%vnG4nQSNn)?-fFG9(1(y z0p5c7yybHA7*~vNSH@6ax{X7v71n!UQRETaBlm)aBxvxYc1CRZ^bcIOdVTbPVMBct z*@q`Ge2{@LVY4zTJd@^AJ7Gk}p}gh7v@c^sRu*Zbs!AHd_EDtcs)Kk_zMWL(ovPcQ zKn^MGjo5OTwsm?hNbQ|=!JL0`#k^wmVE66AjU0oub`V6^{CdS)R;!>{_9pO)Wjp%+ zWj>ZP^Ji)#TiWf#+Tkpj?b2_vEaQLob@bM6j*!l0oJ-G%G>De~5Zs#Rg>yr5RWF)n zpd_9;Pzg0?$UtS zbEO-w8-`gKs2pb)q8n?u;IwG(`!Q;ne1G5*9AcoPJyhKZvSuF;Af7#c-g$f2aDQG~ zq;)SeiFvYfpvt2_bEf$(*5l0faR6^XmkKBz&h>)L&Pco1enaJu(=j?>bK=*&x{6Bd zM=h*)-m_x_G%(e}Hl(GpH$n=^5}?95AsDrM z2QV#&QTH9su3@D7@?tiN%s+fCL(V_{Y3G@R&06p}eS=>fOE^q!Vy;D}I{k-$O%9V? zwD=BJ(W-vuM_3HOXch!Um#d9}zqc5E?D5RM^jL5Ihs#V8WH)r=qcm7*IF~jX74F`t zBK2oohKAFl&VZ_~eI??18B0+c>SG_hN;tfMHrc=;Ul z6V%uZ`#TATmai4%R#%T0TVz(R$WB|y%$p{8S2$&s6A#o|hty8JVm?rRO>Ifokm4w< z5Z3wZLsuYo(4Vc!P{2Po@orvS4wrK179CR^tiRh1e|K$=n4vk;l99xT1Hk#s1_u-5 z{zj=IlYpoBM4Uh72~(+@M+sJWr4}cR83&XE=MVt|z$~V=yS(S#(eyNtoWipyAM$wC z(%G4mm)_#U1m_DladM=Xgt}v#gQ`1ZuSBrsN_ubD8m9=Ev_fqR`Rc04gO0`P)YN78?pYkPX_uPwiH zj7T3rI>l~mMBvhK4mt=N6V+w0E9-2TXvY}eS3Sv5PW~{BG}8$TMN=3?=LnFe_;>KEilj| zeIpCu=6X$^r)@@dtpTx;;E8B!Q&W+WU~3%oXt|lgRo@5IVUV@IDoiDI z%7hM=@zdD;$S=1+FN{=?P`LU+*c`Um+SUerp|Yw|k)ZoDI|*P0^%gZx4}P603aO52 zQ?k26dD_F$8I=1r_lx~kVdg;IXbYFId)|`R!Sap5KBrs#rK<4d@C#;=?Sv(qpXGwa z_qm~~dr^-}l_grIA+@d!_PpGSS0QvH4|?`sF(71MV4#=r`>0D>ICBy5Y#?taz$ zZ=sD48dT6%skbCsNo^+teHCKAXhggI1j_7$-RMUKf86dE7L@M_z_4ilV)F6ZMVnW# znJH4k0Zy+M2_wv=#j2oeTvKm4W)C3=9Di+dEy$fV4z5Btl^@CQ`K?lB-7I-2168~g zBTMvm6xHwL-zg?JLKrb>p4Cu8g1y2*%GjNG1;7z?Ap1Hub(J`Fc5d!7{2BNr!R@J{ zCGB4(!`2zde(=5s`C3x)xY~{Jaul*_fC!3t`M{Lin@rif@SU|l1vpRfmzBZyh_&hx zb6?a!vc=h@r7Hkm2HzY(;Ah#Ae@I9)A+-x~Mm|9YfdV9TqoW4sb@UQHs_`?)O}d7xC& z4?>jdQZ(n6d7E{NZz}&(`Y$@6D!1b~$=r>C&@z}x(e$z~ryiq~@)Lq7-`Fr@>Bpm> zt!I=FB*#`yEg4IgAQ3;OZd>M&Rrl`u)yEd7>16|ahb?I&!traTo?`gfVMaA30$c}> znpkf$PyPXeQ_$8$&>anUJi;o%rHlB%H|Lnlr53qx~ z_%|n61rA1|l%JM2mt}CXIp?T>4#z>*pz^#8ZI{tfoK(EZUX!4QCY>dmp7kYvuypX6q6D`VENca9W1{ETf+Av|Wqr3eKypw#=D7Bl-?wy3txN1n zQb9)bAu3*cYhh7%S0OJ&Va}Jm(5r;o;X7kbLY%Td$ zAu4T}a{ZIVPv8YhQHHl7vkhD8y$PptivDcb#vO#I-Z?^y6hn_y=1lY0w*6| zboF8Y*Y#&+E@YOf(7gfMBqs+)zuuRDxR{tvK>vW*3V?71eTk0eF*^&q8(ha40O_ZTAg-+$v)c zwXm;Fb-3-ge*pyYdX*&7?CsnA3UsMy}KPJaR*Jq!Qkj}{Pf_;>)na|R4x8<^h7x;jUPg| zCy@h;f5#r77RZ}Md6+ytJs>P$>=^gw(U)gt%b<+`>P&qt?8n&68Cx8M4Cdv&7OgmK z4@TzWiz*Z3sCe@k)M$r+=l}B(l&8@D(3$k7_fxB$(eQT4&vn?O0)6GLqBC8dEuY^B z1U3JSfryvLI%Ti<)hX>o%qQoAd11-*_D!*|+P*b?01^CoT?M&zaZ=xuen?&OVQ~{( zW|TUHajYrF-P0~83ZXiK(krOMCQVHcg{1UAHI9ul*Dn;1EqLQ z_Cm^7u{$%lha@|R9@v4jTqcx696{@+va1P8Wy<=o6CUoWgCr9);LCa<@%Q5*liqzD zbX&}yV#w9VJP|CtomHj2zRi*_}l;y-vH!fb^pFe zs^s$iibU{GxI&YuGmc7p(S{MSV#IBX;+sYz8lF!DXO7uaIkDDQ!B2YbulHSY7?r%6 zRObzb1F(4`iNX8Q(xP^jK1c)rO6Yh4onbv}+llsM`hWZY3LWCjL%@?$UdJ1szkx0OjkU9W4m-32DO{ z7LrJz67mIwH{xNyIK*%M_2dRc`5{BUz(C5X@gFeOCuNv!WrX^y9P=64Ej%&SK5Y$R<==_>^;VkF!ACym(tDX*_O*`{% zX^>d?h#xftM*eUx-ITQ3dkoo?laxRG`nj(uffPrCc!DUM=IWz?hloc3 zZluC{`A8}G?K)|x`d_S%*-9X@3$zWPm4@MzX<`f|GpSO|<* zgX2a1RXeZ<0SJN8uDp|#k(~*RI~0T@4uX0Zr9f<~lq!KUA%`UDpuwF)VKQJyJAy%! z)}nSyK;p8IUnLE&W>$e@)t%P_P7TX|%uIbnDKE8MX8QypSq5+Z~p-R4h1c1Kn)_OM_%Vdlqd1ReUc!QaGbS2M{ivrW>4^H+H%Q*%@SJ%`-b zl4n2X#X)Kn0?UnSc99h%vh>hRF5ZS=3C?2>9gqXB3(`I&;ITj(eLM^dXZNaGNqmM; z;GugsC5F(1v-0NtA67>mF*T8Ssg1}2`Ju?BkC%g+I|o=p6$Q|6GyygsG6jK4XqZtk zP0UHYV#(HH%u#A*Hb)lc$n7l-K;huw;c~)pS!7S>TY%pgYI;CxLqD^q?nhyG)5q}3 zXMd4;z!4x&=!6Rm0|N=%+F(+;V+>8`nM9sKuY5jYIT%vC>Q*Z1JQ0hxMR*pZh+GU!Tk{9UE&6n(39oX^hCI| zC^%I}=;SO;&-Zr=H{$Gp#p^J54tfZ6GnxJ;)-1VzEn++&mH97n)9k#m^ABgh(si+> z#UJJ^OvXqaBpwF3Y}k}dZ^(RtDc}Td_V>o6$WfDV4FQPKWb<%mmJpy)U0SW}+G?YcvA2iv zKwM@ugG`GN3=NTy2a}+fY0VHfo0JV$ithSH)rW|cObc2Aq0fJCcccuG{bl3>^pq36DihQi&dEgOc z>LzMP-+T5Tuvx3|mb)J27&-Y2q(Gtc5J(Cvxm{1zliJ~AquU9r`I5iMYDRBEjt9ID za|RRQL^62mK$in{MdjB+)!yZfi}3hz_5qX>nOiIV8enhql!46=US^NJgy;qX$}tTv z?p&A9MGxT=_yJlW4R4U?OCn6T6HFd35?ws?Ei>>9;UP>Dm*(M#fK0*D-rA4+MKUH< z?+QZg_5)v@ew>K+N^A}l14GMU9F%mE(=go+-ie*Fd+ge`Z=;gpqN!0hz+Q&OJ_iy7 z_xWetfF42G7eBux8tjZ?#a3s{R*5)On%4@ml+uAPcwBAc^%vbn9tyy5G1C-|i=fHW zT@qn0bwjaT!OJykYp$|F@5r*%UWmegyaIp=plurWrn5k;0L)C_u*9$39a-V9`?8)1 z=}rMwlQDX5jeq@jm3T?^DXy)NJnIu3EGnlifaeyfDluw3<0~`3MlX^W)$(a))vKQT zt)p!T6=jGPy}PCMyDFJU-1VxnKMKTv!Uo(DTdrp?pPOv|{)xb~27cS-DyN(bbahkt zZ5z_>KF~f_QDY^P1R8w^)eB^V`Ty^uoc+5HU<kh2vN%?grjClR1;&eFB_g?&V>BgqDJIvlb0$ zbGN_V-iv-2E@jqy(9{!ri1M0pWFC|&JLgJU8+kg~(6X2A@fCtcuq&oae|Q4z-fNVW z)l2~|$}W)M!~w0&lH1ZNf#lNkMFaVHnPZm>D)tXH)8on}C=3_IR$dw3IpBq96mnz@ zeb;G*rkxJA@{fmAC!hk?T9Xznde@_;gw&si(Wz@oPM15%W8Iu6WXiMewojKkyOmF4 zBz0ZAuG*7uQGyHgf4A5S&P1-xE0))fguB;W46R*ie0QawlC9)IjnbZ7xzUfU?l^b( zX~i)hF=evB6|< zlv1w&TZ)b}BE@bYa zg}hlR0}!w5yM8}7`15NuB3YxBT8#{UilGD2)}*|scd~+^-fZK|F`wwsgP&6VVjC8D zwJS)=MI_?fQ=OromDF4UGGhEBM{(m@m3g!--JZ^s?tMf^)2?56$ zFVd1O9aRX8j`fzzNn4P491@}mVMAI|N;!)US;v29iTW*Bh-Nk=?5wQrc-&#CbR=BT zE_l8_FU(ZgAK6!1VsdTYquIXZ($U}W+UK_O%%P0R@rKRK#LadDG{*?-rmjdnDx-8| z^>A2fdIoCh@39^&?!FF+nHrrEKip_qYwViv-r7ttvfhr8>)tOnz3(qtE}GpLk0WhH z<*M@L>e9^NDNd4!>&)JQ$9q@m-csMLuvr_=EU!``n)n`ep8nO?&+mBbSZOGJ#7&+^ zu;x%yT1zVThLQH=LFAWT#9oQ?1Ai&3W>q!i*_|`ZAO$^hU{Z4gltjrpA}-eB}Rd8{EO{Iih}-<+n6&D-|vCkH$P zx|Rh99-G8O;azi~;k*5Nu=x$3x=w8DXQ=ByUkTN9K;BxIbV^f6>2bSo+QoRT4~a{y zt*s>=9$#pn0aOrTUZJXvs`zvV&J?^7I)z3LzwCOf`L_+6X~JDXa9m>S{eN{Bb~ZAm zmgRYHH`!_&007iq?!XGQr64GLhkVYdji1p?rjQ@52kx-4`S5lfAY^oD<0&S0j~@-> zpaz67ZIZzLsL|JibGF}O{=V_yDZ!w|ZSI6lK^MYeDgeYaelM?gE+!*`_st&**poCO zNI}_yt4HiN-Z!8~J@;0gGxA3jN=QPQpk4-E(C3_7YKhC%9|~5MX8))Rx3J1+fWP3& zBSKORH;~ss=_gLy0lIL;f+7NtZ3jxZ#BXHE|MfRN97>T3s8fE=}I9t&eEa;D5o$W-hVs6R}^fX5r5)C8V`q^g*Y5RrxJ zFSJOFc6q)Tn$jf#ws9(OiWt~u?d~=Lj@TIDJmSPX`0}R?Hl)%^9V2|1P# zoVv=Ys^Q9}&jL`}EEeFm$=J02Cpv0KYbfg1NIL0nt!4U98Z1r!O;o1O8yH?R% zY{v`^2G&Vv^2mb_(7t;84#d?JT*TPUzt&{9k6rKV{VzyH2pUUtn`1QO>@4SE3p%c4 zsL5^ESA+<)Ukh8+z}$>N>9h^J2tj*^5|1IN?~_wnA1x4EZ@oTVy0tU&CO@AHiPXU= zh4kSd4s&&LyM*h-KXv$}I)56x_{X#HFTIu}@n2ZCoSfW$OY@u76$qu6Gytl00Hk?? zMFRz(MlcE=1ExmqCoSMe$%ksky6h{$2&0F_r(dWy-FgR5AEB~>>LuMdBd21hg~e6MxxdBP}!&FlNwq&MuSi|98fr&oFdoR)Om~2?$HGR5Mc{#_qwvuXnr& z(fVzm5S68_7L`vz4&mAmQlJS*elkK2)~t}#1(+a4sb+u^M>s$PyR@aAB|99FWFPu8 z5=?FCKrM-)`CwSqYk-x58s!&(n``VQz-j@FVg#Niks-n;p}!y1k<#nz=F18IM7Ywu zesIy4kuz=w#J2N3IR9lX4+ep?GQIrAc=%$_W{2e)cKX;=@XJ^w3bFaw)&-GNIR8AR zyL1Ppp^}#K7Jn^Qah0*Oj%aj9w7`0~{L+V*&Fn0KfP+)x>OA17kvT;rUG%WLL^I9O zd3NT8G{i-ogc%lMpwf{?gApbogMkaz;BCz6TdG;#>rDaX)a*6*Jb(g)l5WJ|iTmzt za1-t`gAE*_5dZJM7Eu94KU`s|mCVL7-`YSf!Hdeg&naG%y(yp5lj>sz$AI z{q|7+2s9~7ZpBC4oIQN0`-DYBrw(>knn0X5z?FGNOA8M)n?TPkv+jqC8cjpPXy_o5 zL6h0Gq|eIMwjL5G>+TFe)Ewm6DS=RMXu5AJrDyTTC>RhLK#IxfHu@O8l;%)q_ngty z>??OOD$kQU=QILhS|BS9+|S&E*u#?B_P~h>1$Y_+1sb7XtHXC!l$U>WbH(n~t&f_TGuaA|-_ zGfO?94iB?f6n>Kv2zyOaiQYpO7V`;6iUV8);7eJ$^{H7Ga&$2$gD=+g!|q#H-BMq= zLC>t(`PF-=wTst?8MO916+S!?KNRgn8AJLh1AHCO7kvbpVK8$JNlrq^kkeF*nwpw( z2|p|}GyuTTK$Twc+mt99D1a$D^|&dV+mVN+l-f||w~s2^{CK$e_G9Gt7IXe2JtvLH zD$33_=^~-ot51M(Tgi1<-1NxQnW9OeKmeg8A_SCe{M4B^UwpatJpLRyoC#f0FaLP5EPrzCtPKJJ&m zoMBfClk!98yn~L7G+I3hfs3(d)$TMlPW+ma`>yp2=zS U46Y%A?ZV}VjoJyR|h@! z_&gA>2M+`15{}^vlw~X-C`gU0>O4H}FPDS98XyD;zy&kh?}2=$D-aWDYKa&hi zpaue}_qr*_AXJ79HGDqiG^Cb`Wv~_`gHrYzwxAm{r@`XdSh<9I`ARV0hj$`c)**D}zeK-ljiJ6e1om^i;3N`oNuS(1_j~mKtRjaq2D_|27*mMLL@)F z)uYOH$$7hF&%%jS@tB4~A0|_=J#x>~) zx9ck26Dac(he6O1QvE`ymS~A(U6fpM6{AbIR?8)+Y^ppW_wgo*z~Z#qI7Hb$m6aU5 z{jQWFL}Bk9ZoR<&>3hl*IdqkZePio}OEtpDKyk+fq0fLuJaQ)v41^ z?mzqOGIET8FCG#~fR~|?$gHU}c$KKdvLlVse$0EN8cHfhNqM%}5Ff=D_&J^-r=VqG z%|04<_|?XECnd^`imyTv)upa6nik8qTX|Ea(NbL7tZF9&+t)WW zA%F{L<*}`xLMNlu`K>@)zspQLc^F{MF2AgD9_*K)EOg0orh0m5`~yEBG>$-Vih&j4 z=44OO|6bp7VvKC$`qIc({-v|yXEd0e{}q2P9eq|`QnR}(0qT7J=mzSP&|Tz4{``p0 zw$u>8GX%~|w+S7c_f)#%HxmQfL=}*z2|nEZ)MYD@Q&gm-Gar=xQup4y^E~D8adBsO zWLy9U0i-$;Fpb1voz13!trV9}g%p^0wFUWPv!s7_vM2z<7}kG;R3yieqN!g}eL0Ia zrb+O(-&zhU-hPUrU%rg8#1mApZWjAp<+578@N55-m+D#ks2|*m!TKg!5x=fZHrf2> z>r;J1X<%xaqmf5+c3Fw#Y9DnjhFUzYi4pWdk!Igd_P?{e;xunI0)8Z zC)36ty3x#HRbc-iEQ!}Sp?yOMPCB1qTg%}FD@H%a$6j<89v>~eR#Ns zjrLCs*sm#Gb&8K!2*t_ zzX=_rB{Sde3&Z9H$S1)qTBD(?yhbbJbPj)W*&*}m`CsZsEg3zY>}POd-q|UE;XhHm zIZD0pheJ{Q-?HL-K-tgtG~AU1#x9EdL({bXK_%t>Rzr2gfC9hBo|#@zaHwVjssU~; zsJTs3M>!^Pg7h!)Yd~y`t2u#iA^Wqy5w)sjGr72U3tBb$vI0 z^6+uJ-hqt(^#$P`0sW-^R3@ry_i2;-*o-bIGV z@qE}a+gV(*G&hH=des{<1&1ewh+&XRjha7>iH*z5B*)bCTwZ?fu`)`T%JB!m&oD^k z6{0Hc6%O0>{x)e14UHy1qjI_(9G=gcnyY6wuJV)$f*cKnVCQowAdq=m8_RtL+v3my z&>zaHtDgXIDg0s`BIK&(=08K#;;ilMX)VV1^eYqWHW)##rU;~Zgmqot>k&VY#0Jwi zQX2zR+zAK@$e9Q#zNN?F{KJO-N&}gGg+!R9K5mDaNrXe^bK^!SAj!ggKq>CPjVxLWA8{l>po=>|DCWj%*Dk? z`?`6h@>>Cm4Z^F%!mr{YMN-ZSr#@wHCCz8PZbTjVq3Tge`Y8`?L0LQd6t(=ID4I+7 zsroO!b2TRCxvHx5YrU9Ll0Es^1ve|*B%NC|XFnS#>o0RV_ekfTGzHVIO=H_q4t%wH z0-#R_X5F7vzHztsXoOMMRiPsNYT=0Mb_W6WPD+gLj-QzF9}@*cnY|@zV&K}Yh`I*ii)y_qy_P-pt)Ao_b@EPl;0rNlOU;Qa+AI$ohy|5auxx`^k zQOj5neBDlemX)?pDV5g-*FE{+#_HD0j08>Q=}SHe%gF z8SCOWRS~a*h_~=6UV?2z5&rsljG2UJ4b6cJ* zdUtoZUFTkRq>y2@KKfIvxNoSqg`Kf4&AEO%x7B*7_T@cV!8@SfL|_W&Z$AMKI_eWa zb&E~6PVEKh!>Mj6H_jcek6eAi12`PaFlbZ*=%)$28u%PXBaY3ljVmVbr_PMz^2poN z`V+4?*;VpjPA&=0J9#zu_^e)~#f-~%I~>7ed>eafHMbeW%Y4CNC-B|QGM@Q*^5xy5 zv8j|hm#IICy6Ud)vrb`iSoet?s@G{@igViEaU*`jZ_I~bGdKM}wJ&MlfjjT8a^hUL z>-7yBuCKsQ*~LY0^ufZ#0Vbu&Ksokqj&BUjr!CaNmkL%JP91!B;_vG4toWKq+64K0 zhk?dk+DG!r0@Lq2IK4K-n-bwF32pBBzt4rA4J%5D)PlEyS?z^TnLz zSe@W;dQQt#Y^UgxD~<2_hZ^OVo=^GDE#CYY#BkMM2`y_jOP`%KM7{tJTsao3Ko$b9u`F1z?i zsMS~FF8V8Qzdjn^EWQ#F5*Am2^&fV4@DDPN6pLfPr-Z_D7k#?G$5Cfdf+t%?o}N1$ zn>s-rr@l6NysCUsLVNE*Rzuzd?$~itT-k1AUi$27581Ex-hJm5x#0QH5_789;(R!D z-GKki?>tsLG^wYo)%hsp5Uu|CbecwLCm zJMsZ8Q9SU=S?fQ|N1FDT`&NgR{V=lF{S-_=8eHXL@gU1A35yOko(5h*SFu; zx*JXzuXups1qq;-)8_nPN>l5>58xSfY!eOmX(?Vf?MkF}am*Ym-A0NC!5i4oFD?l(7HZ!*2>#jkIdcYvwdrv%-8_a=9be)fd&i~-kY#R?qd7BNut!tc@WN6z z{qVu+m%E-lj(o*EH(qQsGQ8PZ_#im()t`ScBii)2#o%2I0snVVtq)FOsNrRdCBIFWuvkBK9e&QPay76w02)l6x7sem< z%7ROC>Gpu>`qn^e|K!HF;>vZimoJU}Uuj<+&DHz1yGf-|nUbLpB_w5*S$#-~G80MW zp+X56B2vbphzt=TA^J!vLzzQnlFWP~rI13I&b9mg-nHI$t#j5oXC2EQ#mC;y9-im9 z?(4el`*yY9{vtL&EPu&HL&ZX$^r7j3*r#7_@3UNp?dzk?QYa%4s#b{*g)nu=x=d~! z9%-N+2$zWuVm5kY86p!pLe_Pdj?q}~kfcFY(z?a;vC&h?a{F)N0J5W8D(azYKFt)q zD6v#(5>2>KR!e;uN zku~K|W{I3%fFPyeu%g-#)qg9XfiHjXqb#h^U#D>v!KBwd!T1_Rm@E-_zZf@4T$QZJcX}e8JS>VlJ#Z_}+A56%tR& zk%q{(<|^WP=-`;0`NkqHFU`p1qSGUgSmt8#M8Ef>B33K1#l}Lf zw4`JTnjB%#MBiuZi3a^^=tdLQPG)_*@DjA>kc;dTm=vJ|7M*SXF1B$}PIsxYe8_>y zz2A=(Ot|gn0DPwKBxx(bVKHY&exly=OhNYvG@dOr-wgi`9vtowEwO2vERojws3gd~ z-eq<`e=mz?$=I4Jk6s&@DD3cYU21Yye<~uvEMvA+#$xksPp))p_EfgR4v6vuOl9r& zv**L)_1SG`Y=W<)XpM7>tt6+8k5v4 z8y5%)9Fu5WF(2~xlgUm}=+b9V)H8J9C3^MIXSQ>=zDTeaKH;Y{0`I~4V>wkpBCS9B zsxOxB7Iyt-w6M$k5Vni6#+PB|{`~!W4eFi-et*GlHsC192?CuUiwYYUvgm}fOV-%q zGf3J{^YD3e@lH#0@ZOBqRr(PT+qvUERvtJm8u?m#coE&8*05_46%N&stk`o1=6 zm@_0UZP%VsiU?77YL_a`&fp@ut?j)n{X3=sJX;jY!m~Zhz+D+)caoVYNJh;26kmKt zM`H8*RXgG(N7UF#-?`(T+oEFdXgx=IT5{e4Bf6x{tszmKi7lsebMa;lAUUDv@bIwm@>)4+7NV#m-b#B>;JFxlp72Ra`P!@g^Mh1!QGyhzHWsA}6t|t{{ z-DPGcsf%9~HCC!K?$xt=(hEyB`^< z#0z30%W2I>^D}z&O$xe;~Y_s0y$rQ4Wl0v zeK#Ljwn;R00~Qk$KFiP9M34cDNq#r9+`rx0PZ%23kLIP_ejy#{ z4zJjMEq>0bl5ykOU|Lpw@8OJo-5RA|YP+u_#$5fiX2nIcx(S1MXeSq4y} zZFyI+WKMV9S^p?JP}%doxFEw8fnCaMv;~f%j+%q z;{W;k-fWxXP?=TCdYsrX%=N1Am{q{F2I-pu4T9@q&spD%vUk+RPvWn7&i_MqzQ2zi+>7n8@3T@yi(v>o<2Bz67P zLx25jZI$?(AL8x#HvKtbb{uM;*U+Qwoj9}BnUKk$5W#VuK@OGkHEuSp2VH2wy_idr zZOzT^Q-gyuBGWf;`YQ1pzqUhjzn?{B)6~;=8X+d8YSHlkbw16US*}A<8~Y3~4$o#n z+PYDnjfJUImRu^VQb(tr7C1A0>~MITWTU8q?ebt+E5y5$moH6Bw#Rfj#(cl!hPGkY zr9s0oa^E`qo1QO~k%2SB{+QDE%Mc#Uj&+<%3lcwA*`+*7%|Co|Kl}07x-hoA2g%@I z9a9uTpgJEXi+e`u z+q9hUPV(l}atjqGanz?LF3uY_`bYnR5~dr%s%go3r<|P?>_t5hkly$~3_c$e z0MtQY3A;6S)PMerHG1IiDi~lY6p-uf;xu_k^l4LKEyvX)zoted^M`yyWd`QlhLdn>tA#OZkJ-&?73asGp!+n|;$ zXKgEVeGy;Z@wQzieHtVaM;4AEtFE?>U*n}`q@&T!WI=gS6^qZs&z|YK%0wk%F}*L@ zyTNEV(B+eI@bTq&c@^Hu7Z@6G!pBeG4|hWi_yre)%V(2hHV{*$&6i4gy4MmP23)lq zSGq3R)qi~LAW4EJt1=perB4Rqc6x4r7C$KLsa1FyG!;;JjYU&0XnLKYsfFW6t-yUNS{yoQ)J=f$ov@e5v_tcIiTZykk!m0?U;p zdb4wA37-;cQM2EhEu9Aye&LDH5Y09=!YBWLK!}P;Rb9#V2X>6^WTheBg)LHJ5?5M$ zPZbvA2tCD0+4BDVbp)8g*}=@C^@IBfmDu>yob@|4>5#-o#o7quM_gA96Iv6;x*dNW zn(%RkqsN&c%l)pDn#_A41-vJ4s;Hw?G{T3ek4||)35*f-qnDS@e2Vfs-)G3C6F|Q4 zFR~Pqb#m?Wtjq1pYUL9r+>3Q|bpwDQQQVO1!~Ll~20$8;;Pw*p^UeoDm>Nz{&v^Mf z(6S9xH=!_9&Fwx)N&n$fvsq!)<8MnEQAr3({&*qt2+t_E=Nl@hfC-=>Fd3obK+#1c z&!3wjz?qvPxsw0P156uH$Q0!r(Jr@_f-|4R3 zQP$K}=2gYlQzFU&xBuFh;%BV#-p$YDrDeuzKd&*(`IKoJ51BtVRwJUdb+d+njZ#|t#a%*FP}2^2=Mx&)T(>;aoe~&?Tebppa%F!{f4``&Vtl)8@RdLQ zE7c>2>q&{ly z@4Tn7Z+x$9TKy_=)|qpoG0TULp_ye>lKBM2O7SUUe1 z;VJnds=e+^KPMaYxH)r4(e)UEby2=QZo9Y+A{rV;1u}YK_*H<3$lb_4QKCuOyU8-I zv*E_wXP-xo2_>Vq?OtEykI9ET4lDQ?E1YwD3whXNP`VAX&^z;HZ zY!xNbMe7B`4!8^w2==W_4%1O15xTr^8KOmg4*i`vI$|CYxrd27U%Y~3055%gMb=qJ zm1rSdixEc~P}XHSyrMtz>7dPYdf|-I%KnTC?1D`F6Q9ewWNtj&=fc3w)aN_>^UU{m`>UOf28ap%Ea`99PrGH~yZA-2m?eiHtpAAv?jU68@+Qc+BuVn)f9%u7MB) zNv-Y+1;2qn-8Kh)5>zBX4Du`w@dTV?)=?numa$`#-C}H$(%HB1St0_rVL~F_fK)pq z9$M@%sg4W6eX;&T-@A3&{?MxD2vV?D)I0KplH1Vw9Zd)Ht(4M$^T^x7;N9MVjFL^L zWjOTa)upc}Euy2Ze+{;z7k%e#F4mz1X0#e}K{o?|q7LLUWgrJYj}^3Y_Jsb$0+)C> zK&=*}jRL}<63JI`CxJ>vqJLdo+FP}0CFdw;s)<+96}ky!M@R3WX_BH$jCmj;uwDnz zJI+NP;Lm^fCRuSFkca@)5oCA&vijwQ;NgwpAsXj3*G8bgr;A(?b%X!UOWxk{=pHvT z^EgTjRCq1i zWVaFiFmDx;_oF;|6N!|`A|}N+)Mk|5XTj!uTeFAv>xUOY7{I( z@jrsNQ^)H5fPqO(=3!t>Nr^(v>DT5aB6m(QNG6?3wsMbei^YjVjseNv|6eYg|skCe_R0>d#%XQ&FkWF z|2_<~yNND0f79qMFYp>RbI_zm$i{XASs!G*qZVblg+72+a6=|Dj`HwmtZ>*a_>9Oh z7}Pj>{r{zWTCJ$)Q72oVU>nJa7@y21ljd&%BvLiZJ5<4Xj(;}_7XD; za6q0_XcRz~*9A1r$94C4_t~~95v+*D>FrjuOu8?w^aWug&vQifjoi=o4agYhyB4z{ zrdl#+r&7%RasU`G+Q&^8_BUt=+M#FOLSFXNGy zOpY6|9evLw?lK>Bpx)B|UQK!=t13SMm#w~Zk$}8B7kouvLt_g0HLnHj9e8Va#I9L` z44(Dx?T(|bZ{E|Ep}c$85T+Gs zoPey|K+3b=eTg-@Eo(_C%$w0A}YRJ5)MC)CZR_%`|GDRHgj%V>$<&8p|KxOO4_10q?z z<)qWjZ{G;#u1prndJS^D+Ixzg?q>PG@v7D>oTVWPdBy>`Ckr*>d*U!JkV7e$z_{z! zRBug)T&&oK0B_xBwGVb3x5h6z>19T+3y#@jJaDb^h_05byOz)+d-r9y+gwcZ=HwII zi+%DmDbkltR=KA(4wO1qDftT|r~i2^RQCIBY^+3s#$%R(2o}W#dg3z2kKduDqm7zx zBX9qBnL5FdCA;YI{@TjL2Ok6rLhC0TF06|^$Iv9cCRXj_e5O{?-AJWt0(aBDJcW3- z?-a$+6j$@uMB<>_K#1VJnWoUujV2Olwv3ir7R{ohFLCypNXic@OH2Nyv*WsFlgJtE ztKxE3M^g8Blu1pNkONyqu#XZ?LAd1k3N41@M^$x`cuDT$H|Ll!q;0RP;EexJ71LoR zrg$lMPgd8he9Mo(5J#Kc-mcIXqyH&s69Jk8@3<#nXKpc@+9ax%?)bz+6^7*U=cmKOz-u0|ZyWM-)6U+F z{;t{b0_{tY!DoTDkbyNMmY^cep*6?24Q#vQPEQzCUkj(mvbwY71yF zC@9yca^vCNEiYg-hVOEu>-BOWq`=@{fvmb`Q~S)D6=oQaP7(Q&qtM~%Z-><@xK<3Q%$LlJG^5jln2_VFo{D-r{nMWNi$w6ohyS|mZ3*ci02 zcctjSExpZe8!BMwRP6NLCh?@Q9iO8}9sn$Yu`~4%c3|?C`oLZ|x}Y#(nw|f_T__7@ zsKv=L?(rxu?*=T)Xsx3EzPFbFo&oS14Ii1AwA{!)(;uAnBnv)QAigHK@Ol)n0&^dX zWPOtCao^0;N+F{cjn(@~e=-B|Tv?t;h85-Qm#P^j8MN%aRm-2v&lgXQr8Sy*u1A_a zz{O}z7cO=(nGy&H5G1-27$(2_EO^Oz_|>v<1o+}3$-(dXn~2=)S9$qUj1fWPmCkY& zfcz370cY@kRO`JWr~-EwR73|a$+SXyf(NUu6k0U?2?oeOdM)kly$E9Af=|N-plYK2 zHG0=YvE2qg<#6T*@~eJ!1Mop@#?@M$q;K!-4M$I*al5J?#nnH4GkQMnJC7F1(^V|0 z$0O0E9%Ur4NxhS1ZvBThJk534mBN|{TAjw9P{CESb=5ami?7qp)Vo~V(%HEyu5mPS zw!Xo(K*)Lf^jPS(y=-BG3idNrv^2Q8$; zNAKAUy2wJYM3fzfUyt_#Hi09dxoUU#) zRxmX+jp?LNSi;0Z_1k4vWcfn%T<$jEMj#HW7dWV;715EAk)bZ^A(y0k`O>Ar32q&i z1l%=~Q!b-TpDqB)T4w0{E(laZkwP=kECsU+J$h@+YK9D|mPm45R@Mq_!YY&oCJPN0%wK=;`-LHdb77vd1EKHqhuCPn@`s*f751(AhC zIV0+a8TlKxGVJ#39~@X&8ff?e&`CpPXkS%P7wcWDiBcLsj}l?u?G2ObiGMK(c&2(` zzSMO=h|vC?uJ!a{^=X3Gw&{kxAuuo|GZfjih2f{KTq&apUaPZS&((7CAGD_=h_Gt$ z;nG`*vvEIvVN^Dkf`9x5fV}!5SipAx!%7x-0y?O0L}23dzn$ zoWV@#%Ff~Ai$6hP@L+q@=2e>&czyeNdiG>z=jO&dueqn_Ep7P3ECfLe*u=wk?y+ zro&Z=?6sN& z8_AxLRqE)jdj}SMVk)=^ZqfaE0iQ+=< z=x)Y4Oc3~4oTB&hX8~SFz=X#OYe#fIgA12k_BrJ9W5rS7HAZipq-UMl=H}c8BmL=L zpB6#1m988(SbZ>x&e>>Z)61RzF2Xv-$iGs-Y;^mxfp%sKI&ZYJv@nV@Bq6Z8QAv}~ zD3FkB2aUF%n~c>Eaye+2HogqvC#ew>+c~I|shxS~%$YM)1N#guXMar%Rn|Xdw7lDD zEgqD#N|PcPWly(a5;_7s;P2kl`T{Yv8ivIf*CX4;=m8r6wD-r^+5Ft48Wfhp)LOB3 zR=%a`0I~#L=c2BF801NgQ?L=7+685(>W@T;7BqAV@Bqb$5 z-$@lh`(8zy2vhv3bmI_`~7G-GwQ&uN8XPy<8QYhl^w0ljB#%mov@bap1h|f=cEnnPZ z2;{X`45I%uWP3sud3U^r93otpFnL6N{5JXG`s9I?wfIDfJ1bu+MWoJu-5BkhL!o@RFu+~`TNKPrjHKVFj>m5;J)M88&#Jc? ztt7(=V^S^jmzFtW>G$?as-*X)?c%J5S!LOqIX^{@|E|xJ{7)H%n~$>?I!vg#mz6JW z&!pd#1H)XOH@|+t=gl@?91>hpyY`m#k<-f2N|-wd*~>#?yMJ+fc&W0xbJd&LqoaI# z_O9MwcP#dLmJ{u_F8lWB+0<_8EUc6_wvP3$CFak0yw159Tw~j|CL*fe_?Fk&beWdn z1qmDH6ln0Y#Nzvlr@r{yUW(`(nI^daz_+%=SmZ^WMS~>>i`{0!uqlB-`b+u+La%^v z|0cXXyJTG79{2#|4Yr6?MEzF#m`?cCxZ%@TN65B%vk+5ecUQb0mZ~elCr8>_VS1u^ zwOuIgqh2veYklKbIE=pN^%>$5c|tk@=URg4opbVhziuYB+NBT4_<6q8zol}Rs-CcE zBwRo7I~U#UMRJJvv(#tLSkX}$-Xqd}B)bijW;Ub!1?yA2rzdmCj6r}YB(=lh=HMWd zW9U@n9UL58_TtLXKPM+@4>Oktsx$Can#fkJh={#OoItAn8R-O6TP00B*e$`Z!MmZA zAdRkC!p>3ouQMq3o~c^BkTE1=J^FYUmhTOj+iYQE|3ZpzxVXqF=Nw`R>uULs3z@G* zO7gH$lCn}~Cmi@xtE?PhRR)J^?hcOXzG9EG+}yoqUo&6~DtglP;WmN?$kPLFC_wWcqHMER)=C3&Z>26-L*Okn3V|vgfvE?n%9+vMxp>8{I=`6PwVkjZpp-xP z29e%G(kf|WbOQi0?wA0G@~YBDK}%BAPW7uJEG@bipS61Ar!BxIl4ROX~}d^fL08Q zPyyF}|MDour@j1^ep=J<7gR@%^Gtc{-0kIjz2s_{=YtI}nesuDe!cG5(Hfce8f4Wo zM6nKP+_cUGqykqcfB8rH#DWEn?@uPe0`Q%T6@p0*5geDsVl&(?oEb>9%R1v&CB$a! zy-B;b_w#;q>KM^VQ-%AI6mtM~@vQ>Ql4ng5LWfG6odr^*-|#M0e<+FT7; z10qctl~~TBJ5_k)1Mb>#rCTbM?WPSfL835(6zDABukC07IKcwxW7L$I@6JpGW`c$232R485q_re$mX!A1xTzz_H@0 z^JcR5V2z&{t6}0eoI*^ZJt7XX@2 zm=*S{e?3{|y!J7tAjBYaC(wpQ{o@l=pNm_@6y*MzIL%F2zLOTX?7oO;yU9FuE zlI~L`o#D?!ASEqEjXBpa4Ar~$gjw0{XLmt~O0)n!QmfPb%^i_!C05jNx9fca4Ho-T zK>kT5Yy>v|KPTc7^!|Xa^2}QBQ30SX|Js5Ya%8NlSoa_NwEPM$2Yey;1Fur^FL`Jg z2B9B^56W*3CtqT*_D8q>W5?2P&a=Agw2LqG^Is1a-)xs;jF`QC-NOAV&9UU_*_Lv6S%FMoes?DAPNkIMdaBZutb+if<}!5>by>$7>k z3LBic!;f`Z^;k{GR$2p@N9C zI`Pne0rTJQH6d2kpB=HMhLdA5z>rf1@n86_28)c!3K6T^74MOa7fB0VlF~AGC;9l; zyT1-DxIsugXy7DBGhSP++S@j)X^oe6&@6u VkRkFO9wzX0L`|FW Date: Thu, 5 Feb 2026 12:26:03 -0500 Subject: [PATCH 2/9] formal api model: minor fixes fix some formatting and grammar errors --- content/api_mem_model.tex | 14 +++++++------- 1 file changed, 7 insertions(+), 7 deletions(-) diff --git a/content/api_mem_model.tex b/content/api_mem_model.tex index 3c79acdae..dfbb48fa9 100644 --- a/content/api_mem_model.tex +++ b/content/api_mem_model.tex @@ -11,7 +11,7 @@ \subsubsection{Baseline PL-level Axiomatic Memory Model (C++)}\label{subsubsec:a The C++ memory model is defined axiomatically in the C++ specification. Although many prior works aim to mathematize or improve this formalism, here we simply try to provide sufficient understanding to build upon it with the \openshmem API model. -Under the PL-level C++ memory model, events consist of reads (\textit{R}), writes (\textit{W}), atomic read-modify-writes (\textit{RMW}) perform both a read and write to memory, and fences (\textit{F}). Events that access memory may be atomic or non-atomic. Atomic events are associated with a memory order which can be acquire, release, acquire-release, or relaxed (we leave out consume and sequential consistent memory orders here for simplicity, but this does not impact how the PL-level semantics are extended for the API-level model). The following relations are defined over the events in a candidate execution, and will be used to derive API-level relations in Section~\ref{subsubsec:api_relations}: +Under the PL-level C++ memory model, events consist of reads (\textit{R}), writes (\textit{W}), atomic read-modify-writes (\textit{RMW}, which perform both a read and write to memory), and fences (\textit{F}). Events that access memory may be atomic or non-atomic. Atomic events are associated with a memory order which can be acquire, release, acquire-release, or relaxed (we leave out consume and sequential consistent memory orders here for simplicity, but this does not impact how the PL-level semantics are extended for the API-level model). The following relations are defined over the events in a candidate execution, and will be used to derive API-level relations in Section~\ref{subsubsec:api_relations}: \begin{itemize} \item \textbf{Sequenced before (\textit{sb}):} Relates memory events from the same thread based on the order they are evaluated (also called program order in some contexts) @@ -114,23 +114,23 @@ \subsubsubsection{Remote Delivery Order (rdo)} Remote delivery order relates accesses separated by an \openshmem fence operation. However, this relation does not establish full connectivity; it only forms a connection between observable accesses to the same PE from operations \textit{hb} ordered before and after the fence, and between \textit{hb} ordered accesses before a fence and observable accesses from operations \textit{hb} ordered after a fence. Specifically, $(A, B) \in rdo$ only if there is an \openshmem fence operation $Fop$ and any of the following conditions are true: \begin{itemize} - \item $A$ is a memory access that is \textit{hb} ordered before $Fop$, $B$ is an observable access issued by an operation that is \textit{hb} ordered after $Fop$, and the context associated with $Fop$ does NOT have the $OPENSHMEM_CTX_NOSTORE$ option enabled. + \item $A$ is a memory access that is \textit{hb} ordered before $Fop$, $B$ is an observable access issued by an operation that is \textit{hb} ordered after $Fop$, and the context associated with $Fop$ does NOT have the $OPENSHMEM\_CTX\_NOSTORE$ option enabled. \item $A$ is an observable access issued by a fence-ordered operation (defined below) that is \textit{hb} ordered before $Fop$, $B$ is an observable access issued by an operation that is \textit{hb} ordered after $Fop$, and $A$ and $B$ target the same PE. \end{itemize} -\textbf{fence-ordered operations}: blocking or nonblocking put, blocking or nonblocking atomic memory operation, blocking or nonblocking put\_with\_signal. +\textbf{fence-ordered operations}: blocking or nonblocking put, blocking or nonblocking atomic memory operation, blocking or nonblocking put\_signal. \subsubsubsection{Remote Completion Order (rco)} Remote completion order establishes full connectivity between accesses separated by an \openshmem quiet or barrier operation. All thread accesses or observable accesses that are \textit{hb} ordered before the quiet or barrier operation are ordered before all thread accesses or observable accesses that are \textit{hb} ordered after the quiet or barrier operation. Specifically, $(A, B) \in rdo$ only if there is an \openshmem quiet or barrier operation $QBop$ and any of the following conditions are true: \begin{itemize} - \item $A$ is a memory access that is \textit{hb} ordered before $QBop$, $B$ is an access that is \textit{hb} ordered after $QBop$, and the context associated with $QBop$ does NOT have the $OPENSHMEM_CTX_NOSTORE$ option enabled. - \item $A$ is a memory access that is \textit{hb} ordered before $QBop$, $B$ is an observable access issued by an operation that is \textit{hb} ordered after $QBop$, and the context associated with $QBop$ does NOT have the $OPENSHMEM_CTX_NOSTORE$ option enabled. + \item $A$ is a memory access that is \textit{hb} ordered before $QBop$, $B$ is an access that is \textit{hb} ordered after $QBop$, and the context associated with $QBop$ does NOT have the $OPENSHMEM\_CTX\_NOSTORE$ option enabled. + \item $A$ is a memory access that is \textit{hb} ordered before $QBop$, $B$ is an observable access issued by an operation that is \textit{hb} ordered after $QBop$, and the context associated with $QBop$ does NOT have the $OPENSHMEM\_CTX\_NOSTORE$ option enabled. \item $A$ is an observable access issued by an operation that is \textit{hb} ordered before $QBop$ and $B$ is an access that is \textit{hb} ordered after $QBop$. \item $A$ is an observable memory access issued by a qb-ordered operation (defined below) that is \textit{hb} ordered before $QBop$ and $B$ is an observable access issued by an operation that is \textit{hb} ordered after $QBop$. \end{itemize} -\textbf{qb-ordered operations}: blocking or nonblocking put, nonblocking get, blocking or nonblocking atomic memory operation, blocking or nonblocking put\_with\_signal. +\textbf{qb-ordered operations}: blocking or nonblocking put, nonblocking get, blocking or nonblocking atomic memory operation, blocking or nonblocking put\_signal. \subsubsubsection{API Synchronizes-With (asw)} @@ -185,7 +185,7 @@ \subsubsection{Updating the PL-Level Model to an API-Level Model}\label{subsubse \item $acyclic(api\_hb)$: \textit{hb} cannot have any cycles. \item $irreflexive(rf;api\_hb)$: \textit{hb} must be consistent with the \textbf{rf} relation. \item $irreflexive((rf^{-1})^{?};mo;rf^?;api\_hb)$: \textit{hb} must be consistent with the \textit{mo} order of writes, as well as any reads that read from writes in mo order (often called coherence requirement). - \item $irreflexive(rf;(sb^amo\_op)^? \cup mo;mo;(;sb^{amo\_op})^?;rf^-1 \cup mo;rf;(sb^{amo\_op})^?$ where $sb^{amo\_op}$ relates $sb$ ordered observable accesses issued by the same \openshmem atomic memory operation: The read of an atomic read-modify-write access must always read the last value (in the modification order) written before the write of the read-modify-write access. Similarly, the observable read of an \openshmem atomic memory operation must always read the last value (in the modification order) written before the observable write of the atomic memory operation. + \item $irreflexive(rf;(sb^{amo\_op})^? \cup mo;mo;(;sb^{amo\_op})^?;rf^-1 \cup mo;rf;(sb^{amo\_op})^?$ where $sb^{amo\_op}$ relates $sb$ ordered observable accesses issued by the same \openshmem atomic memory operation: The read of an atomic read-modify-write access must always read the last value (in the modification order) written before the write of the read-modify-write access. Similarly, the observable read of an \openshmem atomic memory operation must always read the last value (in the modification order) written before the observable write of the atomic memory operation. \end{itemize} Finally, similar to the C++ memory model, if an API data race exists in a candidate execution that satisfies the above axioms, then the program's behavior is undefined. If an API data race does not exist, then the program's behavior must conform to one of the axiom-satisfying candidate executions. From d48a22d9d1dda237de02f118891f86c5bb51c4f4 Mon Sep 17 00:00:00 2001 From: John Alsop Date: Thu, 12 Feb 2026 16:10:14 -0600 Subject: [PATCH 3/9] formal api model: update data race definition define data race to include racy API observable accesses and normal PL-level accesses (including atomics). If these race, behavior is undefined. --- content/api_mem_model.tex | 12 ++++++------ content/api_observable.tex | 2 +- 2 files changed, 7 insertions(+), 7 deletions(-) diff --git a/content/api_mem_model.tex b/content/api_mem_model.tex index dfbb48fa9..3b9384c8d 100644 --- a/content/api_mem_model.tex +++ b/content/api_mem_model.tex @@ -132,16 +132,16 @@ \subsubsubsection{Remote Completion Order (rco)} \textbf{qb-ordered operations}: blocking or nonblocking put, nonblocking get, blocking or nonblocking atomic memory operation, blocking or nonblocking put\_signal. -\subsubsubsection{API Synchronizes-With (asw)} +\subsubsubsection{API Synchronizes-With (asw)}\label{subsubsubsec:api_relations_asw} API synchronizes-with is the API-level analog of the PL-level \textit{sw} relation. It relates observable accesses from legally synchronizing \openshmem operations that form a \textit{rf} relationship. Specifically, $(A, B) \in asw$ only if $(A, B) \in rf$ and any of the following are true: \begin{itemize} - \item $A$ is an observable read or read-modify-write access issued by an \openshmem operation from any of the following categories: - atomic memory operations, signaling operations, point-to-point synchronization routines, or distributed locking routines. - \item $B$ is an observable write or read-modify-write access issued by an \openshmem operation from any of the following categories: - atomic memory operations, signaling operations, point-to-point synchronization routines, or distributed locking routines. + \item $A$ is an observable read or read-modify-write access issued by a synchronizing \openshmem operation. + \item $B$ is an observable write or read-modify-write access issued by a synchronizing \openshmem operation. \end{itemize} +In the above, "synchronizing \openshmem operations" are defined as any operation from the following categories: atomic memory operations, signaling operations, point-to-point synchronization routings, or distributed locking routines. + \subsubsection{Example API Execution}\label{subsubsubsec:api_relations_example} \begin{figure}[h] @@ -176,7 +176,7 @@ \subsubsection{Updating the PL-Level Model to an API-Level Model}\label{subsubse \begin{itemize} \item $A$ and $B$ access the same address on the same PE. \item At least one access is a write. - \item At least one access is not atomic. + \item Both $A$ and $B$ are normal accesses and at least one of them is not atomic, or both $A$ and $B$ are observable memory accesses issued by \openshmem operations and at least one of them is issued by a non-synchronizing \openshmem operation (defined in Section~\ref{subsubsubsec:api_relations_asw}), or one of the accesses is a normal memory access and the other is an observable memory access issued by an \openshmem operation. \item $A$ and $B$ are not related by \textit{api\_hb}. \end{itemize} diff --git a/content/api_observable.tex b/content/api_observable.tex index 77ec917d9..9e4f4ff28 100644 --- a/content/api_observable.tex +++ b/content/api_observable.tex @@ -257,7 +257,7 @@ ST R1 -> dest[i], p \end{verbatim} \tabularnewline\hline -\multirow{15}{*}{Point-to-Point \\ Synchronization Routines} +\multirow{15}{*}{\shortstack[l]{Point-to-Point\\ Synchronization Routines}} & \begin{verbatim}shmem_wait_until(ivar, cmp, cmp_value): R1 = LD ivar, my_pe while !cmp.test(R1, cmp_value): From 27a272fe4326aed74de1ee04dc78c1a1473b9adb Mon Sep 17 00:00:00 2001 From: John Alsop Date: Fri, 13 Feb 2026 12:13:30 -0600 Subject: [PATCH 4/9] formal api model: minor fixes and clarifications fix a few instances of errors identified by reviewers --- content/api_mem_model.tex | 14 +++++++------- content/api_observable.tex | 32 ++++++++++++++++---------------- 2 files changed, 23 insertions(+), 23 deletions(-) diff --git a/content/api_mem_model.tex b/content/api_mem_model.tex index 3b9384c8d..2dc609f1b 100644 --- a/content/api_mem_model.tex +++ b/content/api_mem_model.tex @@ -75,7 +75,7 @@ \subsubsection{Per-Operation Observable Accesses}\label{subsubsec:api_observable \begin{itemize} \item $B$ takes a value derived from $A$ as an operand. \item $B$ has a control dependency on the result of $A$ such that $A$'s evaluation determines whether $B$ will occur. - \item $B$ has an ordering dependency $A$. This type of dependency currently exists only in put\_signal[\_nbi] where the final store to $sig_addr$ is dependent on all prior observable accesses. + \item $B$ has an ordering dependency $A$. This type of dependency currently exists only in put\_signal[\_nbi] where the final store to $sig_addr$ is ordered after all prior observable accesses to source and dest. \end{itemize} Many operations have no visible accesses (e.g., memory ordering routines, whose semantics are defined by the API-level relations described in Section~\ref{subsubsec:api_relations}). @@ -114,7 +114,7 @@ \subsubsubsection{Remote Delivery Order (rdo)} Remote delivery order relates accesses separated by an \openshmem fence operation. However, this relation does not establish full connectivity; it only forms a connection between observable accesses to the same PE from operations \textit{hb} ordered before and after the fence, and between \textit{hb} ordered accesses before a fence and observable accesses from operations \textit{hb} ordered after a fence. Specifically, $(A, B) \in rdo$ only if there is an \openshmem fence operation $Fop$ and any of the following conditions are true: \begin{itemize} - \item $A$ is a memory access that is \textit{hb} ordered before $Fop$, $B$ is an observable access issued by an operation that is \textit{hb} ordered after $Fop$, and the context associated with $Fop$ does NOT have the $OPENSHMEM\_CTX\_NOSTORE$ option enabled. + \item $A$ is a memory access that is \textit{hb} ordered before $Fop$, $B$ is an observable access issued by an operation that is \textit{hb} ordered after $Fop$, and the context associated with $Fop$ does NOT have the $SHMEM\_CTX\_NOSTORE$ option enabled. \item $A$ is an observable access issued by a fence-ordered operation (defined below) that is \textit{hb} ordered before $Fop$, $B$ is an observable access issued by an operation that is \textit{hb} ordered after $Fop$, and $A$ and $B$ target the same PE. \end{itemize} @@ -124,8 +124,8 @@ \subsubsubsection{Remote Completion Order (rco)} Remote completion order establishes full connectivity between accesses separated by an \openshmem quiet or barrier operation. All thread accesses or observable accesses that are \textit{hb} ordered before the quiet or barrier operation are ordered before all thread accesses or observable accesses that are \textit{hb} ordered after the quiet or barrier operation. Specifically, $(A, B) \in rdo$ only if there is an \openshmem quiet or barrier operation $QBop$ and any of the following conditions are true: \begin{itemize} - \item $A$ is a memory access that is \textit{hb} ordered before $QBop$, $B$ is an access that is \textit{hb} ordered after $QBop$, and the context associated with $QBop$ does NOT have the $OPENSHMEM\_CTX\_NOSTORE$ option enabled. - \item $A$ is a memory access that is \textit{hb} ordered before $QBop$, $B$ is an observable access issued by an operation that is \textit{hb} ordered after $QBop$, and the context associated with $QBop$ does NOT have the $OPENSHMEM\_CTX\_NOSTORE$ option enabled. + \item $A$ is a memory access that is \textit{hb} ordered before $QBop$, $B$ is an access that is \textit{hb} ordered after $QBop$, and the context associated with $QBop$ does NOT have the $SHMEM\_CTX\_NOSTORE$ option enabled. + \item $A$ is a memory access that is \textit{hb} ordered before $QBop$, $B$ is an observable access issued by an operation that is \textit{hb} ordered after $QBop$, and the context associated with $QBop$ does NOT have the $SHMEM\_CTX\_NOSTORE$ option enabled. \item $A$ is an observable access issued by an operation that is \textit{hb} ordered before $QBop$ and $B$ is an access that is \textit{hb} ordered after $QBop$. \item $A$ is an observable memory access issued by a qb-ordered operation (defined below) that is \textit{hb} ordered before $QBop$ and $B$ is an observable access issued by an operation that is \textit{hb} ordered after $QBop$. \end{itemize} @@ -182,9 +182,9 @@ \subsubsection{Updating the PL-Level Model to an API-Level Model}\label{subsubse The following axioms are similarly updated to use \textit{api\_hb} rather than \textit{hb}, and to extend RMW atomicity to \openshmem atomic memory operations: \begin{itemize} - \item $acyclic(api\_hb)$: \textit{hb} cannot have any cycles. - \item $irreflexive(rf;api\_hb)$: \textit{hb} must be consistent with the \textbf{rf} relation. - \item $irreflexive((rf^{-1})^{?};mo;rf^?;api\_hb)$: \textit{hb} must be consistent with the \textit{mo} order of writes, as well as any reads that read from writes in mo order (often called coherence requirement). + \item $acyclic(api\_hb)$: \textit{api_hb} cannot have any cycles. + \item $irreflexive(rf;api\_hb)$: \textit{api_hb} must be consistent with the \textbf{rf} relation. + \item $irreflexive((rf^{-1})^{?};mo;rf^?;api\_hb)$: \textit{api_hb} must be consistent with the \textit{mo} order of writes, as well as any reads that read from writes in mo order (often called coherence requirement). \item $irreflexive(rf;(sb^{amo\_op})^? \cup mo;mo;(;sb^{amo\_op})^?;rf^-1 \cup mo;rf;(sb^{amo\_op})^?$ where $sb^{amo\_op}$ relates $sb$ ordered observable accesses issued by the same \openshmem atomic memory operation: The read of an atomic read-modify-write access must always read the last value (in the modification order) written before the write of the read-modify-write access. Similarly, the observable read of an \openshmem atomic memory operation must always read the last value (in the modification order) written before the observable write of the atomic memory operation. \end{itemize} diff --git a/content/api_observable.tex b/content/api_observable.tex index 9e4f4ff28..b9867c95a 100644 --- a/content/api_observable.tex +++ b/content/api_observable.tex @@ -31,7 +31,7 @@ \end{verbatim} \tabularnewline\cline{2-2} & \begin{verbatim}shmem_g(dest, value, pe): - return LD value -> dest, pe + return LD value \end{verbatim} \tabularnewline\cline{2-2} & \begin{verbatim}shmem_iget(dest, source, dst, sst, nelems, pe): @@ -63,67 +63,67 @@ \end{verbatim} \tabularnewline\cline{2-2} & \begin{verbatim}shmem_atomic_fetch_inc(dest, pe): - R1 = 1 + LD dest, pe + R1 = LD dest, pe R2 = R1 + 1 ST R2 -> dest, pe return R1 \end{verbatim} \tabularnewline\cline{2-2} & \begin{verbatim}shmem_atomic_inc(dest, pe): - R1 = 1 + LD dest, pe + R1 = LD dest, pe R2 = R1 + 1 ST R2 -> dest, pe return R1 \end{verbatim} \tabularnewline\cline{2-2} & \begin{verbatim}shmem_atomic_fetch_add(dest, value, pe): - R1 = 1 + LD dest, pe + R1 = LD dest, pe R2 = R1 + value ST R2 -> dest, pe return R1 \end{verbatim} \tabularnewline\cline{2-2} & \begin{verbatim}shmem_atomic_add(dest, value, pe): - R1 = 1 + LD dest, pe + R1 = LD dest, pe R2 = R1 + value ST R2 -> dest, pe \end{verbatim} \tabularnewline\cline{2-2} & \begin{verbatim}shmem_atomic_fetch_and(dest, value, pe): - R1 = 1 + LD dest, pe + R1 = LD dest, pe R2 = R1 & value ST R2 -> dest, pe return R1 \end{verbatim} \tabularnewline\cline{2-2} & \begin{verbatim}shmem_atomic_and(dest, value, pe): - R1 = 1 + LD dest, pe + R1 = LD dest, pe R2 = R1 & value ST R2 -> dest, pe \end{verbatim} \tabularnewline\cline{2-2} & \begin{verbatim}shmem_atomic_fetch_or(dest, value, pe): - R1 = 1 + LD dest, pe + R1 = LD dest, pe R2 = R1 | value ST R2 -> dest, pe return R1 \end{verbatim} \tabularnewline\cline{2-2} & \begin{verbatim}shmem_atomic_or(dest, value, pe): - R1 = 1 + LD dest, pe + R1 = LD dest, pe R2 = R1 | value ST R2 -> dest, pe \end{verbatim} \tabularnewline\cline{2-2} & \begin{verbatim}shmem_atomic_fetch_xor(dest, value, pe): - R1 = 1 + LD dest, pe + R1 = LD dest, pe R2 = R1 ^ value ST R2 -> dest, pe return R1 \end{verbatim} \tabularnewline\cline{2-2} & \begin{verbatim}shmem_atomic_xor(dest, value, pe): - R1 = 1 + LD dest, pe + R1 = LD dest, pe R2 = R1 ^ value ST R2 -> dest, pe \end{verbatim} @@ -149,35 +149,35 @@ \end{verbatim} \tabularnewline\cline{2-2} & \begin{verbatim}shmem_atomic_fetch_inc_nbi(dest, pe): - R1 = 1 + LD dest, pe + R1 = LD dest, pe R2 = R1 + 1 ST R2 -> dest, pe ST R1 -> fetch, my_pe \end{verbatim} \tabularnewline\cline{2-2} & \begin{verbatim}shmem_atomic_fetch_add_nbi(dest, value, pe): - R1 = 1 + LD dest, pe + R1 = LD dest, pe R2 = R1 + value ST R2 -> dest, pe ST R1 -> fetch, my_pe \end{verbatim} \tabularnewline\cline{2-2} & \begin{verbatim}shmem_atomic_fetch_and_nbi(dest, value, pe): - R1 = 1 + LD dest, pe + R1 = LD dest, pe R2 = R1 & value ST R2 -> dest ST R1 -> fetch, my_pe \end{verbatim} \tabularnewline\cline{2-2} & \begin{verbatim}shmem_atomic_fetch_or_nbi(dest, value, pe): - R1 = 1 + LD dest, pe + R1 = LD dest, pe R2 = R1 | value ST R2 -> dest ST R1 -> fetch, my_pe \end{verbatim} \tabularnewline\cline{2-2} & \begin{verbatim}shmem_atomic_fetch_xor_nbi(dest, value, pe): - R1 = 1 + LD dest, pe + R1 = LD dest, pe R2 = R1 ^ value ST R2 -> dest ST R1 -> fetch, my_pe From b8b423318baeb7a29f86872474d4ce4135822182 Mon Sep 17 00:00:00 2001 From: John Alsop Date: Thu, 19 Feb 2026 10:10:00 -0600 Subject: [PATCH 5/9] formal api model: minor fixes and edits for clarity -text edits for clarity, consistency with the rest of the spec and the c++ spec (e.g., address->memory location, brief prose description of acquire/release semantics...) -fix unescaped underscore bug --- content/api_mem_model.tex | 44 +++++++++++++++++++-------------------- 1 file changed, 22 insertions(+), 22 deletions(-) diff --git a/content/api_mem_model.tex b/content/api_mem_model.tex index 2dc609f1b..e7c0bd7f4 100644 --- a/content/api_mem_model.tex +++ b/content/api_mem_model.tex @@ -1,30 +1,28 @@ To precisely define what behavior the \openshmem interface guarantees for both users and implementers, we formally define an API-level memory model. An API-level model builds upon a programming language-level (PL-level) memory model, and it defines how API operations impact memory. This is useful for determining the ordering and visibility requirements for \openshmem memory accesses when implemented on any device, but it is of particular importance for devices like GPUs that have software managed caches requiring expensive bulk flush or invalidate actions to enforce memory ordering guarantees. Without a precisely specified memory model, API implementers and users may disagree on what behavior is guaranteed by the API, and this can lead to poor performance (too many bulk cache actions) or buggy code (too few bulk cache actions). -Although \openshmem may be implemented in multiple programming languages, this section describes how \openshmem extends the C++ memory consistency model. C++ is chosen as an example because of its generality, and because it forms the basis for common GPU programming languages (CUDA, hip). We describe how the guarantees described here can be adapted for GPU memory models in Section~\ref{subsubsec:api_scoped}. +This section describes how \openshmem extends the C++ memory consistency model. C++ is chosen as an example because of its generality and ubiquity in the CPU domain, and because it forms the basis for common GPU programming languages (CUDA, hip). Section~\ref{subsubsec:api_pl_baseline} describes the baseline C++ memory model, Sections~\ref{subsubsec:api_model_overview}-\ref{subsubsec:api_integrate} describe how this programming language-level model is extended to provide API-level semantics for \openshmem, and Section~\ref{subsubsec:api_scoped} describes how the guarantees of the API-level model can be adapted for GPU memory models. \subsubsection{Baseline PL-level Axiomatic Memory Model (C++)}\label{subsubsec:api_pl_baseline} -A memory consistency model (MCM) defines what values loads are allowed to return in a multi-threaded program. One common way to formally specify a MCM is via axioms over relations between events in a dynamic execution. Axiomatic formalization begins by considering all possible “candidate” executions of a parallel program. Loads may return the value of a conflicting store anywhere else in the execution, or the initial memory value at that address (i.e., allowing all possible reorderings). The model then defines relations between accesses in the candidate executions, and using axioms over these relations, defines which executions are legal (hence restricting which reorderings are allowed by the compiler or hardware). - - +A memory consistency model (MCM) defines what values loads are allowed to return in a multi-threaded program. One common way to formally specify a MCM is via axioms over relations between events in a dynamic execution. Axiomatic formalization begins by considering all possible “candidate” executions of a parallel program. Loads may return the value of a conflicting store anywhere else in the execution, or the initial value at that memory location (i.e., allowing all possible reorderings). The model then defines relations between accesses in the candidate executions, and using axioms over these relations, defines which executions are legal (hence restricting which reorderings are allowed by the compiler or hardware). The C++ memory model is defined axiomatically in the C++ specification. Although many prior works aim to mathematize or improve this formalism, here we simply try to provide sufficient understanding to build upon it with the \openshmem API model. -Under the PL-level C++ memory model, events consist of reads (\textit{R}), writes (\textit{W}), atomic read-modify-writes (\textit{RMW}, which perform both a read and write to memory), and fences (\textit{F}). Events that access memory may be atomic or non-atomic. Atomic events are associated with a memory order which can be acquire, release, acquire-release, or relaxed (we leave out consume and sequential consistent memory orders here for simplicity, but this does not impact how the PL-level semantics are extended for the API-level model). The following relations are defined over the events in a candidate execution, and will be used to derive API-level relations in Section~\ref{subsubsec:api_relations}: +Under the PL-level C++ memory model, events consist of reads (\textit{R}), writes (\textit{W}), atomic read-modify-writes (\textit{RMW}, which perform both a read and write to memory), and fences (\textit{F}). Events that access memory may be atomic or non-atomic. Atomic events are associated with a memory order which can be acquire, release, acquire-release, or relaxed\footnote{we leave out consume and sequential consistent memory orders here for simplicity, but this does not impact how the PL-level semantics are extended for the API-level model}. These memory orders determine how atomic accesses can legally synchronize (release and acquire are used with the source and destination of a synchronizing access pair, respectively). The following relations are defined over the events in a candidate execution, and will be used to derive API-level relations in Section~\ref{subsubsec:api_relations}: \begin{itemize} \item \textbf{Sequenced before (\textit{sb}):} Relates memory events from the same thread based on the order they are evaluated (also called program order in some contexts) - \item \textbf{Modification order (\textit{mo}):} A total ordering between all writes to the same address. - \item \textbf{Reads-from (\textit{rf}):} Relates a write \textit{W} and any read \textit{R} to the same address such that \textit{R} returns the value written by \textit{W}. + \item \textbf{Modification order (\textit{mo}):} A total ordering between all modifying accesses (writes or read-modify-writes) to the same memory location. + \item \textbf{Reads-from (\textit{rf}):} Relates a write \textit{W} and any read \textit{R} to the same memory location such that \textit{R} returns the value written by \textit{W}. \item \textbf{Synchronizes-with (\textit{sw}):} Same as \textit{rf}, but also requires that \textit{W} and \textit{R} are atomic, \textit{W} has release or acquire-release memory order, and \textit{R} has acquire or acquire-release memory order (this relation is also called synchronization order in some contexts). \item \textbf{Happens-before (\textit{hb}):} Transitive closure of \textit{sw} and \textit{sb}. \end{itemize} Furthermore, a \textbf{data race} is defined as a relation between two accesses $A$ and $B$ where all of the following are true: \begin{itemize} - \item $A$ and $B$ access the same address. - \item At least one access is a write. + \item $A$ and $B$ access the same memory location. + \item At least one access is a modifying access (write or read-modify-write). \item At least one access is not atomic. \item $A$ and $B$ are not related by \textit{hb}. \end{itemize} @@ -50,7 +48,7 @@ \subsubsection{API-Level Memory Model Overview}\label{subsubsec:api_model_overvi To describe the desired \openshmem behavior, we make the following additions to the existing PL-level model: \begin{enumerate} - \item\label{mcmsteps:one} \textbf{PEs and operation events}: Introduce a new event type that corresponds to an API operation and a new address property that specifies the target PE. + \item\label{mcmsteps:one} \textbf{PEs and operation events}: Introduce a new event type that corresponds to an API operation and a new memory location property that specifies the target PE. \item\label{mcmsteps:two} \textbf{Per-operation observable accesses}: For each operation type, define a set of observable memory accesses (these are not related to the initiating thread by \textit{sb}). \item\label{mcmsteps:three} \textbf{API-level relations}: Define new API-level ordering relations over operation events and their observable accesses that will constrain \openshmem memory ordering. \item\label{mcmsteps:four} \textbf{Fused collectives}: Fuse matched collective operation calls into a single event. @@ -63,15 +61,17 @@ \subsubsection{API-Level Memory Model Overview}\label{subsubsec:api_model_overvi \subsubsection{PEs and Operation Events}\label{subsubsec:api_pes_operations} -When defining behavior at the \openshmem API level, we must first consider two key differences in program execution relative to a PL-level model. First, \openshmem applications span multiple PEs, each of which has its own local memory space. Therefore, the PL-level definition of memory address is extended to include both the PE-local address and the PE ID. Normal memory accesses can only access addresses on the PE local to the thread that issued it, which is constant for the life of a thread. \openshmem operations may trigger accesses to the local PE’s memory and/or the memory of remote PEs. Dynamic memory accesses in an execution are only considered to access the same address if both the address and PE ID attributes match (impacting the definitions of of \textit{mo} and \textit{rf}). +When defining behavior at the \openshmem API level, we must first consider two key differences in program execution relative to a PL-level model. First, \openshmem applications span multiple PEs, each of which has its own local memory space. Therefore, the PL-level definition of memory location is extended to include both the PE-local memory address and the PE ID. Normal memory accesses can only access memory locations on the local PE, which is constant for the life of a thread. \openshmem operations may trigger accesses to the local PE’s memory and/or the memory of remote PEs. Dynamic memory accesses in an execution are only considered to access the same memory location if both the local memory address and PE ID attributes match (impacting the definitions of of \textit{mo} and \textit{rf}). Second, since the \openshmem API does not define the underlying PL-level implementation, API operations do not resolve to a function consisting of PL-level events. Thus, these operations must be treated as a new type of primitive in the API-level memory model: the operation event. An operation event specifies its type, and each type defines the set of minimum observable accesses triggered by the operation (described next). Operation events participate in \textit{sb} and derived relations, similar to normal thread access events. Although operation events may be associated with separate memory access events (see Section~\ref{subsubsec:api_observable}), the operation events themselves do not participate directly in access-based relations like \textit{mo}, \textit{rf}, and \textit{sw}. \subsubsection{Per-Operation Observable Accesses}\label{subsubsec:api_observable} -For the purposes of the API memory model, each \openshmem operation type defines the minimum set of observable memory accesses performed by the operation. These accesses are considered to be "issued" by the associated. These accesses must occur as part of the operation, however there is no guarantee that they will be executed by the initiating thread, or that they will be the only accesses performed by the API operation implementation. Observable accesses have the same semantics as normal accesses and take part in the same relations, with one exception: there is no \textit{sb} relationship between observable accesses and the initiating thread of an API operation. Accesses performed by an implementation that are not included in the observable set must not modify data that may be observed from application code external to \openshmem operation implementations. +For the purposes of the API memory model, each \openshmem operation type defines the minimum set of observable memory accesses performed by the operation. These accesses are considered to be ``issued'' by the associated operation. These accesses must occur as part of the operation, however the API-level model places no constraints on the thread that executes observable accesses (i.e., they may or may not be executed by the initiating thread), and it does not prevent an implementation from performing other (non-observable) accesses. Observable accesses have the same semantics as normal accesses and take part in the same relations, with one exception: there is no \textit{sb} relationship between observable accesses of an API operation event and any other event issued by the same thread, except as defined later in Section~\ref{subsubsec:api_relations}. +Note, however, that the operation event that issues these observable accesses \textit{does} form an \textit{sb} relationship with other events issued by the initiating thread. +In addition, accesses performed by an implementation that are not included in the observable set must not modify data that may be observed from application code external to \openshmem operation implementations. -Table~\ref{table:api_observable} defines the observable accesses performed for each \openshmem operation. Each access denotes the address as well as the PE it targets ($my\_pe$ refers to the PE of the initiating thread). In many cases, we also show control and compute pseudocode to define which addresses are accessed, which values are stored, and which values must eventually be loaded\footnote{We use pseudocode rather than C++ here because it allows to clearly specify where memory accesses are required.}. However, this single-threaded control flow does not constrain the actual implementation and does not necessarily imply a total \textit{sb} ordering between observable accesses. Within an operation, \textit{sb} is defined to exist only between observable accesses when there is a true control or data dependency between two accesses. Specifically, $(A, B) \in sb$ only if $A$ and $B$ are observable accesses of the same operation and one of the following is true: +Table~\ref{table:api_observable} defines the observable accesses performed for each \openshmem operation. Each access denotes the memory location as well as the PE it targets ($my\_pe$ refers to the PE of the initiating thread). In many cases, we also show control and compute pseudocode to define which memory locations are accessed, which values are stored, and which values must eventually be loaded\footnote{We use pseudocode rather than C++ here because it allows to clearly specify where memory accesses are required.}. However, this single-threaded control flow does not constrain the actual implementation and does not necessarily imply a total \textit{sb} ordering between observable accesses. Within an operation, \textit{sb} is defined to exist only between observable accesses when there is a true control or data dependency between two accesses. Specifically, $(A, B) \in sb$ only if $A$ and $B$ are observable accesses of the same operation and one of the following is true: \begin{itemize} \item $B$ takes a value derived from $A$ as an operand. \item $B$ has a control dependency on the result of $A$ such that $A$'s evaluation determines whether $B$ will occur. @@ -122,7 +122,7 @@ \subsubsubsection{Remote Delivery Order (rdo)} \subsubsubsection{Remote Completion Order (rco)} -Remote completion order establishes full connectivity between accesses separated by an \openshmem quiet or barrier operation. All thread accesses or observable accesses that are \textit{hb} ordered before the quiet or barrier operation are ordered before all thread accesses or observable accesses that are \textit{hb} ordered after the quiet or barrier operation. Specifically, $(A, B) \in rdo$ only if there is an \openshmem quiet or barrier operation $QBop$ and any of the following conditions are true: +Remote completion order establishes full connectivity between accesses separated by an \openshmem quiet or barrier operation. All thread accesses or observable accesses that are \textit{hb} ordered before the quiet or barrier operation are ordered before all thread accesses or observable accesses that are \textit{hb} ordered after the quiet or barrier operation. Specifically, $(A, B) \in rco$ only if there is an \openshmem quiet or barrier operation $QBop$ and any of the following conditions are true: \begin{itemize} \item $A$ is a memory access that is \textit{hb} ordered before $QBop$, $B$ is an access that is \textit{hb} ordered after $QBop$, and the context associated with $QBop$ does NOT have the $SHMEM\_CTX\_NOSTORE$ option enabled. \item $A$ is a memory access that is \textit{hb} ordered before $QBop$, $B$ is an observable access issued by an operation that is \textit{hb} ordered after $QBop$, and the context associated with $QBop$ does NOT have the $SHMEM\_CTX\_NOSTORE$ option enabled. @@ -145,15 +145,15 @@ \subsubsubsection{API Synchronizes-With (asw)}\label{subsubsubsec:api_relations_ \subsubsection{Example API Execution}\label{subsubsubsec:api_relations_example} \begin{figure}[h] -\includegraphics[width=0.75\textwidth]{figures/api_relations.png} +\includegraphics[width=0.75\textwidth]{figures/api_relations_v2.png} \centering -\caption{Example \openshmem code execution depicting observable accesses, API-level relations, and fused collective.} +\caption{Example \openshmem code execution depicting observable accesses, example API-level relations, and a fused barrier collective operation event.} \label{fig:api_execution} \end{figure} Figure~\ref{fig:api_execution} illustrates how the above concepts are applied in an example \openshmem code execution. \openshmem operations are denoted by rounded boxes, and their observable accesses are shown in dotted square boxes. Matched barrier calls from each PE are fused into a single operation event. -Operations are ordered within the calling thread by \textit{sb}, but their observable accesses are constrained only by the added API relations (in addition to \textit{sb} orderings within the operation): +Operations are ordered within the calling thread by \textit{sb}, but their observable accesses are constrained only by the added API relations, described below (in addition to \textit{sb} orderings within the operation): \begin{itemize} \item \textit{ilv} orders the store of $a$ in PE0 (and all prior hb-ordered accesses) with the load of $a$ from the put operation. \item \textit{lco} orders the load of $a$ in PE1 (and all subsequent hb-ordered accesses) with the load of $b$ from the (blocking) wait\_until operation. @@ -161,7 +161,7 @@ \subsubsection{Example API Execution}\label{subsubsubsec:api_relations_example} \item \textit{rco} orders all \textit{hb} ordered accesses and the accesses of \textit{hb} ordered \openshmem operations before and after the fused barrier event spanning PE0 and PE1. \item \textit{asw} orders the store of $b$ from the atomic\_set operation in PE0 with the consuming (\textit{rf} related) load of $b$ from the wait\_until operation in PE1. \end{itemize} - +Note that~\ref{fig:api_execution} depicts only a representative sample of API relations, since an exhaustive depiction would be cluttered. \subsubsection{Updating the PL-Level Model to an API-Level Model}\label{subsubsec:api_integrate} @@ -174,7 +174,7 @@ \subsubsection{Updating the PL-Level Model to an API-Level Model}\label{subsubse In addition, an \textbf{API data race} is defined to relate any two accesses $A$ and $B$ where all of the following are true: \begin{itemize} - \item $A$ and $B$ access the same address on the same PE. + \item $A$ and $B$ access the same memory location on the same PE. \item At least one access is a write. \item Both $A$ and $B$ are normal accesses and at least one of them is not atomic, or both $A$ and $B$ are observable memory accesses issued by \openshmem operations and at least one of them is issued by a non-synchronizing \openshmem operation (defined in Section~\ref{subsubsubsec:api_relations_asw}), or one of the accesses is a normal memory access and the other is an observable memory access issued by an \openshmem operation. \item $A$ and $B$ are not related by \textit{api\_hb}. @@ -182,9 +182,9 @@ \subsubsection{Updating the PL-Level Model to an API-Level Model}\label{subsubse The following axioms are similarly updated to use \textit{api\_hb} rather than \textit{hb}, and to extend RMW atomicity to \openshmem atomic memory operations: \begin{itemize} - \item $acyclic(api\_hb)$: \textit{api_hb} cannot have any cycles. - \item $irreflexive(rf;api\_hb)$: \textit{api_hb} must be consistent with the \textbf{rf} relation. - \item $irreflexive((rf^{-1})^{?};mo;rf^?;api\_hb)$: \textit{api_hb} must be consistent with the \textit{mo} order of writes, as well as any reads that read from writes in mo order (often called coherence requirement). + \item $acyclic(api\_hb)$: \textit{api\_hb} cannot have any cycles. + \item $irreflexive(rf;api\_hb)$: \textit{api\_hb} must be consistent with the \textbf{rf} relation. + \item $irreflexive((rf^{-1})^{?};mo;rf^?;api\_hb)$: \textit{api\_hb} must be consistent with the \textit{mo} order of writes, as well as any reads that read from writes in mo order (often called coherence requirement). \item $irreflexive(rf;(sb^{amo\_op})^? \cup mo;mo;(;sb^{amo\_op})^?;rf^-1 \cup mo;rf;(sb^{amo\_op})^?$ where $sb^{amo\_op}$ relates $sb$ ordered observable accesses issued by the same \openshmem atomic memory operation: The read of an atomic read-modify-write access must always read the last value (in the modification order) written before the write of the read-modify-write access. Similarly, the observable read of an \openshmem atomic memory operation must always read the last value (in the modification order) written before the observable write of the atomic memory operation. \end{itemize} From 3aadcbd3578706e157161ececa20e26a7c65ab8a Mon Sep 17 00:00:00 2001 From: John Alsop Date: Thu, 26 Feb 2026 18:00:17 -0600 Subject: [PATCH 6/9] formal api model: C++ -> C11 Use C11 as the PL-level basis for the formal API-level model. This is consistent with the rest of the document, and differs from the C++ model in no relevant ways. --- content/api_mem_model.tex | 28 ++++++++++++++-------------- 1 file changed, 14 insertions(+), 14 deletions(-) diff --git a/content/api_mem_model.tex b/content/api_mem_model.tex index e7c0bd7f4..11b6d8880 100644 --- a/content/api_mem_model.tex +++ b/content/api_mem_model.tex @@ -1,15 +1,15 @@ To precisely define what behavior the \openshmem interface guarantees for both users and implementers, we formally define an API-level memory model. An API-level model builds upon a programming language-level (PL-level) memory model, and it defines how API operations impact memory. This is useful for determining the ordering and visibility requirements for \openshmem memory accesses when implemented on any device, but it is of particular importance for devices like GPUs that have software managed caches requiring expensive bulk flush or invalidate actions to enforce memory ordering guarantees. Without a precisely specified memory model, API implementers and users may disagree on what behavior is guaranteed by the API, and this can lead to poor performance (too many bulk cache actions) or buggy code (too few bulk cache actions). -This section describes how \openshmem extends the C++ memory consistency model. C++ is chosen as an example because of its generality and ubiquity in the CPU domain, and because it forms the basis for common GPU programming languages (CUDA, hip). Section~\ref{subsubsec:api_pl_baseline} describes the baseline C++ memory model, Sections~\ref{subsubsec:api_model_overview}-\ref{subsubsec:api_integrate} describe how this programming language-level model is extended to provide API-level semantics for \openshmem, and Section~\ref{subsubsec:api_scoped} describes how the guarantees of the API-level model can be adapted for GPU memory models. +This section describes how \openshmem extends the C11 memory consistency model. C11 is chosen as an example because of its generality and ubiquity in the CPU domain, and because it forms the basis for common GPU programming languages (CUDA, hip). Section~\ref{subsubsec:api_pl_baseline} summarizes the baseline C11 memory model, Sections~\ref{subsubsec:api_model_overview}-\ref{subsubsec:api_integrate} describe how this programming language-level model is extended to provide API-level semantics for \openshmem, and Section~\ref{subsubsec:api_scoped} describes how the guarantees of the API-level model can be adapted for GPU memory models. -\subsubsection{Baseline PL-level Axiomatic Memory Model (C++)}\label{subsubsec:api_pl_baseline} +\subsubsection{Baseline PL-level Axiomatic Memory Model (C11)}\label{subsubsec:api_pl_baseline} A memory consistency model (MCM) defines what values loads are allowed to return in a multi-threaded program. One common way to formally specify a MCM is via axioms over relations between events in a dynamic execution. Axiomatic formalization begins by considering all possible “candidate” executions of a parallel program. Loads may return the value of a conflicting store anywhere else in the execution, or the initial value at that memory location (i.e., allowing all possible reorderings). The model then defines relations between accesses in the candidate executions, and using axioms over these relations, defines which executions are legal (hence restricting which reorderings are allowed by the compiler or hardware). -The C++ memory model is defined axiomatically in the C++ specification. Although many prior works aim to mathematize or improve this formalism, here we simply try to provide sufficient understanding to build upon it with the \openshmem API model. +The C11 memory model is defined axiomatically in the C11 specification \S5.1.2.4\footnote{Formally, the \Cstd[11] specification is ISO/IEC 9899:2011(E).}. Although many prior works aim to mathematize or improve this formalism, here we simply try to provide sufficient understanding to build upon it with the \openshmem API model. -Under the PL-level C++ memory model, events consist of reads (\textit{R}), writes (\textit{W}), atomic read-modify-writes (\textit{RMW}, which perform both a read and write to memory), and fences (\textit{F}). Events that access memory may be atomic or non-atomic. Atomic events are associated with a memory order which can be acquire, release, acquire-release, or relaxed\footnote{we leave out consume and sequential consistent memory orders here for simplicity, but this does not impact how the PL-level semantics are extended for the API-level model}. These memory orders determine how atomic accesses can legally synchronize (release and acquire are used with the source and destination of a synchronizing access pair, respectively). The following relations are defined over the events in a candidate execution, and will be used to derive API-level relations in Section~\ref{subsubsec:api_relations}: +Under the PL-level C11 memory model, events consist of reads (\textit{R}), writes (\textit{W}), atomic read-modify-writes (\textit{RMW}, which perform both a read and write to memory), and fences (\textit{F}). Events that access memory may be atomic or non-atomic. Atomic events are associated with a memory order which can be acquire, release, acquire-release, or relaxed\footnote{We leave out consume and sequential consistent memory orders here for simplicity, but this does not impact how the PL-level semantics are extended for the API-level model}. These memory orders determine how atomic accesses can legally synchronize (release and acquire are used with the source and destination of a synchronizing access pair, respectively). The following relations are defined over the events in a candidate execution, and will be used to derive API-level relations in Section~\ref{subsubsec:api_relations}: \begin{itemize} \item \textbf{Sequenced before (\textit{sb}):} Relates memory events from the same thread based on the order they are evaluated (also called program order in some contexts) @@ -27,7 +27,7 @@ \subsubsection{Baseline PL-level Axiomatic Memory Model (C++)}\label{subsubsec:a \item $A$ and $B$ are not related by \textit{hb}. \end{itemize} -Although the axiomatic model places no requirements on when each event must physically update or access physical memory, it prohibits executions that don't adhere to a set of constraints, and by extension it disallows any architecture or compilation scheme that can result in such an execution. Specifically, C++ constrains candidate execution with the following axioms\footnote{When using SC memory order, there is an additional total order over all SC atomics that must similarly be consistent with happens-before}: +Although the axiomatic model places no requirements on when each event must update or access physical memory, it prohibits executions that don't adhere to a set of constraints, and by extension it disallows any architecture or compilation scheme that can result in such an execution. Specifically, C11 constrains candidate execution with the following axioms\footnote{Instead of the verbose prose axioms defined in the C11 specification, here we use the more concise (and equivalent) representation of the C11 axioms outlined by Batty et al. in "Overhauling SC atomics in C11 and OpenCL" (Symposium on the Principles of Programming Languages, 2016).}: \textbf{Axioms} @@ -39,11 +39,11 @@ \subsubsection{Baseline PL-level Axiomatic Memory Model (C++)}\label{subsubsec:a \end{itemize} -Finally, the C++ specification states that, if a data race exists in a candidate execution that satisfies the above axioms, then the program's behavior is undefined. If a data race does not exist, then the program's behavior must conform to one of the conforming candidate executions. +Finally, the C11 specification states that, if a data race exists in a candidate execution that satisfies the above axioms, then the program's behavior is undefined. If a data race does not exist, then the program's behavior must conform to one of the conforming candidate executions. \subsubsection{API-Level Memory Model Overview}\label{subsubsec:api_model_overview} -The \openshmem API-level memory model extends PL-level memory models like C++ to define how \openshmem operations interact with memory. While PL-level models operate on fine-grain loads, stores, and fences, the API-level memory model must additionally operate on coarse-grain API operations like get, put, barrier, etc. that can execute asynchronously and involve multiple PEs. +The \openshmem API-level memory model extends PL-level memory models like C11 to define how \openshmem operations interact with memory. While PL-level models operate on fine-grain loads, stores, and fences, the API-level memory model must additionally operate on coarse-grain API operations like get, put, barrier, etc. that can execute asynchronously and involve multiple PEs. To describe the desired \openshmem behavior, we make the following additions to the existing PL-level model: @@ -57,7 +57,7 @@ \subsubsection{API-Level Memory Model Overview}\label{subsubsec:api_model_overvi Each of the above steps adds a piece of required support. \ref{mcmsteps:one} provides a basis for operation-level semantics across multiple PE memories. \ref{mcmsteps:two} enables asynchrony since a thread can continue execution while observable accesses execute in the background. \ref{mcmsteps:three} allows the programmer to explicitly enforce ordering of asynchronous operations using the ordering and synchronization operations provided by \openshmem. \ref{mcmsteps:four} Supports multi-PE collective operations and establishes ordering across PEs even when there are no observable accesses. \ref{mcmsteps:five} formally updates the PL-level axioms to establish a complete API-level memory model. -After defining how \openshmem extends the C++ memory model, we describe how the addition of scoped synchronization in GPU memory models impacts the API-level model. +After defining how \openshmem extends the C11 memory model, we describe how the addition of scoped synchronization in GPU memory models impacts the API-level model. \subsubsection{PEs and Operation Events}\label{subsubsec:api_pes_operations} @@ -71,7 +71,7 @@ \subsubsection{Per-Operation Observable Accesses}\label{subsubsec:api_observable Note, however, that the operation event that issues these observable accesses \textit{does} form an \textit{sb} relationship with other events issued by the initiating thread. In addition, accesses performed by an implementation that are not included in the observable set must not modify data that may be observed from application code external to \openshmem operation implementations. -Table~\ref{table:api_observable} defines the observable accesses performed for each \openshmem operation. Each access denotes the memory location as well as the PE it targets ($my\_pe$ refers to the PE of the initiating thread). In many cases, we also show control and compute pseudocode to define which memory locations are accessed, which values are stored, and which values must eventually be loaded\footnote{We use pseudocode rather than C++ here because it allows to clearly specify where memory accesses are required.}. However, this single-threaded control flow does not constrain the actual implementation and does not necessarily imply a total \textit{sb} ordering between observable accesses. Within an operation, \textit{sb} is defined to exist only between observable accesses when there is a true control or data dependency between two accesses. Specifically, $(A, B) \in sb$ only if $A$ and $B$ are observable accesses of the same operation and one of the following is true: +Table~\ref{table:api_observable} defines the observable accesses performed for each \openshmem operation. Each access denotes the memory location as well as the PE it targets ($my\_pe$ refers to the PE of the initiating thread). In many cases, we also show control and compute pseudocode to define which memory locations are accessed, which values are stored, and which values must eventually be loaded\footnote{We use pseudocode rather than C11 here because it allows us to more clearly specify where memory accesses are required.}. However, this single-threaded control flow does not constrain the actual implementation and does not necessarily imply a total \textit{sb} ordering between observable accesses. Within an operation, \textit{sb} is defined to exist only between observable accesses when there is a true control or data dependency between two accesses. Specifically, $(A, B) \in sb$ only if $A$ and $B$ are observable accesses of the same operation and one of the following is true: \begin{itemize} \item $B$ takes a value derived from $A$ as an operand. \item $B$ has a control dependency on the result of $A$ such that $A$'s evaluation determines whether $B$ will occur. @@ -81,7 +81,7 @@ \subsubsection{Per-Operation Observable Accesses}\label{subsubsec:api_observable \input{content/api_observable} -\subsubsection{Fused Collectives}\label{subsubsec:api_integrate} +\subsubsection{Fused Collectives}\label{subsubsec:api_fused} \openshmem defines multiple collective operations that synchronize across multiple PEs in a team, in some cases without any observable accesses to memory (e.g., as is the case with barrier). There are multiple requirements on the use of these operations. In particular, each dynamic call to a collective operation must be paired with a matching call from every other PE in the team. @@ -165,8 +165,8 @@ \subsubsection{Example API Execution}\label{subsubsubsec:api_relations_example} \subsubsection{Updating the PL-Level Model to an API-Level Model}\label{subsubsec:api_integrate} -We leverage the newly defined observable accesses and ordering relations to define the API-level memory model, which is based on the C++ model described in Section~\ref{subsubsec:api_pl_baseline}. -We first define a new relation, API happens-before (\textit{api\_hb}), to be the transitive closure of the C++ \textit{hb} and all API-level ordering relations introduced in Section~\ref{subsubsec:api_relations}: +We leverage the newly defined observable accesses and ordering relations to define the API-level memory model, which is based on the C11 model summarized in Section~\ref{subsubsec:api_pl_baseline}. +We first define a new relation, API happens-before (\textit{api\_hb}), to be the transitive closure of the C11 \textit{hb} and all API-level ordering relations introduced in Section~\ref{subsubsec:api_relations}: \begin{itemize} \item $api\_hb := (hb \cup ilv \cup lco \cup rdo \cup rco \cup asw)^+$ @@ -188,11 +188,11 @@ \subsubsection{Updating the PL-Level Model to an API-Level Model}\label{subsubse \item $irreflexive(rf;(sb^{amo\_op})^? \cup mo;mo;(;sb^{amo\_op})^?;rf^-1 \cup mo;rf;(sb^{amo\_op})^?$ where $sb^{amo\_op}$ relates $sb$ ordered observable accesses issued by the same \openshmem atomic memory operation: The read of an atomic read-modify-write access must always read the last value (in the modification order) written before the write of the read-modify-write access. Similarly, the observable read of an \openshmem atomic memory operation must always read the last value (in the modification order) written before the observable write of the atomic memory operation. \end{itemize} -Finally, similar to the C++ memory model, if an API data race exists in a candidate execution that satisfies the above axioms, then the program's behavior is undefined. If an API data race does not exist, then the program's behavior must conform to one of the axiom-satisfying candidate executions. +Finally, similar to the C11 memory model, if an API data race exists in a candidate execution that satisfies the above axioms, then the program's behavior is undefined. If an API data race does not exist, then the program's behavior must conform to one of the axiom-satisfying candidate executions. \subsubsection{Compatibility with a Scoped Memory Model}\label{subsubsec:api_scoped} -Memory models for accelerators like GPUs often extend CPU-oriented memory models like C++ with the notion of synchronization scopes. Scopes are necessary to enable efficient synchronization in a system with software-managed cache coherence. A scope is a property assigned to each atomic access that corresponds to a grouping of threads in a program execution. The threads in a grouping may share a cache in the cache hierarchy, such that communication between those threads can occur more efficiently at that cache level. +Memory models for accelerators like GPUs often extend CPU-oriented memory models like C11 with the notion of synchronization scopes. Scopes are necessary to enable efficient synchronization in a system with software-managed cache coherence. A scope is a property assigned to each atomic access that corresponds to a grouping of threads in a program execution. The threads in a grouping may share a cache in the cache hierarchy, such that communication between those threads can occur more efficiently at that cache level. When synchronizing between two threads, the synchronizing accesses must use a scope that contains both synchronizing threads; if they do not, this represents a heterogeneous race in the scoped memory model, and program behavior is undefined. Properly scoped synchronization is critical for GPU applications, but it is not necessary to expose it at the level of the \openshmem API. Scopes are important for communication between threads that share a cache level, but the threads participating in inter-PE communication governed by \openshmem generally do not share a cache. From d00c0f02e36535586a570aaf4f05cb6f20ef710f Mon Sep 17 00:00:00 2001 From: John Alsop Date: Fri, 27 Feb 2026 11:20:17 -0600 Subject: [PATCH 7/9] formal api memory model: minor fixes, formatting updates Minor fixes to observable accesses, grammar, and formatting for consistency. --- content/api_mem_model.tex | 28 +++++++++++++--------------- content/api_observable.tex | 8 ++++---- 2 files changed, 17 insertions(+), 19 deletions(-) diff --git a/content/api_mem_model.tex b/content/api_mem_model.tex index 11b6d8880..7f3d89f0a 100644 --- a/content/api_mem_model.tex +++ b/content/api_mem_model.tex @@ -1,7 +1,7 @@ To precisely define what behavior the \openshmem interface guarantees for both users and implementers, we formally define an API-level memory model. An API-level model builds upon a programming language-level (PL-level) memory model, and it defines how API operations impact memory. This is useful for determining the ordering and visibility requirements for \openshmem memory accesses when implemented on any device, but it is of particular importance for devices like GPUs that have software managed caches requiring expensive bulk flush or invalidate actions to enforce memory ordering guarantees. Without a precisely specified memory model, API implementers and users may disagree on what behavior is guaranteed by the API, and this can lead to poor performance (too many bulk cache actions) or buggy code (too few bulk cache actions). -This section describes how \openshmem extends the C11 memory consistency model. C11 is chosen as an example because of its generality and ubiquity in the CPU domain, and because it forms the basis for common GPU programming languages (CUDA, hip). Section~\ref{subsubsec:api_pl_baseline} summarizes the baseline C11 memory model, Sections~\ref{subsubsec:api_model_overview}-\ref{subsubsec:api_integrate} describe how this programming language-level model is extended to provide API-level semantics for \openshmem, and Section~\ref{subsubsec:api_scoped} describes how the guarantees of the API-level model can be adapted for GPU memory models. +This section describes how \openshmem extends the C11 memory consistency model. C11 is chosen as an example because of its generality and ubiquity in the CPU domain, and because it forms the basis for common GPU programming languages (for example, CUDA, hip). Section~\ref{subsubsec:api_pl_baseline} summarizes the baseline C11 memory model, Sections~\ref{subsubsec:api_model_overview}-\ref{subsubsec:api_integrate} describe how this programming language-level model is extended to provide API-level semantics for \openshmem, and Section~\ref{subsubsec:api_scoped} describes how the guarantees of the API-level model can be adapted for GPU memory models. \subsubsection{Baseline PL-level Axiomatic Memory Model (C11)}\label{subsubsec:api_pl_baseline} @@ -29,13 +29,11 @@ \subsubsection{Baseline PL-level Axiomatic Memory Model (C11)}\label{subsubsec:a Although the axiomatic model places no requirements on when each event must update or access physical memory, it prohibits executions that don't adhere to a set of constraints, and by extension it disallows any architecture or compilation scheme that can result in such an execution. Specifically, C11 constrains candidate execution with the following axioms\footnote{Instead of the verbose prose axioms defined in the C11 specification, here we use the more concise (and equivalent) representation of the C11 axioms outlined by Batty et al. in "Overhauling SC atomics in C11 and OpenCL" (Symposium on the Principles of Programming Languages, 2016).}: -\textbf{Axioms} - \begin{itemize} \item $acyclic(hb)$: \textit{hb} cannot have any cycles. - \item $irreflexive(rf;hb)$: \textit{hb} must be consistent with the \textbf{rf} relation. - \item $irreflexive((rf^{-1})^{?};mo;rf^?;hb)$: \textit{hb} must be consistent with the \textit{mo} order of writes, as well as any reads that read from writes in mo order (often called coherence requirement). - \item $irreflexive(rf \cup (mo;mo;rf^-1) \cup (mo;rf))$: The read of an atomic read-modify-write access must always read the last value (in the modification order) written before the write of the read-modify-write access. + \item $irreflexive(rf;hb)$: \textit{hb} must be consistent with the \textit{rf} relation. + \item $irreflexive((rf^{-1})^{?};mo;rf^?;hb)$: \textit{hb} must be consistent with the \textit{mo} order of writes, as well as any reads that read from writes in \textit{mo} order (often called coherence requirement). + \item $irreflexive(rf \cup (mo;mo;rf^{-1}) \cup (mo;rf))$: The read of an atomic read-modify-write access must always read the last value (in \textit{mo}) written before the write of the read-modify-write access. \end{itemize} @@ -75,7 +73,7 @@ \subsubsection{Per-Operation Observable Accesses}\label{subsubsec:api_observable \begin{itemize} \item $B$ takes a value derived from $A$ as an operand. \item $B$ has a control dependency on the result of $A$ such that $A$'s evaluation determines whether $B$ will occur. - \item $B$ has an ordering dependency $A$. This type of dependency currently exists only in put\_signal[\_nbi] where the final store to $sig_addr$ is ordered after all prior observable accesses to source and dest. + \item $B$ has an ordering dependency $A$. This type of dependency currently exists only in put\_signal[\_nbi] where the final store to $sig\_addr$ is ordered after all prior observable accesses to source and dest. \end{itemize} Many operations have no visible accesses (e.g., memory ordering routines, whose semantics are defined by the API-level relations described in Section~\ref{subsubsec:api_relations}). @@ -140,7 +138,7 @@ \subsubsubsection{API Synchronizes-With (asw)}\label{subsubsubsec:api_relations_ \item $B$ is an observable write or read-modify-write access issued by a synchronizing \openshmem operation. \end{itemize} -In the above, "synchronizing \openshmem operations" are defined as any operation from the following categories: atomic memory operations, signaling operations, point-to-point synchronization routings, or distributed locking routines. +In the above, "synchronizing \openshmem operations" are defined as any operation from the following categories: atomic memory operations, signaling operations, point-to-point synchronization routines, or distributed locking routines. \subsubsection{Example API Execution}\label{subsubsubsec:api_relations_example} @@ -155,9 +153,9 @@ \subsubsection{Example API Execution}\label{subsubsubsec:api_relations_example} Matched barrier calls from each PE are fused into a single operation event. Operations are ordered within the calling thread by \textit{sb}, but their observable accesses are constrained only by the added API relations, described below (in addition to \textit{sb} orderings within the operation): \begin{itemize} -\item \textit{ilv} orders the store of $a$ in PE0 (and all prior hb-ordered accesses) with the load of $a$ from the put operation. +\item \textit{ilv} orders the store of $a$ in PE0 (and all prior \textit{hb}-ordered accesses) with the load of $a$ from the put operation. \item \textit{lco} orders the load of $a$ in PE1 (and all subsequent hb-ordered accesses) with the load of $b$ from the (blocking) wait\_until operation. -\item \textit{rdo} orders the bulk store of a from the put operation in PE0 with the store of $b$ from the fence-separated atomic\_set operation. +\item \textit{rdo} orders the bulk store of $a$ from the put operation in PE0 with the store of $b$ from the fence-separated atomic\_set operation. \item \textit{rco} orders all \textit{hb} ordered accesses and the accesses of \textit{hb} ordered \openshmem operations before and after the fused barrier event spanning PE0 and PE1. \item \textit{asw} orders the store of $b$ from the atomic\_set operation in PE0 with the consuming (\textit{rf} related) load of $b$ from the wait\_until operation in PE1. \end{itemize} @@ -175,7 +173,7 @@ \subsubsection{Updating the PL-Level Model to an API-Level Model}\label{subsubse In addition, an \textbf{API data race} is defined to relate any two accesses $A$ and $B$ where all of the following are true: \begin{itemize} \item $A$ and $B$ access the same memory location on the same PE. - \item At least one access is a write. + \item At least one access is a modifying access (write or read-modify-write). \item Both $A$ and $B$ are normal accesses and at least one of them is not atomic, or both $A$ and $B$ are observable memory accesses issued by \openshmem operations and at least one of them is issued by a non-synchronizing \openshmem operation (defined in Section~\ref{subsubsubsec:api_relations_asw}), or one of the accesses is a normal memory access and the other is an observable memory access issued by an \openshmem operation. \item $A$ and $B$ are not related by \textit{api\_hb}. \end{itemize} @@ -183,9 +181,9 @@ \subsubsection{Updating the PL-Level Model to an API-Level Model}\label{subsubse The following axioms are similarly updated to use \textit{api\_hb} rather than \textit{hb}, and to extend RMW atomicity to \openshmem atomic memory operations: \begin{itemize} \item $acyclic(api\_hb)$: \textit{api\_hb} cannot have any cycles. - \item $irreflexive(rf;api\_hb)$: \textit{api\_hb} must be consistent with the \textbf{rf} relation. - \item $irreflexive((rf^{-1})^{?};mo;rf^?;api\_hb)$: \textit{api\_hb} must be consistent with the \textit{mo} order of writes, as well as any reads that read from writes in mo order (often called coherence requirement). - \item $irreflexive(rf;(sb^{amo\_op})^? \cup mo;mo;(;sb^{amo\_op})^?;rf^-1 \cup mo;rf;(sb^{amo\_op})^?$ where $sb^{amo\_op}$ relates $sb$ ordered observable accesses issued by the same \openshmem atomic memory operation: The read of an atomic read-modify-write access must always read the last value (in the modification order) written before the write of the read-modify-write access. Similarly, the observable read of an \openshmem atomic memory operation must always read the last value (in the modification order) written before the observable write of the atomic memory operation. + \item $irreflexive(rf;api\_hb)$: \textit{api\_hb} must be consistent with the \textit{rf} relation. + \item $irreflexive((rf^{-1})^{?};mo;rf^?;api\_hb)$: \textit{api\_hb} must be consistent with the \textit{mo} order of writes, as well as any reads that read from writes in \textit{mo} order (often called coherence requirement). + \item $irreflexive(rf;(sb^{amo\_op})^? \cup mo;mo;(;sb^{amo\_op})^?;rf^{-1} \cup mo;rf;(sb^{amo\_op})^?$ where $sb^{amo\_op}$ relates $sb$ ordered observable accesses issued by the same \openshmem atomic memory operation: The read of an atomic read-modify-write access must always read the last value (in the modification order) written before the write of the read-modify-write access. Similarly, the observable read of an \openshmem atomic memory operation must always read the last value (in the modification order) written before the observable write of the atomic memory operation. \end{itemize} Finally, similar to the C11 memory model, if an API data race exists in a candidate execution that satisfies the above axioms, then the program's behavior is undefined. If an API data race does not exist, then the program's behavior must conform to one of the axiom-satisfying candidate executions. @@ -196,5 +194,5 @@ \subsubsection{Compatibility with a Scoped Memory Model}\label{subsubsec:api_sco When synchronizing between two threads, the synchronizing accesses must use a scope that contains both synchronizing threads; if they do not, this represents a heterogeneous race in the scoped memory model, and program behavior is undefined. Properly scoped synchronization is critical for GPU applications, but it is not necessary to expose it at the level of the \openshmem API. Scopes are important for communication between threads that share a cache level, but the threads participating in inter-PE communication governed by \openshmem generally do not share a cache. -Of course, implementing \openshmem operations may require synchronization and communication between threads in a PE that can benefit from scopes, e.g., when offloading a non-blocking operation to a helper thread. However, this synchronization should be entirely performed within the \openshmem implementation transparently to the user of \openshmem API. In this way, \openshmem users and implementers may freely and independently optimize their own code without agreeing on the relative locations of \openshmem helper threads. +Of course, implementing \openshmem operations may require synchronization and communication between threads in a PE that can benefit from scopes, e.g., when offloading a non-blocking operation to a helper thread. However, this synchronization should be entirely performed within the \openshmem implementation transparently to the user of the \openshmem API. In this way, \openshmem users and implementers may freely and independently optimize their own code without agreeing on the relative locations of \openshmem helper threads. This policy has the added benefit of retaining a single simple and portable API-level interface for \openshmem that may be applied on top of memory models that may or may not use scoped synchronization. \ No newline at end of file diff --git a/content/api_observable.tex b/content/api_observable.tex index b9867c95a..91cfd3661 100644 --- a/content/api_observable.tex +++ b/content/api_observable.tex @@ -30,8 +30,8 @@ ST R1 -> dest[i], my_pe \end{verbatim} \tabularnewline\cline{2-2} - & \begin{verbatim}shmem_g(dest, value, pe): - return LD value + & \begin{verbatim}shmem_g(source, pe): + return LD source \end{verbatim} \tabularnewline\cline{2-2} & \begin{verbatim}shmem_iget(dest, source, dst, sst, nelems, pe): @@ -56,7 +56,7 @@ return R1 \end{verbatim} \tabularnewline\cline{2-2} - & \begin{verbatim}shmem_atomic_swap(dest, cond, value, pe): + & \begin{verbatim}shmem_atomic_swap(dest, value, pe): R1 = LD dest, pe ST value -> dest, pe return R1 @@ -141,7 +141,7 @@ ST R1 -> fetch, my_pe \end{verbatim} \tabularnewline\cline{2-2} - & \begin{verbatim}shmem_atomic_swap_nbi(fetch, dest, cond, value, + & \begin{verbatim}shmem_atomic_swap_nbi(fetch, dest, value, pe): R1 = LD dest, pe ST value -> dest, pe From f5d6c4db077129bd2025d7900022188b0fc27c26 Mon Sep 17 00:00:00 2001 From: John Alsop Date: Thu, 16 Apr 2026 18:41:54 -0500 Subject: [PATCH 8/9] formal api model: formatting improvements for consistency -use LATEX macros where appropriate for functions, events, and relations, consistent with the rest of the document. -in the macros for custom relations, expand these to use the full relation name whenever discussed in text (use italics to set apart from normal text) -Eliminate the use of "we" terminology in favor of impersonal phrasing -instead of footnotes, use apinotes remove subsubsubsections (api relations): replace with a bolded paragraph heading -program's behavior is undefined -> the behavior is undefined (for consistency with rest of doc) -recreate api_relations image in draw.io and include svg and pdf in figures --- content/api_mem_model.tex | 164 +++++++++++++++++++------------------- figures/api_relations.pdf | Bin 0 -> 39660 bytes figures/api_relations.svg | 4 + utils/defs.tex | 16 ++++ 4 files changed, 103 insertions(+), 81 deletions(-) create mode 100644 figures/api_relations.pdf create mode 100644 figures/api_relations.svg diff --git a/content/api_mem_model.tex b/content/api_mem_model.tex index 7f3d89f0a..e31a9130e 100644 --- a/content/api_mem_model.tex +++ b/content/api_mem_model.tex @@ -1,22 +1,22 @@ -To precisely define what behavior the \openshmem interface guarantees for both users and implementers, we formally define an API-level memory model. An API-level model builds upon a programming language-level (PL-level) memory model, and it defines how API operations impact memory. This is useful for determining the ordering and visibility requirements for \openshmem memory accesses when implemented on any device, but it is of particular importance for devices like GPUs that have software managed caches requiring expensive bulk flush or invalidate actions to enforce memory ordering guarantees. Without a precisely specified memory model, API implementers and users may disagree on what behavior is guaranteed by the API, and this can lead to poor performance (too many bulk cache actions) or buggy code (too few bulk cache actions). +To precisely define what behavior the \openshmem interface guarantees for both users and implementers, this section formally defines an API-level memory model. An API-level model builds upon a programming language-level (PL-level) memory model, and it defines how API operations impact memory. This is useful for determining the ordering and visibility requirements for \openshmem memory accesses when implemented on any device, but it is of particular importance for devices like GPUs that have software managed caches requiring expensive bulk flush or invalidate actions to enforce memory ordering guarantees. Without a precisely specified memory model, API implementers and users may disagree on what behavior is guaranteed by the API, and this can lead to poor performance (too many bulk cache actions) or buggy code (too few bulk cache actions). -This section describes how \openshmem extends the C11 memory consistency model. C11 is chosen as an example because of its generality and ubiquity in the CPU domain, and because it forms the basis for common GPU programming languages (for example, CUDA, hip). Section~\ref{subsubsec:api_pl_baseline} summarizes the baseline C11 memory model, Sections~\ref{subsubsec:api_model_overview}-\ref{subsubsec:api_integrate} describe how this programming language-level model is extended to provide API-level semantics for \openshmem, and Section~\ref{subsubsec:api_scoped} describes how the guarantees of the API-level model can be adapted for GPU memory models. +This section describes how \openshmem extends the C11 memory consistency model. C11 is chosen as an example because of its generality and ubiquity in the CPU domain, and because it forms the basis for common GPU programming languages (for example, CUDA, HIP). Section~\ref{subsubsec:api_pl_baseline} summarizes the baseline C11 memory model, Sections~\ref{subsubsec:api_model_overview}-\ref{subsubsec:api_integrate} describe how this programming language-level model is extended to provide API-level semantics for \openshmem, and Section~\ref{subsubsec:api_scoped} describes how the guarantees of the API-level model can be adapted for GPU memory models. \subsubsection{Baseline PL-level Axiomatic Memory Model (C11)}\label{subsubsec:api_pl_baseline} A memory consistency model (MCM) defines what values loads are allowed to return in a multi-threaded program. One common way to formally specify a MCM is via axioms over relations between events in a dynamic execution. Axiomatic formalization begins by considering all possible “candidate” executions of a parallel program. Loads may return the value of a conflicting store anywhere else in the execution, or the initial value at that memory location (i.e., allowing all possible reorderings). The model then defines relations between accesses in the candidate executions, and using axioms over these relations, defines which executions are legal (hence restricting which reorderings are allowed by the compiler or hardware). -The C11 memory model is defined axiomatically in the C11 specification \S5.1.2.4\footnote{Formally, the \Cstd[11] specification is ISO/IEC 9899:2011(E).}. Although many prior works aim to mathematize or improve this formalism, here we simply try to provide sufficient understanding to build upon it with the \openshmem API model. +The C11 memory model is defined axiomatically in the C11 specification \S5.1.2.4. Although many prior works aim to mathematize or improve this formalism, the intent here is simply to provide sufficient understanding to build upon it with the \openshmem API model. -Under the PL-level C11 memory model, events consist of reads (\textit{R}), writes (\textit{W}), atomic read-modify-writes (\textit{RMW}, which perform both a read and write to memory), and fences (\textit{F}). Events that access memory may be atomic or non-atomic. Atomic events are associated with a memory order which can be acquire, release, acquire-release, or relaxed\footnote{We leave out consume and sequential consistent memory orders here for simplicity, but this does not impact how the PL-level semantics are extended for the API-level model}. These memory orders determine how atomic accesses can legally synchronize (release and acquire are used with the source and destination of a synchronizing access pair, respectively). The following relations are defined over the events in a candidate execution, and will be used to derive API-level relations in Section~\ref{subsubsec:api_relations}: +Under the PL-level C11 memory model, events consist of reads (\EVENT{R}), writes (\EVENT{W}), atomic read-modify-writes (\EVENT{RMW}, which perform both a read and write to memory), and fences (\EVENT{F}). Events that access memory may be atomic or non-atomic. Atomic events are associated with a memory order which can be acquire, release, acquire-release, or relaxed. These memory orders determine how atomic accesses can legally synchronize (release and acquire are used with the source and destination of a synchronizing access pair, respectively). The following relations are defined over the events in a candidate execution, and will be used to derive API-level relations in Section~\ref{subsubsec:api_relations}: \begin{itemize} - \item \textbf{Sequenced before (\textit{sb}):} Relates memory events from the same thread based on the order they are evaluated (also called program order in some contexts) - \item \textbf{Modification order (\textit{mo}):} A total ordering between all modifying accesses (writes or read-modify-writes) to the same memory location. - \item \textbf{Reads-from (\textit{rf}):} Relates a write \textit{W} and any read \textit{R} to the same memory location such that \textit{R} returns the value written by \textit{W}. - \item \textbf{Synchronizes-with (\textit{sw}):} Same as \textit{rf}, but also requires that \textit{W} and \textit{R} are atomic, \textit{W} has release or acquire-release memory order, and \textit{R} has acquire or acquire-release memory order (this relation is also called synchronization order in some contexts). - \item \textbf{Happens-before (\textit{hb}):} Transitive closure of \textit{sw} and \textit{sb}. + \item \textbf{Sequenced before (\REL{sb}):} Relates memory events from the same thread based on the order they are evaluated (also called program order in some contexts) + \item \textbf{Modification order (\REL{mo}):} A total ordering between all modifying accesses (writes or read-modify-writes) to the same memory location. + \item \textbf{Reads-from (\REL{rf}):} Relates a write \EVENT{W} and any read \EVENT{R} to the same memory location such that \EVENT{R} returns the value written by \EVENT{W}. + \item \textbf{Synchronizes-with (\REL{sw}):} Same as \REL{rf}, but also requires that \EVENT{W} and \EVENT{R} are atomic, \EVENT{W} has release or acquire-release memory order, and \EVENT{R} has acquire or acquire-release memory order (this relation is also called synchronization order in some contexts). + \item \textbf{Happens-before (\REL{hb}):} Transitive closure of \relSW{} and \relSB{}. \end{itemize} Furthermore, a \textbf{data race} is defined as a relation between two accesses $A$ and $B$ where all of the following are true: @@ -24,56 +24,59 @@ \subsubsection{Baseline PL-level Axiomatic Memory Model (C11)}\label{subsubsec:a \item $A$ and $B$ access the same memory location. \item At least one access is a modifying access (write or read-modify-write). \item At least one access is not atomic. - \item $A$ and $B$ are not related by \textit{hb}. + \item $A$ and $B$ are not related by \relHB{}. \end{itemize} -Although the axiomatic model places no requirements on when each event must update or access physical memory, it prohibits executions that don't adhere to a set of constraints, and by extension it disallows any architecture or compilation scheme that can result in such an execution. Specifically, C11 constrains candidate execution with the following axioms\footnote{Instead of the verbose prose axioms defined in the C11 specification, here we use the more concise (and equivalent) representation of the C11 axioms outlined by Batty et al. in "Overhauling SC atomics in C11 and OpenCL" (Symposium on the Principles of Programming Languages, 2016).}: +Although the axiomatic model places no requirements on when each event must update or access physical memory, it prohibits executions that don't adhere to a set of constraints, and by extension it disallows any architecture or compilation scheme that can result in such an execution. Specifically, C11 constrains candidate execution with the following axioms: \begin{itemize} - \item $acyclic(hb)$: \textit{hb} cannot have any cycles. - \item $irreflexive(rf;hb)$: \textit{hb} must be consistent with the \textit{rf} relation. - \item $irreflexive((rf^{-1})^{?};mo;rf^?;hb)$: \textit{hb} must be consistent with the \textit{mo} order of writes, as well as any reads that read from writes in \textit{mo} order (often called coherence requirement). - \item $irreflexive(rf \cup (mo;mo;rf^{-1}) \cup (mo;rf))$: The read of an atomic read-modify-write access must always read the last value (in \textit{mo}) written before the write of the read-modify-write access. + \item $acyclic(hb)$: \relHB{} cannot have any cycles. + \item $irreflexive(rf;hb)$: \relHB{} must be consistent with the \relRF{} relation. + \item $irreflexive((rf^{-1})^{?};mo;rf^?;hb)$: \relHB{} must be consistent with \relMO{} between writes, as well as any reads that read from writes in \relMO{} (often called coherence requirement). + \item $irreflexive(rf \cup (mo;mo;rf^{-1}) \cup (mo;rf))$: The read of an atomic read-modify-write access must always read the last value (in \relMO{}) written before the write of the read-modify-write access (often called RMW atomicity). \end{itemize} - Finally, the C11 specification states that, if a data race exists in a candidate execution that satisfies the above axioms, then the program's behavior is undefined. If a data race does not exist, then the program's behavior must conform to one of the conforming candidate executions. +\apinotes{Consume and sequential consistent memory orders are omitted from the above formalism for simplicity, but this does not impact how the PL-level semantics are extended for the API-level model. +Also, instead of the verbose prose axioms defined in the C11 specification, the C11 axioms are expressed above using the more concise (and equivalent) representation outlined by Batty et al. in "Overhauling SC atomics in C11 and OpenCL" (Symposium on the Principles of Programming Languages, 2016).} + + \subsubsection{API-Level Memory Model Overview}\label{subsubsec:api_model_overview} -The \openshmem API-level memory model extends PL-level memory models like C11 to define how \openshmem operations interact with memory. While PL-level models operate on fine-grain loads, stores, and fences, the API-level memory model must additionally operate on coarse-grain API operations like get, put, barrier, etc. that can execute asynchronously and involve multiple PEs. +The \openshmem API-level memory model extends PL-level memory models like C11 to define how \openshmem operations interact with memory. While PL-level models operate on fine-grain loads, stores, and fences, the API-level memory model must additionally operate on coarse-grain API operations like get, \FUNC{put}, barrier, etc. that can execute asynchronously and involve multiple \ac{PE}s. -To describe the desired \openshmem behavior, we make the following additions to the existing PL-level model: +To describe the desired \openshmem behavior, the API-level memory model makes the following additions to the existing PL-level model: \begin{enumerate} - \item\label{mcmsteps:one} \textbf{PEs and operation events}: Introduce a new event type that corresponds to an API operation and a new memory location property that specifies the target PE. - \item\label{mcmsteps:two} \textbf{Per-operation observable accesses}: For each operation type, define a set of observable memory accesses (these are not related to the initiating thread by \textit{sb}). + \item\label{mcmsteps:one} \textbf{\ac{PE}s and operation events}: Introduce a new event type that corresponds to an API operation and a new memory location property that specifies the target \ac{PE}. + \item\label{mcmsteps:two} \textbf{Per-operation observable accesses}: For each operation type, define a set of observable memory accesses (these are not related to the initiating thread by \relSB{}). \item\label{mcmsteps:three} \textbf{API-level relations}: Define new API-level ordering relations over operation events and their observable accesses that will constrain \openshmem memory ordering. \item\label{mcmsteps:four} \textbf{Fused collectives}: Fuse matched collective operation calls into a single event. \item\label{mcmsteps:five} \textbf{Updated relations, axioms}: Update PL-level axioms to incorporate new events and relations. \end{enumerate} -Each of the above steps adds a piece of required support. \ref{mcmsteps:one} provides a basis for operation-level semantics across multiple PE memories. \ref{mcmsteps:two} enables asynchrony since a thread can continue execution while observable accesses execute in the background. \ref{mcmsteps:three} allows the programmer to explicitly enforce ordering of asynchronous operations using the ordering and synchronization operations provided by \openshmem. \ref{mcmsteps:four} Supports multi-PE collective operations and establishes ordering across PEs even when there are no observable accesses. \ref{mcmsteps:five} formally updates the PL-level axioms to establish a complete API-level memory model. +Each of the above steps adds a piece of required support. \ref{mcmsteps:one} provides a basis for operation-level semantics across multiple \ac{PE} memories. \ref{mcmsteps:two} enables asynchrony since a thread can continue execution while observable accesses execute in the background. \ref{mcmsteps:three} allows the programmer to explicitly enforce ordering of asynchronous operations using the ordering and synchronization operations provided by \openshmem. \ref{mcmsteps:four} Supports multi-PE collective operations and establishes ordering across \ac{PE}s even when there are no observable accesses. \ref{mcmsteps:five} formally updates the PL-level axioms to establish a complete API-level memory model. -After defining how \openshmem extends the C11 memory model, we describe how the addition of scoped synchronization in GPU memory models impacts the API-level model. +After defining how \openshmem extends the C11 memory model, Section~\ref{subsubsec:api_scoped} describes how the addition of scoped synchronization in GPU memory models impacts the API-level model. \subsubsection{PEs and Operation Events}\label{subsubsec:api_pes_operations} -When defining behavior at the \openshmem API level, we must first consider two key differences in program execution relative to a PL-level model. First, \openshmem applications span multiple PEs, each of which has its own local memory space. Therefore, the PL-level definition of memory location is extended to include both the PE-local memory address and the PE ID. Normal memory accesses can only access memory locations on the local PE, which is constant for the life of a thread. \openshmem operations may trigger accesses to the local PE’s memory and/or the memory of remote PEs. Dynamic memory accesses in an execution are only considered to access the same memory location if both the local memory address and PE ID attributes match (impacting the definitions of of \textit{mo} and \textit{rf}). +When defining behavior at the \openshmem API level, there are two key differences in program execution relative to a PL-level model. First, \openshmem applications span multiple \ac{PE}s, each of which has its own local memory space. Therefore, the PL-level definition of memory location is extended to include both the \ac{PE}-local memory address and the \ac{PE} ID. Normal memory accesses can access memory locations on the local \ac{PE}, or on a remote \ac{PE} if \FUNC{shmem\_ptr} is used. \openshmem operations may trigger accesses to the local \ac{PE}’s memory and/or the memory of remote \ac{PE}s. Dynamic memory accesses in an execution are only considered to access the same memory location if both the local memory address and \ac{PE} ID attributes match (impacting the definitions of of \relMO{} and \relRF{}). -Second, since the \openshmem API does not define the underlying PL-level implementation, API operations do not resolve to a function consisting of PL-level events. Thus, these operations must be treated as a new type of primitive in the API-level memory model: the operation event. An operation event specifies its type, and each type defines the set of minimum observable accesses triggered by the operation (described next). Operation events participate in \textit{sb} and derived relations, similar to normal thread access events. Although operation events may be associated with separate memory access events (see Section~\ref{subsubsec:api_observable}), the operation events themselves do not participate directly in access-based relations like \textit{mo}, \textit{rf}, and \textit{sw}. +Second, since the \openshmem API does not define the underlying PL-level implementation, API operations do not resolve to a function consisting of PL-level events. Thus, these operations must be treated as a new type of primitive in the API-level memory model: the operation event. An operation event specifies its type, and each type defines the set of minimum observable accesses triggered by the operation (described next). Operation events participate in \relSB{} and derived relations, similar to normal thread access events. Although operation events may be associated with separate memory access events (see Section~\ref{subsubsec:api_observable}), the operation events themselves do not participate directly in access-based relations like \relMO{}, \relRF{}, and \relSW{}. \subsubsection{Per-Operation Observable Accesses}\label{subsubsec:api_observable} -For the purposes of the API memory model, each \openshmem operation type defines the minimum set of observable memory accesses performed by the operation. These accesses are considered to be ``issued'' by the associated operation. These accesses must occur as part of the operation, however the API-level model places no constraints on the thread that executes observable accesses (i.e., they may or may not be executed by the initiating thread), and it does not prevent an implementation from performing other (non-observable) accesses. Observable accesses have the same semantics as normal accesses and take part in the same relations, with one exception: there is no \textit{sb} relationship between observable accesses of an API operation event and any other event issued by the same thread, except as defined later in Section~\ref{subsubsec:api_relations}. -Note, however, that the operation event that issues these observable accesses \textit{does} form an \textit{sb} relationship with other events issued by the initiating thread. +For the purposes of the API memory model, each \openshmem operation type defines the minimum set of observable memory accesses performed by the operation. These accesses are considered to be ``issued'' by the associated operation. These accesses must occur as part of the operation, however the API-level model places no constraints on the thread that executes observable accesses (i.e., they may or may not be executed by the initiating thread), and it does not prevent an implementation from performing other (non-observable) accesses. Observable accesses have the same semantics as normal accesses and take part in the same relations, with one exception: there is no \relSB{} relationship between observable accesses of an API operation event and any other event issued by the same thread, except as defined later in Section~\ref{subsubsec:api_relations}. +Note, however, that the operation event that issues these observable accesses \textit{does} form an \relSB{} relationship with other events issued by the initiating thread. In addition, accesses performed by an implementation that are not included in the observable set must not modify data that may be observed from application code external to \openshmem operation implementations. -Table~\ref{table:api_observable} defines the observable accesses performed for each \openshmem operation. Each access denotes the memory location as well as the PE it targets ($my\_pe$ refers to the PE of the initiating thread). In many cases, we also show control and compute pseudocode to define which memory locations are accessed, which values are stored, and which values must eventually be loaded\footnote{We use pseudocode rather than C11 here because it allows us to more clearly specify where memory accesses are required.}. However, this single-threaded control flow does not constrain the actual implementation and does not necessarily imply a total \textit{sb} ordering between observable accesses. Within an operation, \textit{sb} is defined to exist only between observable accesses when there is a true control or data dependency between two accesses. Specifically, $(A, B) \in sb$ only if $A$ and $B$ are observable accesses of the same operation and one of the following is true: +Table~\ref{table:api_observable} defines the observable accesses performed for each \openshmem operation. Each access denotes the memory location as well as the \ac{PE} it targets ($my\_pe$ refers to the \ac{PE} of the initiating thread). In many cases, control and compute pseudocode are included to define which memory locations are accessed, which values are stored, and which values must eventually be loaded (pseudocode rather than C11 is used here because it allows clearer specification of where memory accesses are required.) However, this single-threaded control flow does not constrain the actual implementation and does not necessarily imply a total \relSB{} ordering between observable accesses. Within an operation, \relSB{} is defined to exist only between observable accesses when there is a true control or data dependency between two accesses. Specifically, $(A, B) \in sb$ only if $A$ and $B$ are observable accesses of the same operation and one of the following is true: \begin{itemize} \item $B$ takes a value derived from $A$ as an operand. \item $B$ has a control dependency on the result of $A$ such that $A$'s evaluation determines whether $B$ will occur. - \item $B$ has an ordering dependency $A$. This type of dependency currently exists only in put\_signal[\_nbi] where the final store to $sig\_addr$ is ordered after all prior observable accesses to source and dest. + \item $B$ has an ordering dependency $A$. This type of dependency currently exists only in \FUNC{put\_signal[\_nbi]} where the final store to $sig\_addr$ is ordered after all prior observable accesses to source and dest. \end{itemize} Many operations have no visible accesses (e.g., memory ordering routines, whose semantics are defined by the API-level relations described in Section~\ref{subsubsec:api_relations}). @@ -81,112 +84,111 @@ \subsubsection{Per-Operation Observable Accesses}\label{subsubsec:api_observable \subsubsection{Fused Collectives}\label{subsubsec:api_fused} -\openshmem defines multiple collective operations that synchronize across multiple PEs in a team, in some cases without any observable accesses to memory (e.g., as is the case with barrier). -There are multiple requirements on the use of these operations. In particular, each dynamic call to a collective operation must be paired with a matching call from every other PE in the team. +\openshmem defines multiple collective operations that synchronize across multiple \ac{PE}s in a team, in some cases without any observable accesses to memory (e.g., as is the case with barrier). +There are multiple requirements on the use of these operations. In particular, each dynamic call to a collective operation must be paired with a matching call from every other \ac{PE} in the team. The API-level memory model fuses these matching calls into a single event in the candidate execution. -This fused operation and the associated observable accesses (if any) are considered to be executed collectively by all nodes in the synchronizing set of PEs, allowing us to establish ordering relations across these PEs (defined in Section~\ref{subsubsec:api_relations}). -This does not mean the observable accesses must occur atomically, and it does not constrain how the operation is implemented (i.e., any thread or PE in the team may perform each observable access). +This fused operation and the associated observable accesses (if any) are considered to be executed collectively by all nodes in the synchronizing set of \ac{PE}s, allowing us to establish ordering relations across these \ac{PE}s (defined in Section~\ref{subsubsec:api_relations}). +This does not mean the observable accesses must occur atomically, and it does not constrain how the operation is implemented (i.e., any thread or \ac{PE} in the team may perform each observable access). \subsubsection{API-Level Relations}\label{subsubsec:api_relations} -Section~\ref{subsubsec:api_observable} details how operations trigger memory accesses, but it does not define how the ordering of these accesses are constrained. For this, we define a set of API-level relations between normal memory accesses, API operations, and observable accesses issued by these operations. These new relations will be used to extend the relations and axioms that constrain program behavior in Section~\ref{subsubsec:api_integrate}. +Section~\ref{subsubsec:api_observable} details how operations trigger memory accesses, but it does not define how the ordering of these accesses are constrained in an execution. This section defines a set of API-level relations between normal memory accesses, API operations, and observable accesses issued by these operations. These new relations are used to extend the relations and axioms that constrain program behavior in Section~\ref{subsubsec:api_integrate}. -\subsubsubsection{Implicit Local Visibility (ilv)} -Implicit local visibility relates local observable accesses from any \openshmem operation and prior hb-ordered memory accesses from the initiating PE. This ensures, for example, that a put operation will load an up-to-date version of data from the source buffer. Specifically, $(A, B) \in ilv$ only if all of the following are true: +\textbf{Implicit Local Visibility (ilv)}: +\relILV{} relates local observable accesses from any \openshmem operation and prior hb-ordered memory accesses from the initiating \ac{PE}. This ensures, for example, that a \FUNC{put} operation will load an up-to-date version of data from the source buffer. Specifically, $(A, B) \in ilv$ only if all of the following are true: \begin{itemize} - \item $A$ is \textit{hb} ordered before the operation event that issues $B$. - \item $A$ and $B$ target the same PE (the PE of the caller thread). + \item $A$ is \relHB{} ordered before the operation event that issues $B$. + \item $A$ and $B$ target the same \ac{PE} (the \ac{PE} of the caller thread). \end{itemize} -\subsubsubsection{Local Completion Order (lco)} -Local completion order relates local observable accesses from a blocking \openshmem operation and subsequent \textit{hb} ordered accesses and operations from the initiating PE. This ensures, for example, that a blocking put completes its accesses to the source buffer before a subsequent store from the initiating thread overwrites the data. Specifically, $(A, B) \in lco$ only if all of the following are true: +\textbf{Local Completion Order (lco)}: +\relLCO{} relates local observable accesses from a blocking \openshmem operation and subsequent \relHB{} ordered accesses and operations from the initiating \ac{PE}. This ensures, for example, that a blocking \FUNC{put} completes its accesses to the source buffer before a subsequent store from the initiating thread overwrites the data. Specifically, $(A, B) \in lco$ only if all of the following are true: \begin{itemize} \item $A$ is an observable access issued from a blocking operation event (i.e., an operation that does not have nbi in its name). - \item $A$ targets the PE of the operation event that issues it. - \item The operation event that issues $A$ is \textit{hb} ordered before $B$, or is \textit{hb} ordered before an operation event that issues $B$. + \item $A$ targets the \ac{PE} of the operation event that issues it. + \item The operation event that issues $A$ is \relHB{} ordered before $B$, or is \relHB{} ordered before an operation event that issues $B$. \end{itemize} -\subsubsubsection{Remote Delivery Order (rdo)} - -Remote delivery order relates accesses separated by an \openshmem fence operation. However, this relation does not establish full connectivity; it only forms a connection between observable accesses to the same PE from operations \textit{hb} ordered before and after the fence, and between \textit{hb} ordered accesses before a fence and observable accesses from operations \textit{hb} ordered after a fence. Specifically, $(A, B) \in rdo$ only if there is an \openshmem fence operation $Fop$ and any of the following conditions are true: +\textbf{Remote Delivery Order (rdo)}: +\relRDO{} relates accesses separated by an \openshmem fence operation. However, this relation does not establish full connectivity; it only forms a connection between observable accesses to the same \ac{PE} from operations \relHB{} ordered before and after the fence, and between \relHB{} ordered accesses before a fence and observable accesses from operations \relHB{} ordered after a fence. Specifically, $(A, B) \in rdo$ only if there is an \openshmem fence operation $Fop$ and any of the following conditions are true: \begin{itemize} - \item $A$ is a memory access that is \textit{hb} ordered before $Fop$, $B$ is an observable access issued by an operation that is \textit{hb} ordered after $Fop$, and the context associated with $Fop$ does NOT have the $SHMEM\_CTX\_NOSTORE$ option enabled. - \item $A$ is an observable access issued by a fence-ordered operation (defined below) that is \textit{hb} ordered before $Fop$, $B$ is an observable access issued by an operation that is \textit{hb} ordered after $Fop$, and $A$ and $B$ target the same PE. + \item $A$ is a memory access that is \relHB{} ordered before $Fop$, $B$ is an observable access issued by an operation that is \relHB{} ordered after $Fop$, and the context associated with $Fop$ does NOT have the \CONST{SHMEM\_CTX\_NOSTORE} option enabled. + \item $A$ is an observable access issued by a fence-ordered operation (defined below) that is \relHB{} ordered before $Fop$, $B$ is an observable access issued by an operation that is \relHB{} ordered after $Fop$, and $A$ and $B$ target the same \ac{PE}. \end{itemize} -\textbf{fence-ordered operations}: blocking or nonblocking put, blocking or nonblocking atomic memory operation, blocking or nonblocking put\_signal. - -\subsubsubsection{Remote Completion Order (rco)} +In the above, "fence-ordered operations" are defined as blocking or nonblocking \FUNC{put}, blocking or nonblocking \ac{AMO}, and blocking or nonblocking \FUNC{put\_signal} operations. -Remote completion order establishes full connectivity between accesses separated by an \openshmem quiet or barrier operation. All thread accesses or observable accesses that are \textit{hb} ordered before the quiet or barrier operation are ordered before all thread accesses or observable accesses that are \textit{hb} ordered after the quiet or barrier operation. Specifically, $(A, B) \in rco$ only if there is an \openshmem quiet or barrier operation $QBop$ and any of the following conditions are true: +\textbf{Remote Completion Order (rco)}: +\relRCO{} establishes full connectivity between accesses separated by an \openshmem quiet or barrier operation. All thread accesses or observable accesses that are \relHB{} ordered before the quiet or barrier operation are ordered before all thread accesses or observable accesses that are \relHB{} ordered after the quiet or barrier operation. Specifically, $(A, B) \in rco$ only if there is an \openshmem quiet or barrier operation $QBop$ and any of the following conditions are true: \begin{itemize} - \item $A$ is a memory access that is \textit{hb} ordered before $QBop$, $B$ is an access that is \textit{hb} ordered after $QBop$, and the context associated with $QBop$ does NOT have the $SHMEM\_CTX\_NOSTORE$ option enabled. - \item $A$ is a memory access that is \textit{hb} ordered before $QBop$, $B$ is an observable access issued by an operation that is \textit{hb} ordered after $QBop$, and the context associated with $QBop$ does NOT have the $SHMEM\_CTX\_NOSTORE$ option enabled. - \item $A$ is an observable access issued by an operation that is \textit{hb} ordered before $QBop$ and $B$ is an access that is \textit{hb} ordered after $QBop$. - \item $A$ is an observable memory access issued by a qb-ordered operation (defined below) that is \textit{hb} ordered before $QBop$ and $B$ is an observable access issued by an operation that is \textit{hb} ordered after $QBop$. + \item $A$ is a memory access that is \relHB{} ordered before $QBop$, $B$ is an access that is \relHB{} ordered after $QBop$, and the context associated with $QBop$ does NOT have the \CONST{SHMEM\_CTX\_NOSTORE} option enabled. + \item $A$ is a memory access that is \relHB{} ordered before $QBop$, $B$ is an observable access issued by an operation that is \relHB{} ordered after $QBop$, and the context associated with $QBop$ does NOT have the \CONST{SHMEM\_CTX\_NOSTORE} option enabled. + \item $A$ is an observable access issued by an operation that is \relHB{} ordered before $QBop$ and $B$ is an access that is \relHB{} ordered after $QBop$. + \item $A$ is an observable memory access issued by a qb-ordered operation (defined below) that is \relHB{} ordered before $QBop$ and $B$ is an observable access issued by an operation that is \relHB{} ordered after $QBop$. \end{itemize} -\textbf{qb-ordered operations}: blocking or nonblocking put, nonblocking get, blocking or nonblocking atomic memory operation, blocking or nonblocking put\_signal. - -\subsubsubsection{API Synchronizes-With (asw)}\label{subsubsubsec:api_relations_asw} +In the above, "qb-ordered operations" are defined as blocking or nonblocking \FUNC{put}, nonblocking get, blocking or nonblocking \ac{AMO}, and blocking or nonblocking \FUNC{put\_signal} operations. -API synchronizes-with is the API-level analog of the PL-level \textit{sw} relation. It relates observable accesses from legally synchronizing \openshmem operations that form a \textit{rf} relationship. Specifically, $(A, B) \in asw$ only if $(A, B) \in rf$ and any of the following are true: +\textbf{API Synchronizes-With (asw)}: +\relASW{} is the API-level analog of the PL-level \relSW{} relation. It relates observable accesses from legally synchronizing \openshmem operations that form a \relRF{} relationship. Specifically, $(A, B) \in asw$ only if $(A, B) \in rf$ and any of the following are true: \begin{itemize} \item $A$ is an observable read or read-modify-write access issued by a synchronizing \openshmem operation. \item $B$ is an observable write or read-modify-write access issued by a synchronizing \openshmem operation. \end{itemize} -In the above, "synchronizing \openshmem operations" are defined as any operation from the following categories: atomic memory operations, signaling operations, point-to-point synchronization routines, or distributed locking routines. +In the above, "synchronizing \openshmem operations" are defined as any operation from the following categories: \ac{AMO}s, signaling operations, point-to-point synchronization routines, or distributed locking routines. -\subsubsection{Example API Execution}\label{subsubsubsec:api_relations_example} +\subsubsection{Example API Execution}\label{subsubsec:api_relations_example} \begin{figure}[h] -\includegraphics[width=0.75\textwidth]{figures/api_relations_v2.png} +\includegraphics[width=0.75\textwidth]{figures/api_relations} \centering \caption{Example \openshmem code execution depicting observable accesses, example API-level relations, and a fused barrier collective operation event.} \label{fig:api_execution} \end{figure} Figure~\ref{fig:api_execution} illustrates how the above concepts are applied in an example \openshmem code execution. \openshmem operations are denoted by rounded boxes, and their observable accesses are shown in dotted square boxes. -Matched barrier calls from each PE are fused into a single operation event. -Operations are ordered within the calling thread by \textit{sb}, but their observable accesses are constrained only by the added API relations, described below (in addition to \textit{sb} orderings within the operation): +Matched barrier calls from each \ac{PE} are fused into a single operation event. +Operations are ordered within the calling thread by \relSB{}, but their observable accesses are constrained only by the added API relations, described below (in addition to \relSB{} orderings within the operation): \begin{itemize} -\item \textit{ilv} orders the store of $a$ in PE0 (and all prior \textit{hb}-ordered accesses) with the load of $a$ from the put operation. -\item \textit{lco} orders the load of $a$ in PE1 (and all subsequent hb-ordered accesses) with the load of $b$ from the (blocking) wait\_until operation. -\item \textit{rdo} orders the bulk store of $a$ from the put operation in PE0 with the store of $b$ from the fence-separated atomic\_set operation. -\item \textit{rco} orders all \textit{hb} ordered accesses and the accesses of \textit{hb} ordered \openshmem operations before and after the fused barrier event spanning PE0 and PE1. -\item \textit{asw} orders the store of $b$ from the atomic\_set operation in PE0 with the consuming (\textit{rf} related) load of $b$ from the wait\_until operation in PE1. +\item \relILV{} orders the store of $a$ in PE0 (and all prior \relHB{}-ordered accesses) with the load of $a$ from the \FUNC{put} operation. +\item \relLCO{} orders the load of $a$ in PE1 (and all subsequent hb-ordered accesses) with the load of $b$ from the (blocking) \FUNC{wait\_until} operation. +\item \relRDO{} orders the bulk store of $a$ from the \FUNC{put} operation in PE0 with the store of $b$ from the fence-separated \FUNC{atomic\_set} operation. +\item \relRCO{} orders all \relHB{} ordered accesses and the accesses of \relHB{} ordered \openshmem operations before and after the fused barrier event spanning PE0 and PE1. +\item \relASW{} orders the store of $b$ from the \FUNC{atomic\_set} operation in PE0 with the consuming (\relRF{} related) load of $b$ from the \FUNC{wait\_until} operation in PE1. \end{itemize} Note that~\ref{fig:api_execution} depicts only a representative sample of API relations, since an exhaustive depiction would be cluttered. \subsubsection{Updating the PL-Level Model to an API-Level Model}\label{subsubsec:api_integrate} We leverage the newly defined observable accesses and ordering relations to define the API-level memory model, which is based on the C11 model summarized in Section~\ref{subsubsec:api_pl_baseline}. -We first define a new relation, API happens-before (\textit{api\_hb}), to be the transitive closure of the C11 \textit{hb} and all API-level ordering relations introduced in Section~\ref{subsubsec:api_relations}: +We first define a new relation, \relAHB{} (\REL{api\_hb}), to be the transitive closure of the C11 \relHB{} and all API-level ordering relations introduced in Section~\ref{subsubsec:api_relations}: \begin{itemize} \item $api\_hb := (hb \cup ilv \cup lco \cup rdo \cup rco \cup asw)^+$ \end{itemize} -In addition, an \textbf{API data race} is defined to relate any two accesses $A$ and $B$ where all of the following are true: +In addition, an \relADR{} is defined to relate any two accesses $A$ and $B$ where all of the following are true: \begin{itemize} - \item $A$ and $B$ access the same memory location on the same PE. + \item $A$ and $B$ access the same memory location on the same \ac{PE}. \item At least one access is a modifying access (write or read-modify-write). - \item Both $A$ and $B$ are normal accesses and at least one of them is not atomic, or both $A$ and $B$ are observable memory accesses issued by \openshmem operations and at least one of them is issued by a non-synchronizing \openshmem operation (defined in Section~\ref{subsubsubsec:api_relations_asw}), or one of the accesses is a normal memory access and the other is an observable memory access issued by an \openshmem operation. - \item $A$ and $B$ are not related by \textit{api\_hb}. + \item Both $A$ and $B$ are normal accesses and at least one of them is not atomic, or both $A$ and $B$ are observable memory accesses issued by \openshmem operations and at least one of them is issued by a non-synchronizing \openshmem operation (defined in Section~\ref{subsubsec:api_relations}), or one of the accesses is a normal memory access and the other is an observable memory access issued by an \openshmem operation. + \item $A$ and $B$ are not related by \relAHB{}. \end{itemize} -The following axioms are similarly updated to use \textit{api\_hb} rather than \textit{hb}, and to extend RMW atomicity to \openshmem atomic memory operations: +The following axioms are similarly updated to use \relAHB{} rather than \relHB{}, and to extend RMW atomicity to \openshmem \ac{AMO}s: \begin{itemize} - \item $acyclic(api\_hb)$: \textit{api\_hb} cannot have any cycles. - \item $irreflexive(rf;api\_hb)$: \textit{api\_hb} must be consistent with the \textit{rf} relation. - \item $irreflexive((rf^{-1})^{?};mo;rf^?;api\_hb)$: \textit{api\_hb} must be consistent with the \textit{mo} order of writes, as well as any reads that read from writes in \textit{mo} order (often called coherence requirement). - \item $irreflexive(rf;(sb^{amo\_op})^? \cup mo;mo;(;sb^{amo\_op})^?;rf^{-1} \cup mo;rf;(sb^{amo\_op})^?$ where $sb^{amo\_op}$ relates $sb$ ordered observable accesses issued by the same \openshmem atomic memory operation: The read of an atomic read-modify-write access must always read the last value (in the modification order) written before the write of the read-modify-write access. Similarly, the observable read of an \openshmem atomic memory operation must always read the last value (in the modification order) written before the observable write of the atomic memory operation. + \item $acyclic(api\_hb)$: \relAHB{} cannot have any cycles. + \item $irreflexive(rf;api\_hb)$: \relAHB{} must be consistent with the \relRF{} relation. + \item $irreflexive((rf^{-1})^{?};mo;rf^?;api\_hb)$: \relAHB{} must be consistent with \relMO{} between writes, and it must also be consistent with the order of reads that read from writes in \relMO{} (often called coherence requirement). + \item $irreflexive(rf \cup (mo;mo;rf^{-1}) \cup (mo;rf))$: The read of an atomic read-modify-write access must always read the last value (in \relMO{}) written before the write of the read-modify-write access (often called RMW atomicity). + \item $irreflexive(rf;sb^{amo\_op} \cup (mo;mo;(sb^{amo\_op})^{-1};rf^{-1}) \cup (mo;rf))$: The observable read of an \openshmem \ac{AMO} must always read the last value (in \relMO{}) written before the observable write of the \ac{AMO} (this is the \ac{AMO} analog of the PL-level RMW atomicity axiom). In this axiom, $sb^{amo\_op}$ is defined as the $sb$ relation between the observable read and write accesses issued by an \ac{AMO}. + \item $irreflexive(rf \cup (mo;mo;rf^{-1}) \cup (mo;rf))$: The read of an atomic read-modify-write access must always read the last value (in \relMO{}) written before the write of the read-modify-write access. \end{itemize} -Finally, similar to the C11 memory model, if an API data race exists in a candidate execution that satisfies the above axioms, then the program's behavior is undefined. If an API data race does not exist, then the program's behavior must conform to one of the axiom-satisfying candidate executions. +Finally, similar to the C11 memory model, if an \relADR{} exists in a candidate execution that satisfies the above axioms, then the behavior is undefined. If an \relADR{} does not exist, then the behavior must conform to one of the axiom-satisfying candidate executions. \subsubsection{Compatibility with a Scoped Memory Model}\label{subsubsec:api_scoped} @@ -194,5 +196,5 @@ \subsubsection{Compatibility with a Scoped Memory Model}\label{subsubsec:api_sco When synchronizing between two threads, the synchronizing accesses must use a scope that contains both synchronizing threads; if they do not, this represents a heterogeneous race in the scoped memory model, and program behavior is undefined. Properly scoped synchronization is critical for GPU applications, but it is not necessary to expose it at the level of the \openshmem API. Scopes are important for communication between threads that share a cache level, but the threads participating in inter-PE communication governed by \openshmem generally do not share a cache. -Of course, implementing \openshmem operations may require synchronization and communication between threads in a PE that can benefit from scopes, e.g., when offloading a non-blocking operation to a helper thread. However, this synchronization should be entirely performed within the \openshmem implementation transparently to the user of the \openshmem API. In this way, \openshmem users and implementers may freely and independently optimize their own code without agreeing on the relative locations of \openshmem helper threads. -This policy has the added benefit of retaining a single simple and portable API-level interface for \openshmem that may be applied on top of memory models that may or may not use scoped synchronization. \ No newline at end of file +Of course, implementing \openshmem operations may require synchronization and communication between threads in a \ac{PE} that can benefit from scopes, e.g., when offloading a non-blocking operation to a helper thread. However, this synchronization should be entirely performed within the \openshmem implementation transparently to the user of the \openshmem API. In this way, \openshmem users and implementers may freely and independently optimize their own code without agreeing on the relative locations of \openshmem helper threads. +This policy has the added benefit of retaining a single simple and portable API-level interface for \openshmem that may be applied on top of memory models that may or may not use scoped synchronization. diff --git a/figures/api_relations.pdf b/figures/api_relations.pdf new file mode 100644 index 0000000000000000000000000000000000000000..057477cc4e897f4cb07d64ddabd6438ba34838ff GIT binary patch literal 39660 zcmafZV{~O(w{2`xY}>XfwyjE1QN^}hv2EM7ZQIz%j-3i$oqO)R?|Z-AYO{^$vF2QD z^wEE;PN5_w!N9`E1xFF^_kn|BVJ2oKw)<%b$IlPPBw=CgZ0bnNBw=mjY$|4IY-eH$ z$0TQJYwrAun1`8-2To8B&dJ%))W`C1ZK3Wgq zn<7^n^l6d#w=AT+@i*{(Nf}bB+U3r}my5%8EnDocOj$McTZ5#EG&seN-Q-&D4^=q7 zC+!*{Ki{`=JYSmTlSm%GxzHzI)9;IE_3NKODq8h3{MTCmSI5n6@8=)ZJ)zIbOD?~Y z^WD$b8MZS0B&$Joq3)04OC~!%pL5Z1noaf)s?r}!Gf%Y!bw%-|hH%}*_p5noxNNuL z?Zqk%SlXlS=ebPa?~I=}gPV4rn~$5jpC<-AZ?#|E$LDUnZ-YX9&rg{`i2y4oNY@5JF~rQ!<-n&%n#f{dcB7V6Lj!!wyw+7&-jJVF;#n{ z?!?Iq1U9<`?~*3leoI!Y&e~1+^cx^xvSJ&Pc8$Eh;mJAyb+U19O8U}Z_*~FRuP&DS zxgbZx*2*OvfYJ(x?y81z!N|==-enZC_mKmUBBOt&#@i`}+e>ar@*KjlGqSBnXE;K* zAY1t^mX_Nor~jEKa}3Mw;9E7}UPa$Q@7^*h&FK@V^cB2O1Kxs|_nCX=)eM&Fx<$aO z?k~X+0e7-I)k=Tc+bLAv>P>!Tx3ATfQRmR+n|C# zmep3a6`Wl}{B@lw7XVlD?@VSv^53%`=o!1#&~(v=UDc^#0dNzYn~5?hn_+)Hu=W3s=#{)qE{cIz+Mf@a6VacrP;pU4I3kpdD{d;KQbA0Q7~ zg>dwE%sqa8KEvPF>bcM8ni->wX69u%YES@%;7Rg?_%(kp>x);Mf6X^vRXL)yZU6%;2c1y*;nG$kAX z(+u~7iEy1#Y2IO}#%UJD3W0h0qLCkQ`l*5_K-~o16XwanB`?W4<%GH;%an=mUvQ*e z@?zP^`Hdx8C1zG^@ks?+UuV@xStHa4rTVJV3lBAua)mNMnJG(vG+hl3IfTpd*!72c z_1jbaT6jeT~HnHm>e~^^GcCVMU(Sokt5@rV<-{l~M^z{s>w` z^3n2W5*RF=Wy(SFQ4364b)x(g%h)-JX&Kz_v`>nP8esPEAvUnYM2w&^w-o;UEfCZf7GH>r4zTuRB!Tk( z!Y;cbT-D9hvM&?NnDk<)ko%GRzau{km*%5HVx!kjIZfcQw_ibihNI{stb=>iSuXe= zKuWzKH~-t|;9ZS-A_AW1E&jy_@Jdf~!>=sNN&X)YGkBO8fc42{iM>L-q4w>GGW0-D zFE4)L(f2?wzbJ7TfF<=@IdlbWdX}VL+t2!jz;GF`Vj~DPb5ONy`S|vEDMW?aKR5Ys zH_QDG9vRll)3g4jk~r_b=tnON{rX=#HsV|6%xL~i5+hxd}{4sK%r|Pb{^w-1Rd+25TMJgPl+gdq|{j|!RyqzD1qMu!zWxl6n*b|1-Rh_}~ zN!4^Mx+e8#%`vaB49 zWF$I40&Ti_M6TF1kVUEjEj_XLTRH^RImhFSDq12-LQ^;hCpigV^K4=aa{RkhxJ693 zhq)|$cfd56sc=bHJsZkhk%~F3sZI~SRfP*_M^la^<+3jVK_jusq+Ivv&y6u@b~)qZ z)Jb+1OX@Y!cGM%c1)E1LmCCVe*=dWFoO4{hFbe!EI6zCXcYT+2O6CMEUi^S9r(;gJ z@tC?+g3`RpxgsT}{=|*W&J;xpXWLXHe8SMUO?aSg<>7&f3Y`aG{h3PW>fMOg7HXdd z3ria1$Zi%doA3~aa&C2X5)s;52%st?Vww+4F{vY+TzHKVUbA@Y&DVEr6lns?mnZH#~-OPI!qk`A+pz3sbwcM*?cdSr6) zhKeBP_Tc>(d68%#`BhU2zQMF~8MEbAn}UH~yMo=rrBD9n^&>P9CJ^2cExk9TI5zH| zG8>mJ>AuzQARj&DtqQ(_zVJ6v$j*0`#-_ObC*eOzk;v};oA6(htX2d>QUw1NQ8PlL zQc6IE*-7UA#Y0BAouxy~**^^DQe$D(Q=a)R1UjAM+3w~4QUdQ^HW2_{HQU=j(98UT zLk$8|Gmh1dnjU(sQ$kfkq@xqv1j1>CmA4fa7uD=a4J19!R2<0{!4#r-+7j{+e`jQ zDbxRM%l@zXGt6*U=iytr{`;LA{>%Me68tAsxKIoXYx2tV1ULpo*6JY$E*)jkWch#} zLnYdkV0&w5FLJ3W92I(mxg8b0BO%dM;$37iLHQlfR{C^J40J597Qen_9V7Ehy75fr zT4RJ@sDBQ2M#q&_R&!{0--|iyGQ7m*Ci@t%8JRHHMv9c_%N~R~LVu1J_`n#^k(N&B zybhxz#7ywoi)nSezd%`6+b~|xYc-^1P-3Z|n^$k7q)b?QP%BuT7Y!k*?^j{TGq?%M zqqv;$YB~5|>i0biP4T;!lcOoC1CfC1=(>IC1U#9iQAK>@8Ugew_G*Soz~)9}WPQ0k zM%f9yQuTMw%tVaQD#%aP1^l$Iq-m|{&$3Q13b%{BsDzUxTC@}?G#my*g((?+RSXSA zCt$jy!H|s%)GYb&iZapwr|8w|T7nq2<{;u(1g@!!{!k-Ii;aYQ2yMhi-!xCyfsSo^ z6>i%OopGX8<(oOaZ%#=OEx`+TjA{>lk6j9mA$-^iWeHQ%`ufSuWC1; z++Le)hT7Y*G#24T54gm-FKaptTRfhKdu_%ZDbGyrx}mbyYK)+^wH1n{3r%0wPHaz< zC3{*lD-udSGbz@9%%V^M8&9cR zM8_%DjqPDK=e7nv7ccO(6eJ~Upx0-l2gvNWR-!yv%>Q)e&zkc)x1LVLAT13a)T|5x z7}p^QvRf;IJ3UtMk)dmh`RwTlbT>{(Mw(YdceZ}=!*$pUyPs*C#xGZ6 z)Vo0fh&XyZQfePGVf;dF{?mNT5)AZyRiTQ1-4Mh%{^?_z`f zXOpN`sqEznEfPiCl~VO6XMFE^Pwbc_5%h{BQTak35ftb{367t1kTM1Wds$8r4~nF- zVjLB39*$6VjfS(Ax6csD#iARhOByu2^Bj&~&m2i8TA8MO!w?Vp>4%wlRW+`He8M?e)m4jm;4HOp>^mZ{4M2$ zA@+>&4)gDr*wxgP;(gjD`9A}wklVQ@EFaQ@i$wQnR}LJ%jQtWuAE-R!p3|;e=aQ9K zBF{!2l5Abuqssq2wu;}NuJnz~EfT4c(XhMIusCbl0QGzr(W-ZRZZ z9m^y8kSt6+S_!p3Zajnf;jHxga6V{f%y_23cu2r}eV+V;o)0*bWg1rMHPZmDjmQ%@ z+G(h9M{jO#LBPu( zl_tWCd~EJ*O}fa_N_x4_rz?2B_HA;owJaE1)(VW9oI-s{ko6>lvgTU{ zumnQakG&jNr!FXv=Q!`#J$nd-c#XQVJQw*>@NTIYFU`ST#t-lH%RsJ4ZT~kUJ zQYhio<+}acc7=I>^SzE-5o7C!zu10J53@2HI*OZ;(DuNnQF;51WO5Q zJAQMiTF%-dA{5lB6vgv@<+scuV* zwS4DclyXln3#dw|WNLe}wZ8Ntt7;(F1*xpKzcoe%47sy~UNff?&G$!bEtV6H8_3w| z%sw4vFiageUPn%~9si(~pbn;5=&crbpL+%Hd^wKzdFP@L**rrM32P+H67Am$F&KUU zS*om8o@LO`y;DQt4mXcQ+}JQomXUC*(;`xmmVqEiQt69Iq7S53VgAkjUa7 zngaBi<%1XlhMV1kc^b-00r-p!6`L4UdK&}P!$Fyq!VG-74ldkR?7&+H>%Z4^4pW0WGzPA?6BStX%%8$Fkat zY|$2Cwz?qb_AT}`N^EPO+sZeWkHH=uRH1TnKH+sMJ#-Ua#dT6@i?cJG9%d<`x9bme zt3&Q|cVGL3D;DQukB}SvC;pY+dw5}~<4&6*AHDM`pbJTGA>?3YDFh(6I*i}lNEiI( zu!{HmTw}z-+Qyjk?Ev>YV8n$_jGPM$S>Kk9@L;P#gx>l7KJN?)*zYpDXF$i$Mfymm z()*uAh*!l=Cptjz)FA+Q7ETm!mV^tWPTkU)v%G!?gU(d+0OT<&x+VuTv{lE!JK@GF8$v6`7`baF zGN^5nOTy0a>neW>w77z`Mr<2f)t{Q@G}8;}?FQ&WI@480my=b`^w3(#D_0TYR(Qu?KN^C29pk-wX(H8vZ^V|UaQcE+Fd zc%W<861>VBt9H~S4H!V%=)_|*gS|d!7a8-2oW+oQn)KBj6jOd1+gTZw{bjdgooqT> z{oq97roI)kHVQYiqt~^iX41WTN=b3#g<|XFxRd65o>Z_>{`FawLjQVI?Atd-ZqVR6 zV}CZ14o)P<&Jdvq?s~S_LzhP{b*T$T(EOcnGeJ>@S#Z=9H}QH)cG^^$h5xJ0VHo05(Mo zbsVjk7w3<8kGx+8j~rZx!^$DR5SR23Rf+i#6z1^ZIUE_e2z%B!QUq z7KPAgt2)TMHM=bQ?F8<`PENP(fKUvw!pyudB`%e>->D({1~f!n1&?$(z5i+aB7S{% zP(ZvMqk>Ioz2m|5wgN#bpt$4mpwU<85MQ5PI}R6v;C4S8`RHhgw)?v*zap}Ed+$78 zniPgX6*OR=xjoi@EhC}oTu^*iB}drjA)M5pca{_x>}mGqjk);WC#6?fq;@m^iic>c z4(rR11zTaqEIq91({KF=Zpu=n;TK{~P`$|agqCH)Bfs|qJEIz0dkX=eSzWa$?WpKt5IuVk@y64=N4RY7IcWgv`M40dL zxqtuTFjM{8*bWE6hrr65g!<_QohGgCW-)72ZNztUQQq?9M3TAZ?;B!0EGLOp{A@~1 zdp0f7TeDMcO-fDqw=HmD`4|8vg9?-G9gD1lZaO20*(rUyb^qJGbuK+V{k~>_qv)zL z(wLo6?PXMG%D?^#N0ICA+US(qwt3VkYU`#bDm(@<+oq`Id6jdY!r^KDDGee{oq}DM zXH6n5oh_6QD%V&HEmfi??JaO9g;lCI`YV;7(-;7vNX_iuu}tV{o*zP|n!k~S(G{UlrFzz1fpaSV_FvrUNr@mVNccmt%>_;VBXJ5Nq#uIVw4*Ery;^A6{?NvC+f1F z7s1$fGH7#mQ@wMr5uJf1?LrQn37r|&*?aMHi844MqdhD%OPYISEXfB&-axXt&yT?z zG|`2~UNn)It{jZ1iqCC|hpFCdO5w=-bx3w5AU*G9+w`;(+-!RG^O)BGXrB-fM>kgukeH*YJYavgY=Csi@+IWc}gkS1vC-2i zX66?ud)24D0+({4glK}yi@RvIirq}bm|$DLtfiypTM=UErJ;1-EfKYB7O=BSfBEFz zeBannj4PSz>+($0uA$L2?!_ZLA>3ozH`r(055mwcG|os;I@oJvetW+B%V@w;9MR;ERTfdd~= zG_gF361Pe;UxKS%{U8Z?17zPu$+BCkv24i%_=!hVGR4_M1rx|5A-W`3(NaU3C854d zSB{|UvcTqpvAjbm{qTZ+Iv-J2n78W$3~3JWJkFZ<9%)y@(I4h&Q@X4|RaQf+eoC7e zv2q-Y1*9#m*CE%?*#Ei0i|Vi}9WvjKM`5&2jx){S=Ud6DN+54gAZ{br_(&RrP_)sR zS6tx6Vy3iiHl?H*(&ua~+iueE6d#n)(z3+XTH$i)Z=z$~Js_7j2XA48I&*=FI5i20 zlk|i(ngO@6#l&uleougsFKAMRoL^vL;cr}%fJ ze5>-kR55S?CDQ>XHw}$@Z=sojHD90ZX<#l)t*YNidLi;Mhjgo1B+snP_X5H6cUl|| zmjde?`43i98VRyxC`6*7w^H}WradJKiv+mB%N!F0@hX9aA{n&UZ1{{aPH~q4Tdtaq zV>}vHvz)3Nol65t(7rtxGkM5+*V3c28CFUph)KU#)`^bWxkLD$Wk|?@3m$d z9k!W0h8k!JvPzM-8K>VtwVeUGQqpD^54aZiv3Y5x!o%Gi1tBDUD-)O&p#z&U`^mP1CAo z(S);{j;BI>*|g_5%iWeIkFd?(Mu|hq#6@0H9+hu3XXvP>x)OT_U3g?#bt1j}8^dqv zU5U|8=>r!iy6u$J2}^jg<#Hpwe;c^U8@0A$0};v~1S3SGg~xw_B63qDsnrh&T@PjM zgKo(82=r{|5?Ykc=a-|OuGIWZDJyK}U~X|2!$&7Vkc}LFu_p1`qd!8#ng0p1;Iv*g zwxJ~nbw?g1#MJJ~N|1t^0yJwT;>7<3tg5YM)tuEYV3G~jUr`{!C`NqLfsPwfCGjtX zRE$wtTkGLWHV_WXg%$@}-z>(xGUtmIto!HGyvEkwgbu=ndF7rJOeN?xRuw1^VYF}y z)(dQmm==sQ0?GY9se{RVfPs{o&8LLzP)sH#hH$T<;H_Y`!lnpNP+Q?`;;7{`V3BPe z()&Y}l@Sf#M8&@*rJIjM%FClicb#ynSw=1P(8G^?1@N1dpHZ*m6t$#ck^RZ=HiG#D zX=akk44hK-+Th)6COo-IsdTG`t@l_gWQ5GVK}pjBJ%XlS!b+t0vY^g@xZjcS+X?$X zmd>Dk@)KbCpG&mFltwT)zKvAPQ~sijD%J^d*0h)?ug`h2-!9D>Quwfj&(De&8T=&; z(Hhi;BgZ%^xmZ*c+Ucn9#?hI+N&lh{+6mr?FX(3!6SVSp(bC|NMz`h7De3Gr;T>pa zD7n+W+MY5EnbNmyU6YW~tMhXVs2We**>-o^Em>!pk$NRdCkHtvNcfq>w2HpB_Z%dX zzsnCMrvLhDLh}TcLRK>K;G5%!vYEyftb*?OWE1rYBNx)%w@XWAvb}Fu9GnDR;I86x z#;M|10ATK=M6CEG)8-M`r7O~Nr8O7owRvZz{q88sHsOtXU0=lE8Q5fuVi(IZw_@tR zZN0G8DBkW6(03D~@1v70LNc&Sg^*3}nZKni!t)zk$Hn|a@=nl3JtDI2d?>$MN3#dF zOQl}_iijO))%2oup%Dvx)bcHvisqQA=9m@AVTe^csaS z)I;N+`H+yHER7P6ENEI#j`v`1ap zFqF9Av?njcmy@_1OtlhR?a4Bp8JdmTmosanVphhAtY0TtZcayG;dTgro>D(Qqc`PV z#0<@2f$Xnw2q?)c57{=+?lfDk6Tp#NPpM6_K=8tBij%XyH>Erb#o<_J+E zsQar{0MrY{$s2on(tEvLg*RTiee?62FXyQ+W2nBj323d!Jy0TGdp&HI@c1tIAufqfd>c zzy9i8xw*=?n_LZkOfH1?q=b`i_5BJk``iA}eCWtX``|{9AVp=KbY2)hyfn5@tF8|#>+9LhUGEIop07R(#s6aW&Uy0ZH%aKd)K51S*z0>= z4^D6-#%g8JmFTYA^UtwdpJKb(7b5x~IL@b6p&z&P{HzSr9Vj+4V0fP+;q&|1hiT8yd<;RK=)tU1ZjE? z-H8{L0tE#TgaC0EOsRjg4w}~rwwJ;HZsk?_`|iAb?qbyiuL%@Hy+bO03;hb3(Ep3_8mg~pEcFv?1+6fKsvb9{ zZlGL8Uo!%$x(u2~%09}^sRmqS_m{tJKBA|gz{4Q%7UbxFQL=1KghA1P=6B==5h3dUqfq3pr&>WPA@kx{ zEC7~ZkiCY_C;L|(z|Vs*{%!o@s2h)ncIT#R`r{r;kjRPXew5ubqqNEPveo4bxL7MV zJMu8pMW~DRR$Kptn~L^q=56tw2%C$YFdjC?TnkeE?g%-qT9?fa>99JlxGgsr-53A0 ztYV<@lH7Mow=doNnK<@bi(z|Id-x~$+mSDPh~IhJmek9ta!zF%%G!9qtN!>#5i0$? z^9$OVlfIA0J^ICo#FAf>E0x$qQzm#Bd9vZL#GY>q1G^CHESNuLM;gGY38mU;b^ay}>1@6OYOTY8?qLvD3wm%4zHkm1+#b5h|3@3Rgkdop zQMd>Nb*Kn+V2Bb-aUG0=AwyLVU!G!dLLNvF(-EcqaV!2xKU5o9%XcP(FQ%qPW{@Ce zsKGg8JVUam3Y0d?JK;c7YL(+SwHO~_;qS*JK@jqPy@!az8KPJRFc9mH8j0Cd2cftu zRa@?gr~@g>f+Vz%BJ&V43~}foVPDl>f5aK8CC8t6bkhDYi!%gS@jH9z>z(OEdcNZ$ zh8C>pVA1lD zng4~tdUil{egl*n0{LCNL*AZan;%^DMgU|=gzKHmI_BqYK&}r9 z-8;u80$~7^5gR|4xgi@q6!GSm8)rYM&;!$cn5 zN`Ifc9F?JvH`Z0atrfAalrpnQ67 zGpl=hPFp|U4?-eS8C7=0-P*qjgoabj*uQ4| zqL}1YwZ8HxMqdYsjrE}~ExjWYLP~x1X^g{nJx9FE*3$#nHfQk#oDQN~Z3(?FsO4P8G5i@D#f0s$JvdJ!p<;I?{r^Dy!ewXKF0@ zDf0&ACeyhhzsbRFjo5so<)!0T)%qv0y0Q~&h0(8mS?#AaAil+kqRGdW&$66RMk zu=|zp!&RVkwew-}0sK|^1JXyU_7T{6&UAJLve$;=ku~r^wW-0KJ*~QNb7v;L*`U+E z9pMA%^`_H_K5YxR(W&8eR$W`(FiK2<#D=B<7H!mH<7lK zTVErl!A`)%_*#iZi_Q-P!%-X=S=t1y4x^9Rvr@L~Xb|ZV2PK%}@rQdfmuX1-FkY4h z>u~3+61j}-_K}gbk~9|8F*7p{*eW7l!&O6Hs-(w|?=Dc+midr#UZcxSDy4do53i#< ziaKw6eB4>dpWB%M^9bZrkmJw8&XVb=79Q^fxY0m59`cH}aIn)Vj5K@vS$4>%MGS|t z!?IZzqRKNIrAQ6jrKjL+akixpp-O}^QXF+_Sc=+@KQ&ID2Z!tQ@v)sYsf}-%Tz>@oG5@_e$qfR zsGwSteX1lqUGBD9RZvTn*qrvTnjpRf!fHVwVD|VzP3pH%34lfV5`j%HwtYjuWRgBE zQ`1?FI;*>Uc4e*h<>`s`&m5IhXrogDX z<~q-wr)n~|j1yHb+S_b_5}k3>Fp1Vir8x4|qKPB4z)zQ}g!<5`2uvH6n@n|gCI1Dt zNMd*uFS~FgQM^V%*Vs;~U&$|s*C%3S9=1WTAunB}N!-M_A?v`w zSr=bC60RgF1~D?Zvf+TawI4|(TigWLR)Wh$j9Sy%&7aE}36JS{jNnLGDV559RFTvv zIXG9VYh7HauSw*|kXO57Dlu(X&?crU>ZvaD2&rywt}G;_m4fJBb9frbTwCK^Ahkx{ zsX4viEe|T1R|sj0gtJS1hcjs)l^@K23k8B(q-Uqcs}Q4wG|J8gVC>WwOCU}amg%c-cPfxPz52ub zY>_h(WO7+sS?9IEqZnRY=yx_gQLe`42(`RF)ob3K>zL7kYyeB2w$a3!N{iS{ckOma z9=3e5WV1)>upnfRKee8wqr`U7O(LNl+9?8YGBOAuk&pJpoaOYZl}DxUBhR5yc=e`F z;cBSMuH$%=U(;}2DqEVKEETE#tEo+^(kbY4Z}-9MwkozOlXH z5J6myyzG)QHL_hzU`_#&3rB%4;a^2k4|Ukob%;J+{$OG519C{4q6%v1b%`Dbw>Ulc zZABp-MlLH1MT>7(+)xl>L?&P>tzmv;YUNs@Ja&3klb1^&m^VE(NSnkpWKCIEmw|dn z!wWT>QX#y)x>EJ0jSIq+sl6RUjgftIwSq7k=Ge2`@8}jq6}a!+7QaATk}ZOcs&hfH*(o3qc$Rfj-1e z*mo|jX@Bx)C@$>~zlehX6;!gI(=eC)JrR{ns-ZeA#m3F~^4zD!OEGM5y}tCVeaA;1 zDWT2n3i=f&3VtJ5!S8yd#JiD0N)2K8q6I%aemtVU)?9Rg8x|I0fE zmJQ=`cCmtAb$?QDz9N0;8=^NmPGmjljRlH_S#2sb8rts_4cSQUkFo9I0wW}!jK@Hl*tT8(eHeOtwJ#` z2Ra8y+#Nz!TKM6{Wx-kR-?Ny0APiaL1`pQnWqc5HWX06ER&{(tdC&)?pm24fsv-ezSC^Gr%1@ipn%jH+9>KX`2xzc4{n1(kiq`*tqEG}b00)!>RGUFp4-A34NWQCnWs!A zH-i0I&G4`eJ4!@&Tv5H%1iM;c|1(Ffr4;S;UZp$j&5PG3Th6bS7k5Ssd?iRVkaq#^ zkt^Khgcd1T3I@yO6LQN^(RaqtRivmrhTD#yWX5MXSg0V;nmj&kb`$%~K+&q7RS#`jRRt?_(aFif;{BO2l+#6= z&hT1NPH1H*W{!z|LT}4wzWWfw z$@`OtfQC%6E97@1W>HG?0i)fNK^n!CgNukMH4@^%)5H&`X{Xl z60-!+(?tn>%c1EJn~`n_OS#xyb1avZX3t0l)M)-`YoDrwhSg+G)q%n38k&0R4oJN%XVkCX#n{p~-GQ(n= zc7m`P*G!&PMj@DGczIKtSmhSc2I1@e$a*U577xddTJn?6G<-OX8C83X;`l^1Bt^2f|eu??Y5~xe zlp#nwrSZ<|3q8~ic6$FOwsfxr-J23;F(G?=d3-aW{gAXIT%lfPi_&rP!Tt9Ew(#j) z`vHrf%~on?_-d39aDHLP1FviYT6NehOvSt=^OqbsY#DGkJ$l26_wJliUf;w0srM#z zS9p;Qd~p)_eX*teYoH<^l`qnBpG- z&4n1|n*N)xt=yNOy^!6PD{iew#O4<8bqn%_-&OB0Sl6{7V0z^G9H?<{@{EBYZu?6!3oy&L=>g^j=?x#sBb85 zI)eK5qF#|^u<7W6gw2^7c&FR+XK+pe%h@Pu(ho?NZ*M}=RI`;PX7tm9w<(&%^!MQZ zG%V;GFpd+CZ-`h<$`AtullqsYP~6D>;*OB^*- zV=SY(+3MAqA3lk#Ja?(F%_beX6N@$+)r@vl9xcjtE*i~<;04A?L18)bvu?wFIdqe;l3-7MfCzEeL3<>tL%0Jc?vjx^Gq%;I8 z1UZ;4utmO4tDi=Vji(0Y$N{u6#LOu3dn4o)9X{p{^dD${e<{0yjL12Va7rKuX`hGJs)W-W2!)um|Jn47e%ne}Gb`BQPAdLtKyw z7RX~S5iZhNKN78?x=5~2l|nb63WMn(E6&6GxR!(M2(0A`5>=3sM@gOftSKtx0}6{) zAygLgZQw>M1qsBrYyiVm42rX^we^CbGD#Zg*}}8F=Jj}6BfYL=aG!TjFeCR3W*?o< zoQOamGnur>NmvzP;_rjalP7~w{fUU%%;t)r*cc||8XQLfuf0zk#C5dqBwPawA*fmq ze%Ew3_ZPQu^=$$U!AG#S0oa1cR5R%P-}V+07s9u5_Mb7}z%e%IDCvHQf4CnUkefvg z6+L`cNpP!ACS|C|t)UtP*5{BEA?Ok}a?I+j)54Tr<%l^`C{QL0iTTJ@Xr0=EHfGRH z)qTtwRyo&m%#XVcSF&iU3=}l9QJ*64l3*V-I%+L>#D@?^^4QoixTK~fB7~c{N&gur zf@m_uNYK*-n`S7dAY5vAn2XwdojI(27H9|dE>>!&r?xDg{PM+Ct>3>PsQpST)X9yH zQ_nrLrIpLal|8g^+t9fB<@TQ2(b+NKiJHsP<}tf~FrdEbrdfREbiNI>dr~}ur8tAp z(&Xr@R4S{xrt@P%gF#?VpG{|FJU_pvvYc$_4v@W%%me?$tJRMf`x5+lQ2&$FO!Qps zrGQBs4{a^kfjNl1*rg<%vL=zN^M)AJM5>mP5+u@Rlssu-grI92Sz=zxq|5rqS)v%B z+mu_Yr8i1zl36cw(G+WP49qK1oLxi@_1cmIC9or(fiHfXJK8KP`QE7w< z6{Qvk)=#%q2w3T3p3$zWiO|-=N_#{Jg~Dus*+y*+I^b4ocdB>FPNu&LLd)SY;e^*) z>*-i8=bbnq#hCN5W_R~A9|X%^12uCMdE82B!U ztc;!g@CT;Q6v%%lx=F}H^1F2XT}YIgqVdbs1!%WkrSRc|*U5~cD&?`7fn7RDvgcNl z<_ONAKr-?}EhQodxMk;la1~!@wbUM4B}lWSA_dK7{e#NmJ8rm8a73ukfj+@pnS z9N{p!G-`j#LO;cIb^uoHiGnr+N&4zR{EJ*8a=NNdvSP9`*jY`Wm+NH4EcEa4@Z0VGpJE7a$%UCem58P?XYS!kyeNHkTSt2 zK=FYw{=%0m@=3}Fdo`HVm>FGW=Rd)3Pl54ZZ-K|*k(6#cD+s`3Hguuy9qCzqddkekvTo)&dQ(SX8B(Npm(44tO&i& za8YWW40}o!L>Ks}-EWqW`8D~Ab{j{X#++8}*^qlTf@Euy#nS5X$Cd-G9*KDcrgN5a zrU_?&!ZC=q0byA;S@&_>gj-kbpakdrcA+?U2G~sQH{!@ShSo@bb-R>PX2{wn{g%jP zl|@AEH7EpTU6cYvvq95@53DU^&sqZHB2CkctM^R+ab&K%r&xsMGjU|J&YI)jIyNhT1dpWN*Q$qT?yvd+FHV>s#^_2uF#x6&7T5tW}v7 zTW&VQ`}V+ov$Wo4$4VaUhS%}5QL*3D8?02i^L;%)r zz=(BdG8T!iF$PG!Enm!r0W6>OjWz%6Yb#KGI3waXJ|rNcBTH)LjJOYD44|MRU#jG& zFOd7onO~#}+W_5<<&!nC^fXxPge>8#Qp2GEpRS`59m^-2Cyn5jsiQ^bYHL2t^#jMT zs#U7potsZJ!}FwCG&RmN2{(KzIw~=|gwL|^0%3Sm$+aAn?sVsr$yD2@6w-(Qiey{c z5H~7a5iEI%;$gTaaAn}~uMd62D$qPUTDSM*@X7>%4&PVrS!e(b-j=PTC;Kzc&CU@JCFzbmg z*K=5?kDWpS=~s`z)f@28%JF_xu;;RJS3fbe^8}W|E=PA*yNx0aT;0Bp>Pa(7JYnrTFm3{4t4X+7x1hY{&cGL9dZW`PqyO;?&I3H%#_W4e^O&P=pfWdnI=MiFP^Mt0DYL}2Qm(yWy6 zXb>%o(OijIvanG#?5sRKzTBJ~-dtHpt?{9qQ>)fl>iptNqG|jl_yM>I4Yg0K`)Rg6 zBft>X#BI(HZo^X8jg4cTrWC;?PC&#$NF8Xzx^4x3CtgB;v~J5`q9C=T#Qt{^`A{ocIjZ(f!C~@kvRV*(0vys*`@ybeE%4}b-KCs@1 z3)RqYU8<4LpAG5zU(6IQeG^B?=Jb5z&a4d$*a;I*si-UFfM&AS?1`tLqA_z-v;$~f zkjnQIe9?dVbc$SHe7tmiwRHjo3HffoCHZ+P!{aIOQ(E*=JFwcj&Zswv1@fGc!e$FS0`8&c#O_8|rlZElYu{;_upH@S2Re@d z4GR*w^eBWOiHp{exFz0lrk}<3uu1vzypwJ0`ukLBV{1__jE40K@qX-$>+Q5_^{G%R zGsx7&`-T7QR|&`&^^OUrAT{k|q++}P0^yTO>B5kb6*)YY>+5lk` zxh}k(+Q_cmg2)a<)}HDp&V~k7Sn0;4Im}G0(nwAY`}8QulBgm*L4d@S#c`-ID`b|& z`>5p#+aMep2wL^gpqyEmq0Q54lNjI2<@wmr?ThmcvZx5f$9q`8M^xzzwfwk)Kjd%+ zvDd>`^B01v*KjAPiE zV1H<6tdidf$u$COD0R41Xq%Hj$MVsfz=7aGNCG|5j!IoREN}>OG3ugny=|jy)otx9 z;~uAw#%d@v+7@G+p74h0FI_?$0#=l@2mwYsBMIiXER7DK-k%Vopp17&L5l@+#9jR& zw>0~KG)hD^+YWu)#tSZ$kHtio5%)r9hQs!NN`&kv(Z4L}zh^CywHWt=KI z(S*AE{e4K+qgI0F*3q23ILy%gd?~CBX$+?J6dI=X@ebYTZRT&$YV!HK`832tr`(*w z3KMS=+dx+Gam`caMBK*BLG_nku67E$xx}8U*MxeC-xMzg*&m@F)<^U7HL?WG;m={) zu%@kp);I?!jWoFQLm2;wrxGWH^qBK0c4bx25JUqQ{|7f;u5eRo*DCzH4$g?Gl({56 zvj8%5K!|%2$qv;16w^Uitc+%s%oVBUci8!Ug0^vGL^B7yav(?Sw{##~`>tEOSkAjV zpiykKn>)@G&NDzTpQ}DJbc@nMzrZS2E9W?gX32sNPQMxa3*}U_r)&>YO|7UOly~oU z#dIE;Lw<}=BQ09P%&6lAT@omxFSU_ZV=wg)G$SseLmCWGH6)$PlzYDkH?AE@hI( z4I@U4qcoWwUEyYqVnoJw*oh{7n?&*VueR@8hMc#RE(a&2o$)yz_im|1jBs@lTt<|g z)Nz1!`}C`XdR^r_fBi{g2jywfBQ&E1QDp={sSvAn=@OLKN!m%BRd6i>}qQa+e9PqeX_(`OF;I zXAe!CEPOdV!Nw0}!VGIBe2A`b8<*%Qo~zQ__sy3~Sw3#ozrHp+Oxtc%m>&U#%V^l+ z9lUV39W;5iM6N1r(c{T~Qz!prYVA4+eTRi-a_~)$AR2gIpp7@o@J~*TESez7iY@L8 z8`#^u^r>l>w%7Jxe-6~M`Oyf$={XW_mfqWSvp5Q+=PaB#_(?5venL~53RFrL&zn51 z*@M6Es)pA)!QNjEPf34ZB2@tS86j3>uiR!S3~{@n5E)-2@w zPnAhb|9?q?80lI6gEL4wVbUg$4qo{62a+~X*vJsioG2&>5x@RSu=xFt0o;-Teaa!# z+h;^NR}sx0952(Gl738c)TJ$U8kn~zu6OO@*@K`-OHP*Z3a*I`|Ce>SCePFb9Bja(HCn+V{ zY~NdvJ-|tpXaF?pEY{KrZ;BpMI4ykABoSFqnw~{?4%R=EMM(`u!rH6O<9+GAr<@fW zuOFpQWipaFV&z8RN=V9L;vq>SViuuE#zI9?V(H{XT_hE8EuK<#j~hGLx%|5FB;wUN zWwh=USsRJY9mRM_c(UWcN$WZP-d%Qu_$hK?XP$(6TQAq7f^3*FTR%1g_`n2g3xgWX zr#)r<|g!&^a=U-3WMf&PD}5MyRy`acwiF)}kU{D)bpRlV&rmy>_9U-8FX zti)pst&qQR9^wpQi^gyv;O(x^E`hKFzodyv`xOjGWm4849UA(Q3_|shgtP?oH>U;w zCtc9Nq1qW>R5VZlr9MXiL0BgjZ2XPZlWfqs#&6(D8sW`7bB~tv^8{Tza=v%idmU$P zb3AXlZ`bku7^ek=sj24n-fPD4dST|d0t~fWMUmdYuLE!Z=x zn4gb#Qw0%)ei>eYP=n5Cd#md`g z>>mI)V%V1@0aC0COpjcM887HY3zpAY zXY!k~7j+p=q8CBO4|>d(kQLx)jCYvb3T1-2LCzorm`1^e#y2kH*&Cp&OtjUDU{g@~ z1T`ZZAU4MfEUqS{)Xg;-lS}uq1L*gSx%N@lMV=MUD{3#brvF#4h{&UesN_e>OX~>B zel{1~as*FTJv=w{@J*ecbjc6J2>_K#eNMPZPn17ZF0v(n#*lprWE+^(@3I~zlRiKb zeiXRH5Zmxf5Zh2}Ae1HQ&;XfE>kOcbo64uGNs>)yUBjn3l8aXXyNreBwD zn62ff{{DFNc-20XE7-Vgx=ptWsa6E!2XXBF^|-+E1(CZ;4=NtDEI!X7hsuT*zfV!3GUIm83e@eLxzM>xx`^o+}ge>rW4Dx0-VmA3kBDnY$yevAfxRikheIYnkj9}X(T-le^3y8af+uxV1Qn$af=s0ai z*~PyFfqrt*s2XaEXlPj{F7}~!U3gi(fBL@HLOu~nbpZ#O+J2xG*|TtFV-ljk z?7(s5;BE#Z-B>)SK7qD^2nRON`+oNzfQ>o1jlB>8{OG9Q_<}Gh7l3^qqtrF%chgwL zZ&_x;1|r@?iiBY&v2*A%n`Sdj!w#igO1F=jI0FMRy0c)7MUVY|y*bFb0RXt$V~K@- z{-ziZZ~fb_sb}RctHfB0EN|m5XSNh?$E<#=emvS`vi0;`^%CS(1~)FMRudk`SU#p+ zT2d1{ez;H$Xep3VQ;{cRVr~IXaM;$Kyqr9|jd&sy6F;0HQ zTG46Gzen6Z_MPx~ayoUEU2It0DUr+p2hJJ$dAaWOVQ!z`y}tl@?QQ7n@p+~Ex@1y* zdq6IfRi#N)o#a=43R?}!E>Kf4!ayRTj9$znLQD^mx@oharLjf}N+50^%8)|M;6u?k z#nQ~&F>6X))aWfOAMY(6XW2coEgLP7mbT1tafh*;Z6M!B@=iHWlJaN2KSsL!{%E{m zhxav{Byoo%K3iT@6K04ssV{3&qOPW0mK1I`i}R%cQ40w#4R^NG%SZyS$UdYL#m{6Gb{8Dn$U-9>)xnkv3N zo;Z#J6)BT_Ih$qJ+i$x4UsuC{ZjH-KO4RRbY4b7Rt(1V0E%Fp$hPLc0<}E8zLw8Xe zA$^2@DVQmLzaEb#!gD*5*S7lG4;=}7>;9cfPmZK+-W(m5Q-c}X4|%gFX<#%NmC7~G zi_x|tehxdH^}k0cRYDk#mYPpj$;aeOTGaZf^i{H#O${5v)zf)lXL^`DT~ub`DpFf& z744W@l-#b)G)1GMRb85jsI0OvG7uIFdQL2w6qq~8FbQr7E#RFhRxG!j^^e1T#V+GT zW5reoi*U&+p%2rC7!zm1N^{shqwb#enJ4ybS{V)II+>2a~AMzvQ0a9?;#kRt~u-p?nE>h4k#$ zp184TWthKKN!Ye2rs0nP4u|+4;{Q!}qr;CPALhjcukh+!r}ksK6|< zslcRT4F@8jFCFUKmar<5$GC8zWQA#=+=Z@HVu9VMfDXY*Dvmw|Vh9IPC}g9fNl^Yr zB9zaw@M+Pl-(71WtahuZfpT+QuI$l~LbxL4&{1+$}jK*RF%+LXNXx?&Q2`&f*~58fS=hjBVV+J-62@Vt|Zz`xF3gX zS?N6HQL5aa@x4NG+4jtHDQ-DqF;i$-wZLWu&ZI8RV0@<6(K~y1K9%O!ojtYH9Y61L za&hR|Z5o&U!Uzp*6IB%6gVgQmfoYxU+%VDu9;uCbZ8YeOP~vFZKs#WST1+{QDlJ1q z>OxA&<(}Ua0)m&mH7yV8jw%4nmY$WC#>Yc^EBz48YFqZg)r#>e0U^k=khVLWl zV9;thkayre4)OPJUqhb`9j}x=C}G@a`Sw?DS#W-g&sW!0)^#-mtLC<=rto{ijtt#Q zv$nOR%ChI=qFGhjeK;e;Vm@zztRzIuZPtte8S0}eiH023 z%s<{B7w!BY!4H+}-o4+VSt!Fob4)I1V{X|TAVq3beQ*J2l30x#`vIf^Q^bntnDx-PFvLBW`)y zr(@z2CYr7m6`CH-^!losPUaLhgovcmDkFmfOoZP0=P)wf4S>Gg>a=gU4mGA&L0=lC z)S%prgtc8=(OQF|+en-OA)*|58$sk=>PGaHL+y<(Y2BdE$yvW)e@CXJ$<#0iqNa`m z)3hKmyhdf(J$cYPmS)JwHyLXRTaRQ%u{2TkYbPpZnx{5V`bFrWC1N@TaNkWDMH#t z#an0PW}MdBH|U(7dFaVaKU(12!&eMy*f7dqGjQaAzBfe=MPoDg`N%l9*5A>bJKoT2B3q(X1!tC7pSghluNFoa-8_zja7Zfy@d>(# zg;gb{JzM%;W8_$i_;|8ivyABFYzMY>A#WWL;19xVZw=E~y`u^Ne0Q+YM6G<vEXHH`9_7u9WpMU7_~{FodJbkL zM5-bCPXdK>+fu$yz|8Q>C=z-!PU zPxax~F3fq6rZScahHlP*(~(U>r}jDM!qed9i{3GfC92%i{`Ga={=I1<{a8bL$MxnA z?|!S_fKJ2y>JlmqQ{9L}OArCN{^kC@);gc$ZT&1CpPXhpEk$b!Hs2gYbL~@8_axlS zq!s$Cg%5&h5hqjIBiPYvuPO&81ykS5*-X^Aloc~yS&h^dDiN zcHC^lARhe4EzdCarl5Qx>I-~O0^*qdZ{0eN-@LfWS{b_I*d}cT3zVnbkzo%m|SPc?xQNtwk9u(^!^oEIWce(J!x94*e6z`=h)7>Q1L4ie4a1g|ENgj|E9?Q z2av^1$Hw-*3$n4h?*!-a&vI^{<(JF8=v{f4gsps5qz4y-^-rDSEirq8kA*9 z=cNc9k>2=>xu-<@cN?R(uCp^&>Xz5ds~Vo%FFWi&fO^B=z(J_4Ze6=ykB6239^iat z#~*icU#3+kdYFEMl&o-*nY%b%kaRd;YZ1zVm8+sVa!h#V%6(Kdu%aPxF>HTuf$C~? zTsJFlqk_=@%6Y-6WMQ5{KWBbbm5Fk1*y6h8|%pcI# zWjKHmQW|dPJEVG5gWLPqAE;S}K!KRpfYO@}?b_-WMKM&vz2fOtTiPyo&*O!` zFTzD6yuprf(LK^NgfPFiurnYs+Xqc>LlnuDuqmq;Ln)&y-dp#oKi#uvswC5<(6jH~jkL)2wdgA_52S`* z?E#woYTD*VU8bIhUv;P7!In`o+%)7uXPIW--iGvAFle~*c=7BAJqos`HB^4kPzaf% z0&L~9NOv4v`CIA*0+>ATp-oQ;ho=3Yh$Zp z)6ZT4#-4|L(m5v(XrBO|pH_GBG1<6V$gF!Lr{gc_FJO^h_HN{~jKx~tnihapsea90 z|7+VXm=EqR;D|SfrEkWTFCZLm)jrO52&*2z-7n&uYhO_rz(0%~&(QlHK2F%R()~eR zW{PxyAYdhW0FQMpkSApPgUD0_Xu^th5W;_IhB2%7cv%Rd^sD;uSoGrO;R>)}(1$pl z_B{d#8JKAbAyLYPtd#UlR3+kQC;uI3vEV8NxL9+hc9UeGU$V0~KR=&e(da|)HiIHn z{T9%AYh{tiWWmt16fl{_g$y5fh3j$hpcvF^e#vEiQSWM@&T+MI@bq;)LeSNj(d{T9 z$JAzixRP|ZlJfz>ZLkb0=wJF{=qy%Cs~vy!t%juI=);tKWqaQ!9_%~ML@{d^KPnmZ zVHND5;1_2LDL>8{)^dM3*FBCI+e;NuQTAWmLeY0fVoL4)`i{Wa!`tp1C2uK7AtKBm zjbdo^ZN8nWN)}T|5TyeAK;P~OegSv(B+y3OA`W)0iMSr9Z%A*4$x--3Yr7>l+@Xuq zh2Ng?l8|QS%i6`AD4=W(vc#d>)^paRpnT7+k-DMf&e3#5?#>H#93S8apfa(*X|BJj zk^TUo!5eU{c?{iAOaqv)x9s(k10Zbu5I;uPp}iFRT6CO=5@;fS0_6%wASHx z*5!I{jXBA%Dv5g{(xL#D49JyZ+*yob&iqBSQ^zXNwsM0z(gUgK;JPV|Et@!Kl zOB1|P`YK~8d7DEMFSFde_@2p;|2W6_)B>M4^G)yq)p8pxXj`Z1qd{5DD?JX|^>t_^ z{%tQ|qL-Bw>Z?Q{dfQ{*%h;YPUWM&U!(k5?`-cp@D$F$IqLy#QxR}>G)6He zBgG8L>v#O$)A6~)(p;`zBHsbeyhX>Pi2B5c`q&8ju!v=W7@!xR2mgE^uirvqh0UHh zo|m;9DRG~sYk8Ii*VUaz&)|u76^bBfWKf@}v(T{^ZO~_DpjK&n9D>danzg@W zzJBX2iPo*AuHQG*ov+tCt(#0T;}$zF2Isz8x>VO2I#k)4{4LiDE+wux*Pktpn+Ohk z1Yl}cZJ4|klCV@e65YE6_7f_KHspycq7p^Y@l@t)3Bokqh#m;SA*8cIi#W z=*5Pl#fG@Wf_@U_p^Xkehe;JGpc6&;eyW(zZpASEX-U%Q+3G!0GL5olOc3csKqw0-Di$zC;)h1w9C8HZhe>5SNqR6;&_@!d z8V1x=)xjEo5DMOp2PP}nVyURW3PhTEl6zWssC(_+?Ahl#O)_BX9M?LUq=ZfhXxRzX zJ1dr(28S$}qb+MJW}D`m#*yn&&n?s~jx74{P3TMsG6d0!HTA_fHm;X#>@7vgk(Qb@ z^DNWL`L7n9ttJ~K`Hi0Lc2Q@X+OF4Y{vll|)0+rO0`A(?8oHu#iq3jPgk|fGBMb9$ zC>Sx&whmZuFrC|jH)D01*CC#3c*4*fuzH9&BhmO$1@U2=0b_MMe=%Z&2nGi;S*!`N z^9VBu<}kM6!!n&vr5@V1BWtX+*A}|)_G~Sz(rQ;`D=V`(OHVS^8!IX)xKWOciHC)21d98}jY~ML zSS%F;yt0O^5Zn8}jeUsW6PgR+Mk13=P8@Jl2O3n6djjAWj_RsFYEI^l;`&K+r^`c( zt(Q~Tmc?bB8A#WUHI`CR$9;rXWSGbw*qmFzP}Dc)i=4(cf&ge4^mL5y#HA%t7u9u0 zCV!^3EOCA_7CBtl$Fh36`C+Sdg=gOE!l2Z98$b~PbM)0j}8%=UMfq6iGRS{cG{whO~%39=CK)# zh~uo6J%`$bUkb8Qbzd*r!^PRIVBgXxUnolUxt209~sE`XFb5BuB#4m4=X&m!kPBo$L2Q1domZMP{eN#Ds|*6;UB;(8j=#l)}b&2sMJS z=8qmYtVg?JG}ryRltzgM7sT9O3{m4T>@FR%IL|!2_|h7Z zpN_?xc%xXVsQcD^NTHIH%HACRvPVQ?C_JNI_SZFaMBroW{ye@(#@ooqrW16oHNNsHRs(ny+WLF%K5 zlGLnAESJH875);Jw3oy|gjwfEEX5MJeWY;mp7LjIY4^lm>fcCOIwj1Vq=LOnW#!Th zx{8B}^bfJh9$yi>*b8J$CF9_^C-NapO2zvQT+WSOmK*^iN#*7m7KI^m6Xog|qq{|x zDc^JpxJ?r#RAqb3vVvY&KF+R_}39)@yA5nqM}TCsoF@|4t$3+#rZ0SG}BiNW+tiHjM56vO4en`fk}bpHq}o}DioienI&&d zUljQTIWp1Hiwja574l{E?4#?)SzZ&9X(QSXNe0qR>Q1FBvTzVI>W4IuBU;hZpyIE_ zB{`1{a_6cYNdW4Kn*kpiPOs=XISwKFkL7CA-H@FF>c_^IS=ExXgdL2{ew?QgrxsAq zUkn>x#rrAe`tC~wO)gOv1Igt6b7_^rfjI*g%rV%7AI}~tfcnBMlm6o#e{mF*yHmiP z8gP^~K17mZ^s4)}JBZvoXI7oG$IXgYieZk84WVHSA+0TO70oq8UGLmE@oY6Qxu(LH zLoho=Ak8)ZDH=l#hGMatLtfMfXP%fE;;#v6An&O=3Oc<=J3rlB@*!}bNqPx_N=&18 zhIhtDokN!Ff>K}!@3HaE6hN`^Ztu54?%!NnCzz;^jwfu$61yv0jrk8h&wqQz;6K#_ALotI|$PThB=M(h!+0usOH@N9TJkzw?HMbEV< zN|G)^lKbn8LBPtiI|He?@d6nhD;9U2N^U8R?Nd7rEt$~9@bP&84>Arsm!Id)f2-QG zpO%ns&%cMqEvX}3xWafGZRJB~F? z+%Y9&??@vLg|*LBv~kRDS$L4b$so}o)%cKf3yPW2;^Ev{S+=A;L_~(G)7NmKn9&$s zwA3}K5g_)mTGH4~Q-b!9MaKO{DJ5lvU^#BbznRxfkmmo2N&>g*&B}!3|8daCt89*> z1O%i+ThAK@+U9n@WoP@o->n7?-e!M0UAD;1_H;rB^_(&2C+2@*G7EC`PRh7&G9b>3 zj+SQQ_S0O2{t@hM#!#(zbtl0Nfly9nk}b=WP zwf=s(@*E88{kfy%>~*}q+6=@qm7vBYFZ45X}R|7u5rbLL>`avw-)U z1BLht&RGu^il-ES$0gy(e&w5mCo4NUjij8+Jt9ahL+}DT9o>WI?RVYYgeoNcR96>$ z?e}Ms5%3NWhyNScol7}c=~}9gWJL&0w=J(%bqo13jTWghKER-@UWVF629~4|BzzKa z9dMEX%F={?WM(>Vv6!jQNyMCorKlw3)HZDOw|_EL(kez*OMp01(o<44C5Iw9Iik2& zVx?}n3OZ7~EVqnxz~#W?q0QNhpuJ?|u>Ji+e33j^U7WlOPlcP&{qK9%dlJ=5m7auu z^11)B`zgI@Gi@}jX4CU}vt@33qx1guA(qPTruM1`?^wnHmKv&qWTOUC+jV?!Zr z_xR6h{aJ2LYu&LYvg;R$#MbUy9k0iJ^1LZ)lB$)P&F1zjtOr-%tj6yH+Bf$LSs!bp z50POyhv}UCL_ap{C#XsNSHhS`w^LZmyf|Wbl?zFu#LCD^a~*^NbyD`X~kgr%b)G!1)s%~kYGZ~%|`MZ zOkWbOZlivw^Iy~@>Svv;A4WfUybnLRz*UUr7NzQHMp}l~G?3VI#3^T# z3gyu+i}oJbQo!1^Kl9rLzSEk@=>?_Q&5YH#P-wLF{Az9&OU1h1C#)V+ANHKMd!%{; z*0X755cHU(*vqhr6jH$Pi-`W=Qgh*qZD|w2{)~ykbrVDkBx6}tS!~OO@SOJGqjrcS zeT#?7hw_T2_wQtji;oT8%(dU4n(do1BEJpy|5jlaKJ66_F3#A>iG{jyyqq(^o4c#J ziCN)0sN5XypTNBMANnE`H!8}NebO()TLY<)B}(#4>*(p%l4Ams`ZB@m1h9UXb-+@r z@aL7nkYTHlACLnrK2Yi-hP9P1Y?h6~M$P|taqZ>#7%mvQo_jE59ilr&S@)Nrw(?6A zWgVtfwgo=#4>cLD73GwT$suI=#bp|m>7Zot1@=e_Y`5XNPeRUW&x9d)h7qsG?=R0b zobSnBfHly)Ri1dUZk08m3SAdiAvKk2b>3h9eAyR*I>-fNE@ip-54kl^if>0MdMJV_ z1|=ao-OesMsMqLR-$S&VY;9qrOSFoMG9Fe-2kTWWdgJ*&eTS@e2y|BtI zojKo1*Q9Upd6Hl=D<{uE?M@(<4xUe&nwOI}^2|>Wu(EsYsFO2Nre$Kw+;WZJ_PQK} zW6^f^C4lyH*5Ax{_WU|M-P{QCdKP+{d-6+Lp6iB7XqZeojb40uZWZ)8osT`o|E8yU z`NdFMhX2%2Q%pm*B^DNr-voQ8*n+hEtS}*9q}7@flgL?c9fKzgf^R!07$ZtbOh{yF zsTK6pCFwNAZE!ZHm2XS#e%=p{Ft`7)UogmZMeXgr&fm3)r72~O)qY!@Im$+Ve+;1r`2vj@9LAJNLA#;Q4&9eeUuF+M^WW6(XArD^NcR**|VOGKs@`8kplQ z?x;}UEy*^wm7~Xk|3-Vwo>wgvz2g<)wHh0#ws66nhn||>y_l_=uCH!}*pg8BM1&t1 zuQJ)JDrcco#|LN8)M~Y=ZDQ=?W)(a;>DW19vt(>-*olHJdW{H(zWnIcAxzL6sy7|D z6rV|(8n2L8$lfOsW|ck}QAZa|{l>Qn?{TF2p!^7;GRmm8Q^t7~x>dv+c!V``-I)lI zOcnaIF{rY6=Vwv-6EL{cHH z1A$O#+B&Z2a|pF@l?bR_-%r!5=c2WFN1c>`YE*l0@{s9jW}lVEPpKi3i!3WE^CJxz zj%SOAL{gNyeSBIYCHXwU^mg`exu_eh)-@_&l_Hdff{fIDFTxcWDc8lnNQKK~FP$4u0OBZse|11;&ct>}A|>{a{pm>VAse|A5ua zX!V}1y2y9VH(|Ag;c3&~k=1e7ZsTb$s&`1~<(f?)N}ksKhyER&$$6+pdq~-hJI=}0 zRp%v~%{Zc})#LzP@bx_CjN%qoHchJ{Vz@M@SKT80i#QujKTo9aipM$nZy_K{JJRPV z?4R)LnNi;^)wOk4BK29osGR(fzE|Z^Qr|1$i=sWr>GpQO^OwgF6xzE!_8rp)=RrOq zFQcm>O$BAHLTF-QrUmS!bF7$CL!yqQQ|m@x6i~eiIXx{o`NoWvN-A~Z@cP_ZEO!UC zgeGip(|`3avx=lXFF@LZDfc*jk?qxV8nrq=PS4oHG`Nq&=41Y>pk{kiCA#Y^dP5RK zVH?I7inT;R>mgwO7stMJHI>dI8Ev(>^uzVb;B+U~*HdQoM^TenI+LUF0*iqHG-Vmw z(EZul|D6&u%VYZ**e&yof@n$^M@tGtT?>zACq4TwTRK0+M?#f-_%ufPSeRs>ceq|I z*D-0Wf5-C(3*Sg^a4~yU%sFM`jaaBX{+G?PVrx~KVLUdsCACGxi}MmkA+7={5;_EB z4}B3VH*!}~P<8DsRNDk6!H*T| zkb9T*()T}bF?|mqzzItvOqaA6V-(Xr<_v;@U4brG!&5u<#jItKDDCALCj86+G}?vk zdDk_q?Xp7A*oHz;U#|zEGstP?3dPmW;0@ZQj7_RGfx;SFh3(X4?eg=p`NFR2vYw9X zd1Ege-6#UBKvZULyhIy`d>+bY;&^X<<>f4cL`FHfc6?$MH01Rx_H|=x^YeqLmY{Jy zDYs~>_n=n?vL}IQv<0t6U_t_sX_YzyVrFjU?J!w+dTDLaN>kMH`~uPRa40U*{)C%| z9qs*!2^6@eRM%Ek6NZAbP!bXM;YU_MKD7bP;+d|JX zeZ&<+c@FT0s;lXBSJT_^ll504?yTVe^;VPm45|qRV`BM*lc7$eS0c#o@T9Z^g9MPv zDgg?rB7x3Gt(pT&tafXo#tHIua&`3@fz0KUBr&nYR!iIN+0)AVr8jJzhc^;hgr1z7 zqbSVK+S1Y0!q-IIf0q>!r3A<*XegOeDhyB&dy?a&P3n!O^H<0?Q)07e?3OMOv5!!k z*CaAS!9j1^zHT>Cq2%F_1_AFqxHhcBb~Du{$G7vDjE&az_A5h(c7-Gm3B)4Ia>%43e^_$F9Gp9jv{nXV0w`V zdV5U$ibZaF@Kjp>yQ62@8cp9SiJdHhg}7(BtaO81g6i?dMFh(6`abd~(jr43!x+bC&VdaH+DqVQ>cg}T5k}0tWOm8S zUx0*3$L(zt>b5csBw@|_OqLOEscjbM05Z)q>wfKcn{qC(i`MWa+-&MR#oy4~XG+8%?U&=C?&u#1j9%Z+*>j2a4Ud8vJiVf_8ulS@32E24 zz;Ovr8;PRZEHa`Z=q*^_C`QcIa&>yT+P>^s+jWg;ZG3t^KMyZ{OdMN%+&(T29xm*i z8`XJDXm_fAe{X(%P7-_q8+3KPJac|EUtVtZA5*U+wST&adcBgnKfYf-mv>cqd)ha( z6tuhtO}-~1S$BQiUSD(@KRbP^8g5luf7Kd8c57{Tc4~8SYJN`bzs;S;OP*j3ey4YK zzh0*gvoEGVf6wi|?;o!(=sn%LyLv>*1i+Jnyd{DnKt6j6aznf!Uu<7m1Ip`-awE#% zI#uZx2W!`MYmQ5+jBtNk=Tj3@e zS|WMT%E%DN9}4);QG5l?bk}~@gm$ZWJ2Jn(J`X=TlqErj z`mz8jHt0y`5)zRU`y3WFe}>FPd5mj-Lz3v8f*jfRC7KZo4VF#FM=Pl`HtCsYqe#X# zz5cl?JIILjv}Jo|Z+4@h4)V2GxJwoH49=HDNAyv@XzXs`s=`T{H<{&Z7OMP82;D{y z2G;)Tv)9|VKnu!<2$8gYnj3KY@KL!{5PCyZlD3^sa6i&GX=`o@RoK;KGDs$;$q*gP zSmL4?AX8ZZQbE3Z>^Vu1Y9#1ttQxpZ@O9{-)%dD#>dhIXZ{N2Ya^ynuhgP{SYiDrP zYS(jt$(tpHI#j@u1Gy-^WKWF}kQ=3|<8|fg#LF#$8Te-3#m;h=Q)PDc*`K3gP0C&H zwH#?(U@Gg#Wg2u)Yt41nxe+vpTlyTqE(4TEhhx$S7mj&MOY5kr^oaydNF1k@da3C4VZoI(YIZb-Z$!%s88W4-wVFvc z->r!OeG=|&om416U5~3Ri%P>poy*kg7mpdPkrpF0bbHlt)}Y{*tI0)PqD$`f^B0e7 z@9Klp5eiia++gO?Vt?bZKvM3XWjc+m5Re#37xbwiAR>~_;hA0uK{QVWy6Yh^ns8ZH ztrS$8+V50OpI~jUD_>9T0_vNmhmNKV)z9m8Zn=e%(q>SP5ppO&S%43EjuVfqCC+Nt+8enX%sYt!{CAV4)6j;Px{l6?cS>hW(dfUxr%?$?$I}1H+l=~Gt6;iyI_ur zZ+*)lpOtj-Qf8C34c$Vfl=3AnAx?{7IihRTty(H|;5ty} z>BZ3XkG*6$?ctI_>!r9~7q~9Ml&EV?eXH}}tCd&luf|h@O;cTLF8Eq`Pi?-O7O@MH z+no7Tt=ra1q~;ZE+!#e4+Mz>T7l$y7cy-f4Q_SEn;@!>YRH2l`9J8wv%x}^Vewo5cg%oZ)_bh9JvCn)EB2mI zjr4~4PpB?~`aQ5MU^ow(;5O#>rAPQPPwX?o>E8j*9kA|vyIaFc**b(}(LjnHOxvGf z*af3%L07efF1_OPwGsOqxLb*)Ro($q(=X)?TbD=}qnA(`BqCi9Iruy--+8jyXmv_o zTKO-dluewkt5nD5=d;p(`nd=qv%_h8_?5{(D`J ztx+VSjf$5J2aGU1YW6A}L;hLB#?$E*Jor8J^(%rkC?f>Uo>8d0$ow@R?JN3@)S^59 z?J5123LBN(vk6JA=RI3HR+;r5xD;@khlD~(9vrZb;LlHdxNMzCIPC?4m4x! zd(eyI$68-oh`gcJn_|rp z6}wj%V;31$yJT(;gI*=GPqnmLEI%Nl^{ClYms7y%|H}o5b*Zo&8_tJkl<#l*I^7D` zVPIq7PAlN`hLx(szCS>#+_0;_P1g~!jD0~(*QI~0q6^LBZw#3odOeqil-CXb^1mC; zw`d&TXvJDk4%OEH@75p==w)dAG+}*axD{4`xkCQL7Vpt#-YO0nxU9oV1uqD|H2{LDVyltB_2A7!VIl$j>whRt2fCD_1R@20vpKzet}o zXPv%y<<6)^*I#8VHA#-C5?U^m59w8Jk{i&)#KO?OeHqZ{dcI-1bsfJb_9R?(h4mM6 zp^>bVs9J2FGU-Tmp8B3Tp4g{tw&!)%Jwe^OKD=>?Ty5xl&J?KMH>c*fVb7w_u0j}6 zC0TbIv>iv5d^fMm5qN*4Kx3coOH?~r+m5^g*ve!%kj#>5vZcCq)aT^s-*PU_lD>b0 z7;y8~fkD^%WU0k|cZ8_HwZ9mB_9k0YI*G%0uH~qANz69m$MWQ_aWthw&%dzlc?_#Y z>D{6oT(aFSr-o(=cr~v)D{|lZJed+&x4YN%w{u~oKXCP_trGq02${uo7pdT1_?^Ox5hL!IB&%+ z&W$Hj1L*2f@QeGb=O<2scStv^*Kl|@4rYe?M}Qj80zod)tkw1CamI8#9~D5&jEYJr z|J(wBhzG4t0K85P5_N*mMQC39cc*WDHo~dTO9R#PSrh0(`xRzm{|Y{uWU96@wy&R6Pz8BUUV03{^=t#r4eCKz;cr4*6}hh5TLf- z7JX_P713Jg&1d&@H4&IojC~OKPWFvDfbhcMtWi6@8Pz9Juaq(Izith_4^cUG@I?h? z!8&vtWzpc{24dGtKbEm;6@BMBu+`G?^)QqF1b9{Qz65$f4byY<))2VBZ)T#uO@SY2 z5}r3CS%aQ4wDA6UM?KOUjq#`1Y2ubiwr#HqdZ~|hES6~xONb9!^sq6>1(fy-F`Hs! zL}$16O)7DZ5iu+R+1UU6*W&S1rm`OJW*!Qs>OehOEh817lYqVMBIHx(BQZQyq9NhSC6)nUz#4cx zg27?Fp`idx=98x&JQ6-h3<8p%OVKLVTCU`5edox}5;sSb%d<^X%dI(;H193?YUtG} z$ZLw5C(`TZ$FxzbIigg7L0BJA@&Hi?b#$;eM|P_78D+=lMcgm_t6I0uS^4W<@9I8) zzxuv)xG9CGa4lNR-2Fd$Of7L{gj=Pc*Kh}6I72T)x*m#SMY{9Zy4% zjSIxEN78k?Y(CIV_a)c%6<)Pgv5N=c1?$J2r#X`RH}P|$$v{-Dk73We!C!}Rr%2sj z{5JsYp_8!quVIJE$*Ym45vgXMQT1%KW4|T$RUSK&?z~SkA@D29PI$KIJFsCNNfWRV zOT3}-mv4C80HnoA7z?Bdz+L(5cp} zE{fjUPnp=GZ>g?N;LlH^)40$2oxp_y+q|wW%W)}(9r%!dE;Rm~%YQSn=iz&d&hH4@ zYBU#5N*V2#zUV@EMrFBTb*A3;K-=yn=+(P(Gr*WD{w~gXs!94Ss70{jla?$ZoT z9kHHdYBo@d)L@F>&THNbOW)`7j8N(`pEe*1uoYW_>=YsvXwGT|W6M(QZa#A4a7UF~cAEV>j z#4bM6hui&sM0mf&G(x+I1tkB2gACZ3Pw1B2ZdYN;KqE z7M&eN^Y5o9mMVu8xUn|ILoaEi);wzdz9!TL`|krS3`uFo9WAET%-gWN2=k9y%KI^Q#&%XC}&)ztTgcI$@nQPcg zI(NZu$(#g}`w$s(;T;h>t6}b67QEiWV&oC@{G1{TVM|gVo4oK)sAaAlscqvfG~DJx12$aziC-J3au%0~7*m+LsHg z`V$0&$5_b6Sn^2-Jik-T@MAJW<=AL$ZpCBMA0@Lfl#z(vZKz#$L`O)l-@HhQdb1SB z7yJhA8t>}H#jUdqgZ-iQ4ecAa)9IDmnM#!SUZ+fi^kCs>eb@Pvf4r!L$Fb@>YUhZb zx>Ae4ePTLp5C5$v>4}|gTe>sjeA9xZ_Dd_rX|TgB)rAgcE1W$v}1=LUXNp4iAcY97%qIN&X7Hm9i{!*#ElvHG9bniNd!;{6f z_({?UK7Zy&JGoiUMNT|ln$`P;HeBeIX%C;&A`pqx4I-Nv#Ho08n7vCDirq|YVPhZL zH+;%hD$QZ0bl#lv%tVc@!k2MXI>uk>BZC*kiG7z;pzPD3?d{@uEliBQgLH^}*7>rM z6K-xY#4<^(3diDJ3f}Eswhnc#jc8}L=3^tOBp55We0m(&&7I7tlmisDpdvFp68Ro0 ztvLHgm9yh~D@V-ZxoGe$aRw=XNYKvTCj~R%mu*$O@i~*<7(+w3rfkHbFXpyK^>Z0? zYhs6C(l9p5SvOSEuHbQ3bNRtewT%y`Lvlx(xjn2TJ^ma`jHRrSJ)l43dau_a@C!4E z$`@TA-O!1~H?8p74W2nW78whUz1K#kryXlGMyADE)Ee=dz-g6ji^8;K^P_sH%sQUA zLM*^(jt$$zdz29$rz6}FaM%6L?Is0YqrBzTc&;A;cja@)Hb@V{5~_-tCWN;r%Vf8) zZ~dUhr&O#%+6qB#wt#!()zuf)OOk5LZ7RlP#UHfaL1TS!^ngI=i5MSK<|9xrQ25^9 zuk?#=v~*&MHjR^KgoD3+P#Z~=0fjF{pPu&~lG5#wZoDL0YBn%?=ochciO~VUnyUu} zC0B0gUoY6d^pu7_ZSlb7U!9mVJ#k0M#)i=3TlHx7pU4&PfL&(>j#WQ*Kn8h>Y z*Igi#FNe@C@pBo_w=u&i&4pW|d8=*u&!%;ufZ&UJhXo0Nj;WYoA`gHVY1oofd>>O5 ze#ntr!;l$^7^&^d2-N|f$ewh>ksKkE#25bFdyZxz_B-AnIZJzJi4PeTO^iOgI;#>W z)~w*}JwJ2${@}v}B-W_SqXZ4h$p|)qwtd$8hj}o;g>L#59P64(x1K#Kll|>Q`sgo? z$?o{m%SSJWM>xhH`C!T5T<6Q}nJ9Ccee&VV7olrzs$Yf;{x=rV<+8F16*)pzX}OG(jVPnc z9ogyJhm+Im5#wo^UYJglXg(&1lUqqqh>z- zqP*h6Cvu~h=z(o?r}A+VCvC!;wqjaCBfIBYsb3CS#(P^bC$|M+_`0r4kxI`+?l4_P zCA^S^57)a{F#3?{N zS9viL^>j3TF$eo~Xd9t^8&+1H5KxGSFref_vjM%`o{_QNm9O@)y_0B}II3`-MWq6> zqulh!5I)f`d7)fHfq>}sa~Az6QXqPLob?dB0bDWioii5Fk?M-p9BYK6sAtqb$F&5r zA6FcG^t#XW%sfnyc-eq4#PGw4T4-`aBda!>A#EWYY01X{;cbR*=5gLebGktT$p@`1 z)@;nmV}dlB+@q|gu1wW0Io{o*EhGeSXCaLeDU}HQRXj7mx`qA1-u+Z*fp%Lj9;kHf z*E7eU^K`b~_8zKo`TjESrPA_^en++AUhPnR(#a_BiXwJkUtos-tjV>iMglIZ5r4mtui3UWAw3+tC9o5CmF-B-?5Qz9?@F_v6PAks-( zoP&6*#<>|Tkwnd5_M+z=PoyqjJ&OnZo=4gI5wQ5TTGLW)vu;Iw0{f5tNcIA{9*q|i zPV(bgH>J((LvkL(NV=6G6tZ5ZB&SnS7QgJ#)O4`4Gbc0BfXN8*Q5(PW@A*t|zrZ@< zL(G#qk6l*SAjY{oDPhPl#9HciiU&woy)HL_jYU%)wZUAST9H+P1H(WE&0JwV;R8O6 zyVB|kccm49v~(iVehDRDUewM`)tBp|)zs9?h};8MZOfH$8l-7 zH7yi(Nq;9-#?AAs>7?av%T=R?=`36biP{6$bdG$ADpnp57(WteKvqhmhg2We)Da;P zY0-JLK=7akJsT(A9g0+q;?fgD^lDwHYU1K<%k9#``uWEiP76;WhfU%VXm}@D1je=F zeUj-~KAgd!;*wt@@~Iy>EfRZL*D;1k7;7X7aL6M>RWvWyO!L$T70dYzxFz$Wpu$>$ z2q9yQi@X*3P##5{NEFjxW?oNdsjrSW3bqA^#dHn7VHfRwQpzNywC8uytE=asx4hxwLuGbIy)f#Qn~&$Sk7Of(*IMr zga!W_0R`e#Mhg6W#UHbvxCrRqwCks_CKLc7`?_tkPm{mzgwebyk0LM% zKxZem8No-JI5KhF3Vpg8rCyPhB2dns2yRkr1$AOpie7T87THd$s0;4Vw~A1vcEMDY z+y|1|Yexk6L|UbR^zIfyt1KJzl^y~2ky-7HwhQrD3%bX0qpS`{m>Y{!(2{50CV63^ z=z2U)X!uRZSL*iMu(-+XbZiReToKUM^KK@5Bb_lb=KKs3T27y8Zx2#% zl1k*DjPgp$A|QTkM#w3uQ)t_v>Q}hmDGxjuof$QSIV$3(1*04UJ^_nDz#QqrcjJ<6 zdr|zvvAgFR*exsJCflUZ^UC41o44JJmqr~9bIhSc zZ)JDD!s3#4mB{Fp(!#j4ASu5iaGlX~Z?}|6h+4INlmGe!-_wV{hCrVyYSVi+>M>p+ zOf}i>Z~P9zTfxUO5jFVqO%_SzQf^JVCJ33$ z2ieKKEHFgc*agH11>T!*EtDTCwQx!a@OaiE_lLZ36F{SUvuL7KY$8f}a*&wl z-p81fNa_B&Gpb#V8aqwedwIdkFu35Xdbz)w9NFchR*GUt>}dLLZJSbhHRlc6%Tv{q z2wh33*d`e#LyWr<>REN7Z`~!jp=jA8%4GjaE2`=Tgd`q=r@l{m!ZgXFhyW54b$c09ewwh@fO*@gUlhXf!l@ya zNmjr@mQJv%&>sTS308r^ekLvu0SHyWKnV5KOMS@4FpErUN-8WQN;Cd+nkti7ww7B| z-ojcRqVUTFM;*)*Pu<46*Q_Rp#3J<|3Ra*rD}AwyfvSXcbC~8Tbu&?B@w^a(YTXPL z`)N}MV#|_dRa<8Yi~X22RP~~26$l~GI48>Nb&=^rx(Vf46-;gDLlRljw5wGQDRQlo zs0sq2H1$N$+@gKl+bP^)ee<7b132_6z&zt1ls&1BxFXau*=ANTMT$gUM zk=~N?4e&j$Lgzb^&W|&51Rcp{pV!T9)oa={a=a?*aIqX0)>MrcJF%X(fbW(zORJhX zdI6$~U#XacHPCKSW1USCsM1=tJWZ2ZD^brgd*nUW^Q!_S&9zYvfytMnv^;vP&hbO= z0oftsBDI@G)r0VeaRx<@ZbFSST)z7N-fjK!gZSu1^LdMj1#cPjLDYt+!y@-UPVBGU zPmCzIHavQ^%rDS#ckE62p#6@ai0t{LkfYlIA98!t$+P7X(`59z!%;5JD^c-j(a@?$~c9hGYyZDO8M@8M3OaWnk2Y4qtd7 z;B9%YWtrbeFCAu(dO=(`Ckgn(c#(0)UhFk3iJN=P8GaU1q#pGe-7^Z=cy9;7xWhU! z3uaqi3u={qpCC#Lf1^bs;Q`WHc|==}ZhWCVfg)rwDtP=F9-ghAR&KvU)#{t5sCB=) zGUp2tmz2(xsp*?$*q_{n$DSQJ-q|m|A5aQLJADnXu6&Q8wk$hvIGjDKFqyE;KzXozH z17W#fJz64K8$#3_Ry@MRS7QVpmwH0R0j0bx(7e2s_ZCl<=2g?F;$D(dakq3M6m;C5 zUfa3TQ;~eX_&7f}d|>8(0q94oEK3r<$LsTXyNie+-5R(}NWXe~pK#tCv6_24ef`_k z?aiqIs!=R^eQ>MuYp$eW-ujK6_tya3yb1gPm9%))g8sFeVJEiV@9408#Tz2uzizq% z8UuIMdiOjlfDPJh6n;Al-MtbG4{S{rknox;pI`P4M_LSg->aEZF51|wFt=J_yucvr z3E}k}UuR0Xi}b?{?|Q1X1Uh+2Pi=A5pL^_Io0hGr0pfN1?r+4!oC_Q!yAuATKOpok ze?;j!`q?oHiU0`(^q|n!xGbxpxQlv(0;(=xDDxl30Z0D_pTStq4*X9C0r*6mP{4@U z6nCExZZrSWimjawuHwm`N8l=CWC;I}?A0fWV@TY^K$l8s_%wQOy}@kH-^g-Lb7$;E z3?4b~mOs4;`xiEL<#uBlVggp7HzkP==;xRA^!f+X;Q<17h$Vc&Cw)R2eIkmAkEwgj zd2Q~L8FxyE{_5mBpXw0xs*0B0S;dPG>k!m(jRI!b1hXj9*o1I{(^qe%eeMu(agCOo zTMdL8C>aK&Sgspk{&twG3FA+#4l(>>#Q^wiMHvXO0!U7EGoH8wE7C91aEW!(4hpuD zvBd~}qXPbLd#i4?Ob7bh&9FC|`ph@^|GUG9GKVX>Ti9YUN#lxyncfIA5ETD}%kG|F ztg3%Lf#$T<%^oeLR?3>TUcSN%^R3H1cZN9s#b|oGwsL>% + + +
ST A
PE0
R1 = LD A, pe0

ST R1 -> A, pe1
ST 1 -> B, pe1
PE1


while (LD B, pe1 != 1) ;
put(A, A, pe1)
fence()
atomic_set(B, 1, pe1)
barrier()
wait_until(B, 1)
ST A
atomic_set(B, 1, pe1)
LD A
sb
sb
sb
sb
sb
sb
sb
sb
sb
asw
rdo
ilv
rco
rco
lco
\ No newline at end of file diff --git a/utils/defs.tex b/utils/defs.tex index fa2d0d998..89ff88ce6 100644 --- a/utils/defs.tex +++ b/utils/defs.tex @@ -69,6 +69,22 @@ \newcommand{\jth}{${\textit{j}^{\text{\tiny th}}}$} \newcommand{\kth}{${\textit{k}^{\text{\tiny th}}}$} \newcommand{\lth}{${\textit{l}^{\text{\tiny th}}}$} +% formal mem model +\newcommand{\EVENT}[1]{\textit{#1}} +\newcommand{\REL}[1]{\textit{#1}} +\newcommand{\relSB}{\textit{sequenced-before}} +\newcommand{\relMO}{\textit{modification-order}} +\newcommand{\relRF}{\textit{reads-from}} +\newcommand{\relSW}{\textit{synchronizes-with}} +\newcommand{\relHB}{\textit{happens-before}} +\newcommand{\relDR}{\textit{data race}} +\newcommand{\relILV}{\textit{implicit local visibility}} +\newcommand{\relLCO}{\textit{local completion order}} +\newcommand{\relRDO}{\textit{remote delivery order}} +\newcommand{\relRCO}{\textit{remote completion order}} +\newcommand{\relASW}{\textit{API synchronizes with}} +\newcommand{\relAHB}{\textit{API happens-before}} +\newcommand{\relADR}{\textit{API data race}} %% Generate indexed reference. \newcommand{\EnvVarIndex}[1]{\index{#1}} From 73619895f26c58b9a26e7aff984e602c3bdb7a1d Mon Sep 17 00:00:00 2001 From: John Alsop Date: Tue, 28 Apr 2026 17:50:23 -0500 Subject: [PATCH 9/9] api mem model: writing clarity and formatting consistency -improve clarity of observable access section (no loop-carried dependnecies, synch semantics of obs access are at higher level than PL) -make macro usage more consistent with rest of doc (e.g., \FUNC{put}->\PUT{}, put_signal->put-with-signal, \ac{PE}s->\acp{PE}, etc. -make apinotes consistent with rest of doc (use within description list) --- content/api_mem_model.tex | 53 +++++++++++++++++++++++---------------- 1 file changed, 31 insertions(+), 22 deletions(-) diff --git a/content/api_mem_model.tex b/content/api_mem_model.tex index e31a9130e..12d7d1d5a 100644 --- a/content/api_mem_model.tex +++ b/content/api_mem_model.tex @@ -38,56 +38,65 @@ \subsubsection{Baseline PL-level Axiomatic Memory Model (C11)}\label{subsubsec:a Finally, the C11 specification states that, if a data race exists in a candidate execution that satisfies the above axioms, then the program's behavior is undefined. If a data race does not exist, then the program's behavior must conform to one of the conforming candidate executions. +\begin{description} \apinotes{Consume and sequential consistent memory orders are omitted from the above formalism for simplicity, but this does not impact how the PL-level semantics are extended for the API-level model. Also, instead of the verbose prose axioms defined in the C11 specification, the C11 axioms are expressed above using the more concise (and equivalent) representation outlined by Batty et al. in "Overhauling SC atomics in C11 and OpenCL" (Symposium on the Principles of Programming Languages, 2016).} +\end{description} \subsubsection{API-Level Memory Model Overview}\label{subsubsec:api_model_overview} -The \openshmem API-level memory model extends PL-level memory models like C11 to define how \openshmem operations interact with memory. While PL-level models operate on fine-grain loads, stores, and fences, the API-level memory model must additionally operate on coarse-grain API operations like get, \FUNC{put}, barrier, etc. that can execute asynchronously and involve multiple \ac{PE}s. +The \openshmem API-level memory model extends PL-level memory models like C11 to define how \openshmem operations interact with memory. PL-level models use acquire and release semantics to establish ordering relations between fine-grain loads, stores, and fences. The API-level memory model must additionally operate on coarse-grain API operations like \GET{}, \PUT{}, barrier, etc. that can execute asynchronously and involve multiple \acp{PE}. +Rather than defining allowed behavior by leveraging PL-level events and relations, we can more flexibly capture the higher level ordering requirements of \openshmem by defining new events and ordering relations. These can then be lowered to PL-level relations transparently to the programmer, enabling a wide range of backend implementations. We formalize allowed behavior by extending the axiomatic PL-level memory model to incorporate these new events and relations. To describe the desired \openshmem behavior, the API-level memory model makes the following additions to the existing PL-level model: \begin{enumerate} - \item\label{mcmsteps:one} \textbf{\ac{PE}s and operation events}: Introduce a new event type that corresponds to an API operation and a new memory location property that specifies the target \ac{PE}. - \item\label{mcmsteps:two} \textbf{Per-operation observable accesses}: For each operation type, define a set of observable memory accesses (these are not related to the initiating thread by \relSB{}). - \item\label{mcmsteps:three} \textbf{API-level relations}: Define new API-level ordering relations over operation events and their observable accesses that will constrain \openshmem memory ordering. - \item\label{mcmsteps:four} \textbf{Fused collectives}: Fuse matched collective operation calls into a single event. - \item\label{mcmsteps:five} \textbf{Updated relations, axioms}: Update PL-level axioms to incorporate new events and relations. + \item\label{mcmsteps:one} \textbf{\acp{PE} and operation events}: Section~\ref{subsubsec:api_pes_operations} introduces a new event type that corresponds to an API operation and a new memory location property that specifies the target \ac{PE}. This provides a basis for operation-level semantics across multiple \ac{PE} memories. + \item\label{mcmsteps:two} \textbf{Per-operation observable accesses}: For each operation type, Section~\ref{subsubsec:api_observable} defines a set of observable memory accesses (these are not related to the initiating thread by \relSB{}). This enables asynchrony since a thread can continue execution while observable accesses execute in the background. + \item\label{mcmsteps:four} \textbf{Fused collectives}: Section~\ref{subsubsec:api_fused} fuses matched collective operation calls into a single event. This supports multi-PE collective operations and establishes ordering across \acp{PE} even when there are no observable accesses. + \item\label{mcmsteps:three} \textbf{API-level relations}: Section~\ref{subsubsec:api_relations} defines new API-level ordering relations over operation events and their observable accesses that will constrain \openshmem memory ordering. This allows the programmer to explicitly enforce ordering of asynchronous operations using the ordering and synchronization operations provided by \openshmem. These relations are illustrated in Section~\ref{subsubsec:api_relations_example}. + \item\label{mcmsteps:five} \textbf{Updated relations, axioms}: Section~\ref{subsubsec:api_integrate} updates the PL-level memory model axioms to incorporate these new events and relations. By extending existing relations and axioms (e.g., happens-before, data races) for the API level, we establish a memory model that formalizes \openshmem ordering requirements. \end{enumerate} -Each of the above steps adds a piece of required support. \ref{mcmsteps:one} provides a basis for operation-level semantics across multiple \ac{PE} memories. \ref{mcmsteps:two} enables asynchrony since a thread can continue execution while observable accesses execute in the background. \ref{mcmsteps:three} allows the programmer to explicitly enforce ordering of asynchronous operations using the ordering and synchronization operations provided by \openshmem. \ref{mcmsteps:four} Supports multi-PE collective operations and establishes ordering across \ac{PE}s even when there are no observable accesses. \ref{mcmsteps:five} formally updates the PL-level axioms to establish a complete API-level memory model. - After defining how \openshmem extends the C11 memory model, Section~\ref{subsubsec:api_scoped} describes how the addition of scoped synchronization in GPU memory models impacts the API-level model. \subsubsection{PEs and Operation Events}\label{subsubsec:api_pes_operations} -When defining behavior at the \openshmem API level, there are two key differences in program execution relative to a PL-level model. First, \openshmem applications span multiple \ac{PE}s, each of which has its own local memory space. Therefore, the PL-level definition of memory location is extended to include both the \ac{PE}-local memory address and the \ac{PE} ID. Normal memory accesses can access memory locations on the local \ac{PE}, or on a remote \ac{PE} if \FUNC{shmem\_ptr} is used. \openshmem operations may trigger accesses to the local \ac{PE}’s memory and/or the memory of remote \ac{PE}s. Dynamic memory accesses in an execution are only considered to access the same memory location if both the local memory address and \ac{PE} ID attributes match (impacting the definitions of of \relMO{} and \relRF{}). +When defining behavior at the \openshmem API level, there are two key differences in program representation relative to a PL-level program. First, \openshmem applications span multiple \acp{PE}, each of which has its own local memory space. Therefore, the PL-level definition of memory location is extended to include both the \ac{PE}-local memory address and the \ac{PE} ID. Normal memory accesses can access memory locations on the local \ac{PE}, or on a remote \ac{PE} if \FUNC{shmem\_ptr} is used. \openshmem operations may trigger accesses to the local \ac{PE}’s memory and/or the memory of remote \acp{PE}. Dynamic memory accesses in an execution are only considered to access the same memory location if both the local memory address and \ac{PE} ID attributes match (impacting the definitions of of \relMO{} and \relRF{}). Second, since the \openshmem API does not define the underlying PL-level implementation, API operations do not resolve to a function consisting of PL-level events. Thus, these operations must be treated as a new type of primitive in the API-level memory model: the operation event. An operation event specifies its type, and each type defines the set of minimum observable accesses triggered by the operation (described next). Operation events participate in \relSB{} and derived relations, similar to normal thread access events. Although operation events may be associated with separate memory access events (see Section~\ref{subsubsec:api_observable}), the operation events themselves do not participate directly in access-based relations like \relMO{}, \relRF{}, and \relSW{}. \subsubsection{Per-Operation Observable Accesses}\label{subsubsec:api_observable} -For the purposes of the API memory model, each \openshmem operation type defines the minimum set of observable memory accesses performed by the operation. These accesses are considered to be ``issued'' by the associated operation. These accesses must occur as part of the operation, however the API-level model places no constraints on the thread that executes observable accesses (i.e., they may or may not be executed by the initiating thread), and it does not prevent an implementation from performing other (non-observable) accesses. Observable accesses have the same semantics as normal accesses and take part in the same relations, with one exception: there is no \relSB{} relationship between observable accesses of an API operation event and any other event issued by the same thread, except as defined later in Section~\ref{subsubsec:api_relations}. -Note, however, that the operation event that issues these observable accesses \textit{does} form an \relSB{} relationship with other events issued by the initiating thread. -In addition, accesses performed by an implementation that are not included in the observable set must not modify data that may be observed from application code external to \openshmem operation implementations. +For the purposes of the API memory model, each \openshmem operation type defines the minimum set of observable memory accesses performed by the operation. These accesses are considered to be ``issued'' by the associated operation. These accesses must occur as part of the operation, however the API-level model places no constraints on the thread that executes observable accesses (i.e., they may or may not be executed by the initiating thread), and it does not prevent an implementation from performing other accesses (e.g., for implementation-internal synchronization, buffering, etc. -- although data accessed in this way should never be accessed by code external to the implementation.) +Observable accesses have the same semantics as normal accesses and take part in the same relations, with two exceptions: +\begin{enumerate} + \item There is no \relSB{} relationship between observable accesses issued by an API operation and any other event in the thread that initiated the operation. Instead, Section~\ref{subsubsec:api_relations} defines the relations that constrain ordering at the API level. Note, however, that the operation event that issues these observable accesses \textit{does} form a \relSB{} relationship with other events issued by the initiating thread. + \item There are no synchronization semantics (atomic, acquire/release ordering) associated with observable accesses. Although the accesses may be lowered to PL-level accesses with these semantics, at the API level, ordering relations and data race conditions are defined a the level of the operation that issues them (see Sections~\ref{subsubsec:api_relations} and~\ref{subsubsec:api_integration}, respectively.) +\end{enumerate} +%In addition, accesses performed by an implementation that are not included in the observable set must not modify data that may be observed from application code external to \openshmem operation implementations. -Table~\ref{table:api_observable} defines the observable accesses performed for each \openshmem operation. Each access denotes the memory location as well as the \ac{PE} it targets ($my\_pe$ refers to the \ac{PE} of the initiating thread). In many cases, control and compute pseudocode are included to define which memory locations are accessed, which values are stored, and which values must eventually be loaded (pseudocode rather than C11 is used here because it allows clearer specification of where memory accesses are required.) However, this single-threaded control flow does not constrain the actual implementation and does not necessarily imply a total \relSB{} ordering between observable accesses. Within an operation, \relSB{} is defined to exist only between observable accesses when there is a true control or data dependency between two accesses. Specifically, $(A, B) \in sb$ only if $A$ and $B$ are observable accesses of the same operation and one of the following is true: +Table~\ref{table:api_observable} defines the observable accesses performed for each \openshmem operation. Each access denotes the memory location as well as the \ac{PE} it targets ($my\_pe$ refers to the \ac{PE} of the initiating thread). +In many cases, control and compute pseudocode are included to define which memory locations are accessed, which values are stored, and which values must eventually be loaded (pseudocode rather than C11 is used here because it allows clearer specification of where memory accesses are required.) +However, this single-threaded control flow does not constrain the actual implementation and does not necessarily imply a total \relSB{} ordering between observable accesses. Within an operation, \relSB{} is defined to exist only between observable accesses when there is a true control or data dependency between two accesses (i.e., no \relSB{} relation across independent iterations of a loop). +Specifically, $(A, B) \in sb$ only if $A$ and $B$ are observable accesses issued by the same operation and one of the following is true: \begin{itemize} \item $B$ takes a value derived from $A$ as an operand. \item $B$ has a control dependency on the result of $A$ such that $A$'s evaluation determines whether $B$ will occur. - \item $B$ has an ordering dependency $A$. This type of dependency currently exists only in \FUNC{put\_signal[\_nbi]} where the final store to $sig\_addr$ is ordered after all prior observable accesses to source and dest. + \item $B$ has an ordering dependency $A$. This type of dependency currently exists only in blocking and non-blocking \emph{put-with-signal} operations where the final store to $sig\_addr$ is ordered after all prior observable accesses to source and dest. \end{itemize} Many operations have no visible accesses (e.g., memory ordering routines, whose semantics are defined by the API-level relations described in Section~\ref{subsubsec:api_relations}). +In addition, synchronization semantics for observable accesses in the API model are defined at a higher level than in the PL model. Rather than associating atomic semantics and acquire/release ordering for each observable accesses that may be involved in synchronization, we instead associate semantics at the level of the issuing operation. Specifically, API-level relations are defined based on \input{content/api_observable} \subsubsection{Fused Collectives}\label{subsubsec:api_fused} -\openshmem defines multiple collective operations that synchronize across multiple \ac{PE}s in a team, in some cases without any observable accesses to memory (e.g., as is the case with barrier). +\openshmem defines multiple collective operations that synchronize across multiple \acp{PE} in a team, in some cases without any observable accesses to memory (e.g., as is the case with barrier). There are multiple requirements on the use of these operations. In particular, each dynamic call to a collective operation must be paired with a matching call from every other \ac{PE} in the team. The API-level memory model fuses these matching calls into a single event in the candidate execution. -This fused operation and the associated observable accesses (if any) are considered to be executed collectively by all nodes in the synchronizing set of \ac{PE}s, allowing us to establish ordering relations across these \ac{PE}s (defined in Section~\ref{subsubsec:api_relations}). +This fused operation and the associated observable accesses (if any) are considered to be executed collectively by all nodes in the synchronizing set of \acp{PE}, allowing us to establish ordering relations across these \acp{PE} (defined in Section~\ref{subsubsec:api_relations}). This does not mean the observable accesses must occur atomically, and it does not constrain how the operation is implemented (i.e., any thread or \ac{PE} in the team may perform each observable access). \subsubsection{API-Level Relations}\label{subsubsec:api_relations} @@ -97,14 +106,14 @@ \subsubsection{API-Level Relations}\label{subsubsec:api_relations} \textbf{Implicit Local Visibility (ilv)}: -\relILV{} relates local observable accesses from any \openshmem operation and prior hb-ordered memory accesses from the initiating \ac{PE}. This ensures, for example, that a \FUNC{put} operation will load an up-to-date version of data from the source buffer. Specifically, $(A, B) \in ilv$ only if all of the following are true: +\relILV{} relates local observable accesses from any \openshmem operation and prior hb-ordered memory accesses from the initiating \ac{PE}. This ensures, for example, that a \PUT{} operation will load an up-to-date version of data from the source buffer. Specifically, $(A, B) \in ilv$ only if all of the following are true: \begin{itemize} \item $A$ is \relHB{} ordered before the operation event that issues $B$. \item $A$ and $B$ target the same \ac{PE} (the \ac{PE} of the caller thread). \end{itemize} \textbf{Local Completion Order (lco)}: -\relLCO{} relates local observable accesses from a blocking \openshmem operation and subsequent \relHB{} ordered accesses and operations from the initiating \ac{PE}. This ensures, for example, that a blocking \FUNC{put} completes its accesses to the source buffer before a subsequent store from the initiating thread overwrites the data. Specifically, $(A, B) \in lco$ only if all of the following are true: +\relLCO{} relates local observable accesses from a blocking \openshmem operation and subsequent \relHB{} ordered accesses and operations from the initiating \ac{PE}. This ensures, for example, that a blocking \PUT{} completes its accesses to the source buffer before a subsequent store from the initiating thread overwrites the data. Specifically, $(A, B) \in lco$ only if all of the following are true: \begin{itemize} \item $A$ is an observable access issued from a blocking operation event (i.e., an operation that does not have nbi in its name). \item $A$ targets the \ac{PE} of the operation event that issues it. @@ -118,7 +127,7 @@ \subsubsection{API-Level Relations}\label{subsubsec:api_relations} \item $A$ is an observable access issued by a fence-ordered operation (defined below) that is \relHB{} ordered before $Fop$, $B$ is an observable access issued by an operation that is \relHB{} ordered after $Fop$, and $A$ and $B$ target the same \ac{PE}. \end{itemize} -In the above, "fence-ordered operations" are defined as blocking or nonblocking \FUNC{put}, blocking or nonblocking \ac{AMO}, and blocking or nonblocking \FUNC{put\_signal} operations. +In the above, "fence-ordered operations" are defined as blocking or nonblocking \PUT{}, blocking or nonblocking \ac{AMO}, and blocking or nonblocking \FUNC{put\_signal} operations. \textbf{Remote Completion Order (rco)}: \relRCO{} establishes full connectivity between accesses separated by an \openshmem quiet or barrier operation. All thread accesses or observable accesses that are \relHB{} ordered before the quiet or barrier operation are ordered before all thread accesses or observable accesses that are \relHB{} ordered after the quiet or barrier operation. Specifically, $(A, B) \in rco$ only if there is an \openshmem quiet or barrier operation $QBop$ and any of the following conditions are true: @@ -129,7 +138,7 @@ \subsubsection{API-Level Relations}\label{subsubsec:api_relations} \item $A$ is an observable memory access issued by a qb-ordered operation (defined below) that is \relHB{} ordered before $QBop$ and $B$ is an observable access issued by an operation that is \relHB{} ordered after $QBop$. \end{itemize} -In the above, "qb-ordered operations" are defined as blocking or nonblocking \FUNC{put}, nonblocking get, blocking or nonblocking \ac{AMO}, and blocking or nonblocking \FUNC{put\_signal} operations. +In the above, "qb-ordered operations" are defined as blocking or nonblocking \PUT{}, nonblocking \GET{}, blocking or nonblocking \ac{AMO}, and blocking or nonblocking \emph{put-with-signal} operations. \textbf{API Synchronizes-With (asw)}: \relASW{} is the API-level analog of the PL-level \relSW{} relation. It relates observable accesses from legally synchronizing \openshmem operations that form a \relRF{} relationship. Specifically, $(A, B) \in asw$ only if $(A, B) \in rf$ and any of the following are true: @@ -153,9 +162,9 @@ \subsubsection{Example API Execution}\label{subsubsec:api_relations_example} Matched barrier calls from each \ac{PE} are fused into a single operation event. Operations are ordered within the calling thread by \relSB{}, but their observable accesses are constrained only by the added API relations, described below (in addition to \relSB{} orderings within the operation): \begin{itemize} -\item \relILV{} orders the store of $a$ in PE0 (and all prior \relHB{}-ordered accesses) with the load of $a$ from the \FUNC{put} operation. +\item \relILV{} orders the store of $a$ in PE0 (and all prior \relHB{}-ordered accesses) with the load of $a$ from the \PUT{} operation. \item \relLCO{} orders the load of $a$ in PE1 (and all subsequent hb-ordered accesses) with the load of $b$ from the (blocking) \FUNC{wait\_until} operation. -\item \relRDO{} orders the bulk store of $a$ from the \FUNC{put} operation in PE0 with the store of $b$ from the fence-separated \FUNC{atomic\_set} operation. +\item \relRDO{} orders the bulk store of $a$ from the \PUT{} operation in PE0 with the store of $b$ from the fence-separated \FUNC{atomic\_set} operation. \item \relRCO{} orders all \relHB{} ordered accesses and the accesses of \relHB{} ordered \openshmem operations before and after the fused barrier event spanning PE0 and PE1. \item \relASW{} orders the store of $b$ from the \FUNC{atomic\_set} operation in PE0 with the consuming (\relRF{} related) load of $b$ from the \FUNC{wait\_until} operation in PE1. \end{itemize}