Mathematical abstraction in UM-Bridge
=====================================

In this section, we will describe UM-Bridge's interface mathematically.

Let :math:`\mathbf{F}` denote the numerical model that maps the model input vector :math:`\boldsymbol{\theta}`
to the output vector :math:`\mathbf{F}(\boldsymbol{\theta})`. We will use bold font to
indicate vectors. Note that both inputs and outputs are required to be a list of lists in the actual
implementation. For a list of :math:`d` input vectors, each with :math:`n` dimensions, we have

.. math::

   \mathbf{F}\, : \,
   \mathbb{R}^{n \times d}
   \;\longrightarrow\;
   \mathbb{R}^{m \times d}.

The arguments ``inWrt`` and ``outWrt`` in functions involving derivatives allow the user to select the
particular indices (out of the :math:`d` indices) with respect to which the derivative should be
evaluated. This will be clarified in the respective sections below.

Additionally, there may be an objective function :math:`L = L(\mathbf{F}(\boldsymbol{\theta}))`.

UM-Bridge allows the following four operations.

Model Evaluation
================

This is simply the so-called forward map that takes an element from the list of input vectors,
:math:`\boldsymbol{\theta} = (\theta_1, \ldots, \theta_n) \in \mathbb{R}^n`, and returns the model output,
:math:`\mathbf{F}(\boldsymbol{\theta}) = (F(\boldsymbol{\theta})_1, \ldots, F(\boldsymbol{\theta})_m) \in \mathbb{R}^m`.

For a collection of :math:`d` input vectors, each of dimension :math:`n`, this follows the definition as before:

.. math::

   \mathbf{F} : \mathbb{R}^{n \times d} \; \longrightarrow \; \mathbb{R}^{m \times d}.

In practice, both inputs and outputs are expected as lists of lists.
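
As a concrete illustration, the following sketch evaluates a hypothetical model
:math:`\mathbf{F} : \mathbb{R}^2 \to \mathbb{R}^3` (so :math:`n = 2`, :math:`m = 3`) on a list of
:math:`d = 2` input vectors. The model itself is invented for illustration and is not part of UM-Bridge.

```python
# Hypothetical model F(theta) = (theta_1 + theta_2, theta_1 * theta_2, theta_1^2),
# mapping R^2 -> R^3, i.e. n = 2 and m = 3.
def F(theta):
    t1, t2 = theta
    return [t1 + t2, t1 * t2, t1 ** 2]

# A list of d = 2 input vectors, each of dimension n = 2 (a list of lists).
inputs = [[1.0, 2.0], [3.0, 4.0]]

# Evaluating the model entry-wise yields a list of d = 2 output vectors,
# each of dimension m = 3.
outputs = [F(theta) for theta in inputs]
print(outputs)  # [[3.0, 2.0, 1.0], [7.0, 12.0, 9.0]]
```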


Gradient of the objective function
==================================

The gradient function evaluates the sensitivity of the scalar objective. Using the chain rule:

.. math::
   :name: eq:1

   \nabla_{\boldsymbol{\theta}}L
   = \left(\frac{\partial \mathbf{F}}{\partial \boldsymbol{\theta}}\right)^{\!\top}
   \boldsymbol{\lambda},
   \qquad
   \boldsymbol{\lambda} = \frac{\partial L}{\partial \mathbf{F}},

where :math:`\boldsymbol{\lambda}` is known as the sensitivity vector and
:math:`\dfrac{\partial \mathbf{F}}{\partial \boldsymbol{\theta}}` is the Jacobian of the
forward map.

Since the inputs and outputs are lists of lists, there are multiple possible choices: we can select a
specific vector from the input list (:math:`\boldsymbol{\theta}_i \in \mathbb{R}^n`) and from the output
list (:math:`\mathbf{F}_j \in \mathbb{R}^m`). These indices are chosen using ``inWrt`` and ``outWrt``,
respectively, in the implementation.

So :ref:`(1) <eq:1>` becomes

.. math::

   \nabla_{\boldsymbol{\theta}_i}L
   = \left( \dfrac{\partial \mathbf{F}_j}{\partial \boldsymbol{\theta}_i} \right)^{\!\top}
   \boldsymbol{\lambda}_j,
   \qquad
   \boldsymbol{\lambda}_j = \dfrac{\partial L}{\partial \mathbf{F}_j},

where :math:`\boldsymbol{\lambda}_j` is the ``sens`` argument in the code.

The output of this operation is a vector because we are essentially doing a matrix-vector product.
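
The chain rule above can be checked with a short numpy sketch. The model and its analytic Jacobian
below are hypothetical stand-ins, not part of the UM-Bridge API.

```python
import numpy as np

# Hypothetical model F: R^2 -> R^3 and its analytic Jacobian J[k, l] = dF_k/dtheta_l.
def F(theta):
    t1, t2 = theta
    return np.array([t1 + t2, t1 * t2, t1 ** 2])

def jacobian(theta):
    t1, t2 = theta
    return np.array([[1.0, 1.0],
                     [t2,  t1],
                     [2.0 * t1, 0.0]])   # shape m x n = 3 x 2

theta = np.array([1.0, 2.0])

# Sensitivity vector lambda = dL/dF; e.g. for the objective L = F_1 it is e_1.
sens = np.array([1.0, 0.0, 0.0])

# Gradient of L with respect to theta: (dF/dtheta)^T lambda, an n-vector.
grad = jacobian(theta).T @ sens
print(grad)  # [1. 1.]
```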

Applying Jacobian to a vector
=============================

The apply Jacobian function evaluates the product of the transpose of the model's Jacobian, :math:`J^{\top}`, and a
vector :math:`\mathbf{v}` of the user's choice (``vec``). The Jacobian of a vector-valued function
is given by

.. math::

   J =
   \frac{\partial \mathbf{F}}{\partial \boldsymbol{\theta}} =
   \left[
   \begin{array}{ccc}
   \dfrac{\partial \mathbf{F}}{\partial \theta_1} & \cdots & \dfrac{\partial \mathbf{F}}{\partial \theta_n}
   \end{array}
   \right] =
   \begin{pmatrix}
   \dfrac{\partial F_{1}}{\partial \theta_{1}} & \cdots &
   \dfrac{\partial F_{1}}{\partial \theta_{n}} \\[12pt]
   \vdots & \ddots & \vdots \\[4pt]
   \dfrac{\partial F_{m}}{\partial \theta_{1}} & \cdots &
   \dfrac{\partial F_{m}}{\partial \theta_{n}}
   \end{pmatrix}
   \in \mathbb{R}^{m \times n}.


For a chosen :math:`\mathbf{v} \in \mathbb{R}^{m}`, this is simply

.. math::

   \texttt{output}
   = J^{\!\top}\,\mathbf{v}
   = \left( \dfrac{\partial \mathbf{F}}{\partial \boldsymbol{\theta}} \right)^{\!\top} \mathbf{v}.

Additionally, we can use this to express the gradient function by setting
:math:`\mathbf{v} = \boldsymbol{\lambda}`, as mentioned before.

However, as before, we can choose an index each from the input and output lists to construct the
Jacobian block :math:`J_{ji} = \frac{\partial \mathbf{F}_j}{\partial \boldsymbol{\theta}_i}`. The
output of this action is then

.. math::

   \texttt{output}
   = J_{ji}^{\!\top}\,\mathbf{v}
   = \left( \dfrac{\partial \mathbf{F}_j}{\partial \boldsymbol{\theta}_i} \right)^{\!\top} \mathbf{v},

where the :math:`i^{th}` and :math:`j^{th}` indices correspond to ``inWrt`` and ``outWrt``, respectively.
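
A minimal numpy sketch of this action, reusing the hypothetical :math:`3 \times 2` Jacobian from the
gradient example (``vec`` plays the role of :math:`\mathbf{v}`):

```python
import numpy as np

# Hypothetical Jacobian of F: R^2 -> R^3 evaluated at theta = (1, 2),
# J[k, l] = dF_k/dtheta_l, shape m x n = 3 x 2.
J = np.array([[1.0, 1.0],
              [2.0, 1.0],
              [2.0, 0.0]])

vec = np.array([1.0, 1.0, 1.0])   # a chosen v in R^m

# Action of the transposed Jacobian on vec: an n-vector.
output = J.T @ vec
print(output)  # [5. 2.]
```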

Applying Hessian to a vector
============================

The apply Hessian action is a combination of the previous two sections: the action is still a matrix-vector product, but
the matrix is the Hessian of an objective function. The Hessian, :math:`H`, is given by

.. math::

   H =
   \frac{\partial^2 L}{\partial \boldsymbol{\theta}\,\partial \boldsymbol{\theta}}
   = \frac{\partial}{\partial \boldsymbol{\theta}}
   \left(
   \frac{\partial \mathbf{F}}{\partial \boldsymbol{\theta}}
   \right)^{\!\top}
   \boldsymbol{\lambda} =
   \begin{bmatrix}
   \dfrac{\partial^2 L}{\partial \theta_1^2} & \dfrac{\partial^2 L}{\partial \theta_1 \partial \theta_2} & \cdots & \dfrac{\partial^2 L}{\partial \theta_1 \partial \theta_n} \\[18pt]
   \dfrac{\partial^2 L}{\partial \theta_2 \partial \theta_1} & \dfrac{\partial^2 L}{\partial \theta_2^2} & \cdots & \dfrac{\partial^2 L}{\partial \theta_2 \partial \theta_n} \\[18pt]
   \vdots & \vdots & \ddots & \vdots \\[6pt]
   \dfrac{\partial^2 L}{\partial \theta_n \partial \theta_1} & \dfrac{\partial^2 L}{\partial \theta_n \partial \theta_2} & \cdots & \dfrac{\partial^2 L}{\partial \theta_n^2}
   \end{bmatrix},

where :math:`L` is the objective function and :math:`\boldsymbol{\lambda}` is the sensitivity vector as defined previously.

So the product of :math:`H` and the chosen vector (of size :math:`n`) can be written as

.. math::

   H\,\mathbf{v}
   = \dfrac{\partial^2 L}{\partial \boldsymbol{\theta}\,\partial \boldsymbol{\theta}}\,\mathbf{v} =
   \left[\dfrac{\partial}{\partial \boldsymbol{\theta}}
   \left(
   \dfrac{\partial \mathbf{F}}{\partial \boldsymbol{\theta}}
   \right)^{\!\top}
   \boldsymbol{\lambda}\right]\,\mathbf{v}.

As in the apply Jacobian action, we can select certain indices from the lists of lists to construct the Hessian.
Since :math:`H` contains the second derivative of :math:`L`, we require two indices from the input,
``inWrt1`` and ``inWrt2``, as well as an output index ``outWrt``. The output of this action is

.. math::

   \texttt{output} =
   \left( \dfrac{\partial}{\partial \boldsymbol{\theta}_i}
   \left[ \left( \dfrac{\partial \mathbf{F}_k}{\partial \boldsymbol{\theta}_j} \right)^{\!\top} \boldsymbol{\lambda}_k \right] \right)
   \mathbf{v}.
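
As an illustration, the Hessian action can be approximated by a finite difference of the gradient map.
The model :math:`\mathbf{F}` and the objective :math:`L(\boldsymbol{\theta}) = \tfrac{1}{2}\lVert \mathbf{F}(\boldsymbol{\theta}) \rVert^2`
below are hypothetical, chosen only so that :math:`\boldsymbol{\lambda} = \mathbf{F}(\boldsymbol{\theta})`.

```python
import numpy as np

# Hypothetical model F: R^2 -> R^3 and objective L = 0.5 * ||F(theta)||^2,
# so that lambda = dL/dF = F(theta).
def F(theta):
    t1, t2 = theta
    return np.array([t1 + t2, t1 * t2, t1 ** 2])

def grad_L(theta):
    t1, t2 = theta
    J = np.array([[1.0, 1.0],
                  [t2,  t1],
                  [2.0 * t1, 0.0]])   # m x n Jacobian dF/dtheta
    return J.T @ F(theta)             # (dF/dtheta)^T lambda

# Hessian-vector product approximated by a forward difference of the gradient:
# H v ~= (grad L(theta + eps v) - grad L(theta)) / eps.
def apply_hessian(theta, v, eps=1e-6):
    return (grad_L(theta + eps * v) - grad_L(theta)) / eps

theta = np.array([1.0, 2.0])
v = np.array([1.0, 0.0])
print(apply_hessian(theta, v))  # close to the analytic value [11, 5]
```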

