q(\mathbf{x}_{t-1} \vert \mathbf{x}_t, \mathbf{x}_0) = \mathcal{N}(\mathbf{x}_{t-1}; \tilde{\boldsymbol{\mu}}(\mathbf{x}_t, \mathbf{x}_0), \tilde{\beta}_t \mathbf{I})
$$

Using Bayes' rule, we have:

$$
\begin{aligned}
q(\mathbf{x}_{t-1} \vert \mathbf{x}_t, \mathbf{x}_0)
&= q(\mathbf{x}_t \vert \mathbf{x}_{t-1}, \mathbf{x}_0) \frac{ q(\mathbf{x}_{t-1} \vert \mathbf{x}_0) }{ q(\mathbf{x}_t \vert \mathbf{x}_0) } \\
&\propto \exp \Big(-\frac{1}{2} \big(\frac{(\mathbf{x}_t - \sqrt{\alpha_t} \mathbf{x}_{t-1})^2}{\beta_t} + \frac{(\mathbf{x}_{t-1} - \sqrt{\bar{\alpha}_{t-1}} \mathbf{x}_0)^2}{1-\bar{\alpha}_{t-1}} - \frac{(\mathbf{x}_t - \sqrt{\bar{\alpha}_t} \mathbf{x}_0)^2}{1-\bar{\alpha}_t} \big) \Big) \\
&= \exp \Big(-\frac{1}{2} \big( \big(\frac{\alpha_t}{\beta_t} + \frac{1}{1-\bar{\alpha}_{t-1}}\big) \mathbf{x}_{t-1}^2 - \big(\frac{2\sqrt{\alpha_t}}{\beta_t} \mathbf{x}_t + \frac{2\sqrt{\bar{\alpha}_{t-1}}}{1-\bar{\alpha}_{t-1}} \mathbf{x}_0\big) \mathbf{x}_{t-1} + C(\mathbf{x}_t, \mathbf{x}_0) \big) \Big)
\end{aligned}
$$

where $C(\mathbf{x}_t, \mathbf{x}_0)$ collects the terms that do not involve $\mathbf{x}_{t-1}$. Completing the square over $\mathbf{x}_{t-1}$ gives

$$
\tilde{\beta}_t = 1 \Big/ \big(\frac{\alpha_t}{\beta_t} + \frac{1}{1-\bar{\alpha}_{t-1}}\big) = \frac{1-\bar{\alpha}_{t-1}}{1-\bar{\alpha}_t} \cdot \beta_t
\quad\text{and}\quad
\tilde{\boldsymbol{\mu}}(\mathbf{x}_t, \mathbf{x}_0) = \frac{\sqrt{\alpha_t}(1-\bar{\alpha}_{t-1})}{1-\bar{\alpha}_t} \mathbf{x}_t + \frac{\sqrt{\bar{\alpha}_{t-1}}\beta_t}{1-\bar{\alpha}_t} \mathbf{x}_0
$$
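
As a quick numerical sanity check, the two equivalent forms of the posterior variance, $\tilde{\beta}_t = \big(\frac{\alpha_t}{\beta_t} + \frac{1}{1-\bar{\alpha}_{t-1}}\big)^{-1} = \frac{1-\bar{\alpha}_{t-1}}{1-\bar{\alpha}_t}\beta_t$, can be compared on a concrete schedule (a minimal sketch; the linear $\beta$ schedule below is only an illustrative assumption):

```python
import numpy as np

# Illustrative linear beta schedule (an assumption for this sketch).
T = 1000
betas = np.linspace(1e-4, 0.02, T)
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)

t = 500  # arbitrary interior timestep (0-indexed)

# Closed form: tilde_beta_t = (1 - alpha_bar_{t-1}) / (1 - alpha_bar_t) * beta_t
tilde_beta = (1 - alpha_bars[t - 1]) / (1 - alpha_bars[t]) * betas[t]
# Precision form obtained by completing the square:
tilde_beta_precision = 1.0 / (alphas[t] / betas[t] + 1.0 / (1 - alpha_bars[t - 1]))

assert np.isclose(tilde_beta, tilde_beta_precision)
```

Note that $\tilde{\beta}_t < \beta_t$: conditioning on $\mathbf{x}_0$ shrinks the reverse-step variance.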

It is also straightforward to get the same result using Jensen's inequality. Say we want to minimize the cross entropy as the learning objective,

$$
\begin{aligned}
L_\text{CE}
&= - \mathbb{E}_{q(\mathbf{x}_0)} \log p_\theta(\mathbf{x}_0) \\
&= - \mathbb{E}_{q(\mathbf{x}_0)} \log \Big( \int p_\theta(\mathbf{x}_{0:T}) d\mathbf{x}_{1:T} \Big) \\
&= - \mathbb{E}_{q(\mathbf{x}_0)} \log \Big( \int q(\mathbf{x}_{1:T} \vert \mathbf{x}_0) \frac{p_\theta(\mathbf{x}_{0:T})}{q(\mathbf{x}_{1:T} \vert \mathbf{x}_{0})} d\mathbf{x}_{1:T} \Big) \\
&\leq - \mathbb{E}_{q(\mathbf{x}_{0:T})} \log \frac{p_\theta(\mathbf{x}_{0:T})}{q(\mathbf{x}_{1:T} \vert \mathbf{x}_{0})} \\
&= \mathbb{E}_{q(\mathbf{x}_{0:T})}\Big[\log \frac{q(\mathbf{x}_{1:T} \vert \mathbf{x}_{0})}{p_\theta(\mathbf{x}_{0:T})} \Big] = L_\text{VLB}
\end{aligned}
$$
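
The single inequality step in this argument is Jensen's inequality applied to the convex function $-\log$: $-\log \mathbb{E}[X] \leq \mathbb{E}[-\log X]$ for any positive random variable $X$. A tiny numeric illustration (the uniform distribution below is an arbitrary stand-in):

```python
import numpy as np

# Jensen's inequality for the convex function -log:
#   -log E[X] <= E[-log X]  for any positive random variable X.
rng = np.random.default_rng(0)
x = rng.uniform(0.1, 2.0, size=100_000)  # arbitrary positive samples

lhs = -np.log(x.mean())     # -log of the expectation
rhs = (-np.log(x)).mean()   # expectation of -log
assert lhs <= rhs
```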

Let's label each component in the variational lower bound loss separately:

$$
\begin{aligned}
L_\text{VLB} &= L_T + L_{T-1} + \dots + L_0 \\
\text{where } L_T &= D_\text{KL}(q(\mathbf{x}_T \vert \mathbf{x}_0) \parallel p_\theta(\mathbf{x}_T)) \\
L_t &= D_\text{KL}(q(\mathbf{x}_t \vert \mathbf{x}_{t+1}, \mathbf{x}_0) \parallel p_\theta(\mathbf{x}_t \vert \mathbf{x}_{t+1})) \text{ for } 1 \leq t \leq T-1 \\
L_0 &= - \log p_\theta(\mathbf{x}_0 \vert \mathbf{x}_1)
\end{aligned}
$$

#### Connection with noise-conditioned score networks (NCSN)

[Song & Ermon (2019)](https://arxiv.org/abs/1907.05600) proposed a score-based generative modeling method where samples are produced via [Langevin dynamics](/posts/diffusion-models/#connection-with-stochastic-gradient-langevin-dynamics) using gradients of the data distribution estimated with score matching. The score of each sample $\mathbf{x}$'s probability density is defined as its gradient $\nabla_{\mathbf{x}} \log q(\mathbf{x})$. A score network $\mathbf{s}_\theta: \mathbb{R}^D \to \mathbb{R}^D$ is trained to estimate it, $\mathbf{s}_\theta(\mathbf{x}) \approx \nabla_{\mathbf{x}} \log q(\mathbf{x})$.

To make it scalable with high-dimensional data in the deep learning setting, they proposed to use either *denoising score matching* ([Vincent, 2011](http://www.iro.umontreal.ca/~vincentp/Publications/smdae_techreport.pdf)) or *sliced score matching* (use random projections; [Song et al., 2019](https://arxiv.org/abs/1905.07088)). Denoising score matching adds a pre-specified small noise to the data $q(\tilde{\mathbf{x}} \vert \mathbf{x})$ and estimates $q(\tilde{\mathbf{x}})$ with score matching.
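
As a concrete sketch of the quantity being matched: for Gaussian corruption $q_\sigma(\tilde{\mathbf{x}} \vert \mathbf{x}) = \mathcal{N}(\tilde{\mathbf{x}}; \mathbf{x}, \sigma^2 \mathbf{I})$, the conditional score has the closed form $-(\tilde{\mathbf{x}} - \mathbf{x})/\sigma^2$, which is the regression target in denoising score matching. A 1-D finite-difference check (all values below are arbitrary):

```python
import numpy as np

# For q(x_tilde | x) = N(x_tilde; x, sigma^2), the conditional score is
#   grad_{x_tilde} log q(x_tilde | x) = -(x_tilde - x) / sigma^2
sigma = 0.5
x, x_tilde = 1.0, 1.3  # clean point and its noisy version (arbitrary)

def log_q(xt: float) -> float:
    """Log-density of the Gaussian corruption kernel at xt."""
    return -0.5 * ((xt - x) / sigma) ** 2 - np.log(sigma * np.sqrt(2 * np.pi))

analytic = -(x_tilde - x) / sigma**2
h = 1e-6
numeric = (log_q(x_tilde + h) - log_q(x_tilde - h)) / (2 * h)  # central difference
assert np.isclose(analytic, numeric, atol=1e-4)
```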

It is very slow to generate a sample from DDPM by following the Markov chain of the reverse diffusion process, as $T$ can be up to one or a few thousand steps.
One simple way is to run a strided sampling schedule ([Nichol & Dhariwal, 2021](https://arxiv.org/abs/2102.09672)) by taking the sampling update every $\lceil T/S \rceil$ steps to reduce the process from $T$ to $S$ steps. The new sampling schedule for generation is $\{\tau_1, \dots, \tau_S\}$ where $\tau_1 < \tau_2 < \dots < \tau_S \in [1, T]$ and $S < T$.
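
This schedule can be sketched in a few lines (a minimal sketch; rounding and offset conventions differ across implementations):

```python
import math

def strided_schedule(T: int, S: int) -> list[int]:
    """Keep every ceil(T/S)-th timestep to cut T sampling steps down to ~S."""
    stride = math.ceil(T / S)
    return list(range(1, T + 1, stride))

taus = strided_schedule(T=1000, S=10)  # [1, 101, 201, ..., 901]
```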

For another approach, let's rewrite $q_\sigma(\mathbf{x}_{t-1} \vert \mathbf{x}_t, \mathbf{x}_0)$ to be parameterized by a desired standard deviation $\sigma_t$ according to the nice property:

$$
\begin{aligned}
\mathbf{x}_{t-1} &= \sqrt{\bar{\alpha}_{t-1}} \mathbf{x}_0 + \sqrt{1 - \bar{\alpha}_{t-1}} \boldsymbol{\epsilon}_{t-1} \\
&= \sqrt{\bar{\alpha}_{t-1}} \mathbf{x}_0 + \sqrt{1 - \bar{\alpha}_{t-1} - \sigma_t^2} \boldsymbol{\epsilon}_t + \sigma_t \boldsymbol{\epsilon} \\
&= \sqrt{\bar{\alpha}_{t-1}} \Big( \frac{\mathbf{x}_t - \sqrt{1 - \bar{\alpha}_t} \boldsymbol{\epsilon}_t}{\sqrt{\bar{\alpha}_t}} \Big) + \sqrt{1 - \bar{\alpha}_{t-1} - \sigma_t^2} \boldsymbol{\epsilon}_t + \sigma_t \boldsymbol{\epsilon} \\
q_\sigma(\mathbf{x}_{t-1} \vert \mathbf{x}_t, \mathbf{x}_0) &= \mathcal{N}\Big(\mathbf{x}_{t-1}; \sqrt{\bar{\alpha}_{t-1}} \Big( \frac{\mathbf{x}_t - \sqrt{1 - \bar{\alpha}_t} \boldsymbol{\epsilon}_t}{\sqrt{\bar{\alpha}_t}} \Big) + \sqrt{1 - \bar{\alpha}_{t-1} - \sigma_t^2} \boldsymbol{\epsilon}_t, \sigma_t^2 \mathbf{I}\Big)
\end{aligned}
$$

Let $\sigma_t^2 = \eta \cdot \tilde{\beta}_t$ such that we can adjust $\eta \in \mathbb{R}^+$ as a hyperparameter to control the sampling stochasticity. The special case of $\eta = 0$ makes the sampling process _deterministic_. Such a model is named the _denoising diffusion implicit model_ (**DDIM**; [Song et al., 2020](https://arxiv.org/abs/2010.02502)). DDIM has the same marginal noise distribution but deterministically maps noise back to the original data samples.
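
The role of $\eta$ can be made concrete with a short sketch (the linear $\beta$ schedule below is an illustrative assumption, not from the post):

```python
import numpy as np

T = 1000
betas = np.linspace(1e-4, 0.02, T)  # illustrative linear schedule (assumption)
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)

def sigma_t(eta: float, t: int) -> float:
    """sigma_t^2 = eta * tilde_beta_t: eta=0 gives deterministic DDIM sampling,
    eta=1 recovers the DDPM posterior variance tilde_beta_t."""
    tilde_beta = (1 - alpha_bars[t - 1]) / (1 - alpha_bars[t]) * betas[t]
    return float(np.sqrt(eta * tilde_beta))

assert sigma_t(0.0, t=500) == 0.0  # DDIM: no sampling noise injected
assert sigma_t(1.0, t=500) > 0.0   # DDPM-like stochastic sampling
```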

During generation, we don't have to follow the whole chain $t=1,\dots,T$, but rather a subset of steps. Let's denote $s < t$ as two steps in this accelerated trajectory. The DDIM update step is:

$$
q_{\sigma, s < t}(\mathbf{x}_s \vert \mathbf{x}_t, \mathbf{x}_0)
= \mathcal{N}\Big(\mathbf{x}_s; \sqrt{\bar{\alpha}_s} \Big( \frac{\mathbf{x}_t - \sqrt{1 - \bar{\alpha}_t} \boldsymbol{\epsilon}_t}{\sqrt{\bar{\alpha}_t}} \Big) + \sqrt{1 - \bar{\alpha}_s - \sigma_t^2} \boldsymbol{\epsilon}_t, \sigma_t^2 \mathbf{I}\Big)
$$
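
Put together, one accelerated update from $\mathbf{x}_t$ to $\mathbf{x}_s$ might look like the sketch below. This is not a reference implementation: the $\beta$ schedule is an assumption, and the random `eps` merely stands in for the trained network's noise prediction $\boldsymbol{\epsilon}_t$:

```python
import numpy as np

rng = np.random.default_rng(0)
T = 1000
betas = np.linspace(1e-4, 0.02, T)  # illustrative schedule (assumption)
alpha_bars = np.cumprod(1.0 - betas)

def ddim_step(x_t, eps_pred, t, s, sigma=0.0):
    """One accelerated update x_t -> x_s for s < t (sigma=0 is the eta=0 case).
    `eps_pred` plays the role of the trained network's noise prediction."""
    a_t, a_s = alpha_bars[t], alpha_bars[s]
    x0_pred = (x_t - np.sqrt(1 - a_t) * eps_pred) / np.sqrt(a_t)  # predicted x_0
    mean = np.sqrt(a_s) * x0_pred + np.sqrt(1 - a_s - sigma**2) * eps_pred
    return mean + sigma * rng.standard_normal(x_t.shape)

x_t = rng.standard_normal(4)
eps = rng.standard_normal(4)  # stand-in for the model output
x_s = ddim_step(x_t, eps, t=900, s=800)
```

With `sigma=0` the update is a deterministic map, matching the DDIM behavior described above.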

Cited as:

> Weng, Lilian. (Jul 2021). What are diffusion models? Lil'Log. https://lilianweng.github.io/posts/2021-07-11-diffusion-models/.

Or