SurveyWithCode
diff --git a/app/(private)/campuchia/page.md b/app/(private)/campuchia/page.md
Lines changed: 860 additions & 0 deletions
diff --git a/app/(private)/dutch/page.md b/app/(private)/dutch/page.md
Lines changed: 861 additions & 0 deletions
diff --git a/app/(private)/homebrew/page.md b/app/(private)/homebrew/page.md
Lines changed: 851 additions & 0 deletions
diff --git a/app/(private)/laos/page.md b/app/(private)/laos/page.md
Lines changed: 861 additions & 0 deletions
diff --git a/app/(private)/morocco/page.md b/app/(private)/morocco/page.md
Lines changed: 0 additions & 35 deletions
diff --git a/app/(private)/swedish/page.md b/app/(private)/swedish/page.md
Lines changed: 861 additions & 0 deletions
diff --git a/app/(private)/transformer/page.md b/app/(private)/transformer/page.md
Lines changed: 4 additions & 4 deletions
diff --git a/app/globals.css b/app/globals.css
Lines changed: 1 addition & 1 deletion
diff --git a/app/layout.jsx b/app/layout.jsx
Lines changed: 31 additions & 31 deletions
@@ -4,41 +4,6 @@
 
 ---
 
-## جدول المحتويات
-
-- [عائلة الترانسفورمر الإصدار 2.0](#عائلة-الترانسفورمر-الإصدار-20)
-  - [جدول المحتويات](#جدول-المحتويات)
-- [الرموز](#الرموز)
-- [أساسيات الترانسفورمر](#أساسيات-الترانسفورمر)
-  - [الانتباه والانتباه الذاتي](#الانتباه-والانتباه-الذاتي)
-  - [الانتباه الذاتي متعدد الرؤوس](#الانتباه-الذاتي-متعدد-الرؤوس)
-  - [معمارية المُشفِّر-فاك التشفير](#معمارية-المُشفِّر-فاك-التشفير)
-  - [الترميز الموضعي](#الترميز-الموضعي)
-    - [الترميز الموضعي الجيبي](#الترميز-الموضعي-الجيبي)
-    - [الترميز الموضعي المُتعلَّم](#الترميز-الموضعي-المُتعلَّم)
-    - [الترميز الموضعي النسبي](#الترميز-الموضعي-النسبي)
-    - [التضمين الموضعي الدوراني](#التضمين-الموضعي-الدوراني)
-- [سياق أطول](#سياق-أطول)
-  - [ذاكرة السياق](#ذاكرة-السياق)
-  - [الذاكرة الخارجية غير القابلة للتفاضل](#الذاكرة-الخارجية-غير-القابلة-للتفاضل)
-  - [درجات الانتباه المُعززة بالمسافة](#درجات-الانتباه-المُعززة-بالمسافة)
-  - [جعله تكراريًا](#جعله-تكراريًا)
-- [النمذجة التكيفية](#النمذجة-التكيفية)
-  - [مدى الانتباه التكيفي](#مدى-الانتباه-التكيفي)
-  - [الترانسفورمر التكيفي العمق](#الترانسفورمر-التكيفي-العمق)
-- [الانتباه الفعال](#الانتباه-الفعال)
-  - [أنماط الانتباه المتفرقة](#أنماط-الانتباه-المتفرقة)
-    - [السياق المحلي الثابت](#السياق-المحلي-الثابت)
-    - [السياق ذو الخطوات الواسعة](#السياق-ذو-الخطوات-الواسعة)
-    - [مزيج من السياق المحلي والعالمي](#مزيج-من-السياق-المحلي-والعالمي)
-  - [الانتباه القائم على المحتوى](#الانتباه-القائم-على-المحتوى)
-  - [الانتباه منخفض الرتبة](#الانتباه-منخفض-الرتبة)
-- [الترانسفورمر في التعلم المعزز](#الترانسفورمر-في-التعلم-المعزز)
-- [الاستشهاد](#الاستشهاد)
-- [المراجع](#المراجع)
-
----
-
 تم اقتراح العديد من التحسينات الجديدة على معمارية الترانسفورمر منذ منشوري الأخير حول "[عائلة الترانسفورمر](https://lilianweng.github.io/posts/2020-04-07-the-transformer-family/)" قبل حوالي ثلاث سنوات. هنا قمت بإعادة هيكلة وإثراء كبيرين لذلك المنشور الصادر عام 2020 - حيث أعدت هيكلة التسلسل الهرمي للأقسام وحسنت العديد من الأقسام بأوراق بحثية أحدث. الإصدار 2.0 هو مجموعة شاملة من الإصدار القديم، بطول يبلغ ضعف طوله تقريبًا.
 
 # الرموز
 
@@ -231,14 +231,14 @@ Compressive transformer has two additional training losses:
 1.  **Auto-encoding loss** (lossless compression objective) measures how well we can reconstruct the original memories from compressed memories
 
     $$
-    \mathcal{L}_{ac} = \| \textbf{old_mem}^{(i)} - g(\textbf{new_cm}^{(i)}) \|_2
+    \mathcal{L}_{ac} = \left\| \mathbf{old\_mem}^{(i)} - g\big(\mathbf{new\_cm}^{(i)}\big) \right\|_2
     $$
 
     where $g: \mathbb{R}^{[\frac{L}{c}] \times d} \to \mathbb{R}^{L \times d}$ reverses the compression function $f$.
 
 2.  **Attention-reconstruction loss** (lossy objective) reconstructs content-based attention over memory vs compressed memory and minimize the difference:
     $$
-    \mathcal{L}_{ar} = \|\text{attn}(\mathbf{h}^{(i)}, \textbf{old_mem}^{(i)}) − \text{attn}(\mathbf{h}^{(i)}, \textbf{new_cm}^{(i)})\|_2
+    \mathcal{L}_{ar} = \left\| \operatorname{attn}\big(\mathbf{h}^{(i)}, \mathbf{old\_mem}^{(i)}\big) - \operatorname{attn}\big(\mathbf{h}^{(i)}, \mathbf{new\_cm}^{(i)}\big) \right\|_2
     $$
 
 Transformer-XL with a memory of size $m$ has a maximum temporal range of $m \times N$, where $N$ is the number of layers in the model, and attention cost $\mathcal{O}(L^2 + Lm)$. In comparison, compressed transformer has a temporal range of $(m_m + c \cdot m_{cm}) \times N$ and attention cost $\mathcal{O}(L^2 + L(m_m + m_{cm}))$. A larger compression rate $c$ gives better tradeoff between temporal range length and attention cost.
@@ -742,7 +742,7 @@ $$
 \end{aligned}
 $$
 
-![RFA Computation Order](RFA.png)
+![RFA Computation Order](/posts/transformer-family-2/RFA.png)
 _(Left) The order of computation for default softmax operation. (Right) The order of computation when using random feature attention, a lot cheaper than default softmax. (Image source: [Peng et al. 2021](https://arxiv.org/abs/2103.02143))._
 
 **Causal Attention RFA** has token at time step $t$ only attend to earlier keys and values $\{\mathbf{k}_i\}_{i \leq t}, \{\mathbf{v}_i\}_{i \leq t}$. Let us use a tuple of variables, $(\mathbf{S}_t \in \mathbb{R}^{2D \times d}, \mathbf{z} \in \mathbb{R}^{2D})$, to track the hidden state history at time step $t$, similar to RNNs:
@@ -763,7 +763,7 @@ RFA leads to significant speedup in autoregressive decoding and the memory compl
 
 Performer modifies the random feature attention with positive random feature maps to reduce the estimation error. It also keeps the randomly sampled $\mathbf{w}_1, \dots, \mathbf{w}_D$ to be orthogonal to further reduce the variance of the estimator.
 
-![Comparison of approximation error in Performer](performer.png)
+![Comparison of approximation error in Performer](/posts/transformer-family-2/performer.png)
 _Comparison of approximation error when using (Left) i.i.d vs orthogonal features and (Right) sin/cos vs positive random features. (Image source: [Choromanski et al. 2021](https://arxiv.org/abs/2009.14794))._
 
 # Transformers for Reinforcement Learning
 
@@ -16,7 +16,7 @@
 @import "../styles/hamburger.css";
 @import "../styles/typesetting-article.css";
 
-@import "../styles/i81n/chinese.css";
+/* @import "../styles/i81n/chinese.css"; */
 
 @custom-variant dark (&:where(.dark, .dark *));
 @custom-variant light (&:where(.light, .light *));
 
@@ -40,37 +40,37 @@ const surveyWithCodeFonts = localFont({
   display: "swap",
 })
 
-const surveyWithCodeChinese = localFont({
-  src: [
-    {
-      path: "./../public/fonts/chinese/SurveyWithCodeChina-Light.woff2",
-      weight: "300",
-      style: "normal",
-    },
-    {
-      path: "./../public/fonts/chinese/SurveyWithCodeChina-Regular.woff2",
-      weight: "400",
-      style: "normal",
-    },
-    {
-      path: "./../public/fonts/chinese/SurveyWithCodeChina-Medium.woff2",
-      weight: "500",
-      style: "normal",
-    },
-    {
-      path: "./../public/fonts/chinese/SurveyWithCodeChina-SemiBold.woff2",
-      weight: "600",
-      style: "normal",
-    },
-    {
-      path: "./../public/fonts/chinese/SurveyWithCodeChina-Bold.woff2",
-      weight: "700",
-      style: "normal",
-    },
-  ],
-  variable: "--font-survey-code-chinese",
-  display: "swap",
-})
+// const surveyWithCodeChinese = localFont({
+//   src: [
+//     {
+//       path: "./../public/fonts/chinese/SurveyWithCodeChina-Light.woff2",
+//       weight: "300",
+//       style: "normal",
+//     },
+//     {
+//       path: "./../public/fonts/chinese/SurveyWithCodeChina-Regular.woff2",
+//       weight: "400",
+//       style: "normal",
+//     },
+//     {
+//       path: "./../public/fonts/chinese/SurveyWithCodeChina-Medium.woff2",
+//       weight: "500",
+//       style: "normal",
+//     },
+//     {
+//       path: "./../public/fonts/chinese/SurveyWithCodeChina-SemiBold.woff2",
+//       weight: "600",
+//       style: "normal",
+//     },
+//     {
+//       path: "./../public/fonts/chinese/SurveyWithCodeChina-Bold.woff2",
+//       weight: "700",
+//       style: "normal",
+//     },
+//   ],
+//   variable: "--font-survey-code-chinese",
+//   display: "swap",
+// })
 
 export const metadata = {
   title: "SurveyWithCode - From Research to Reproducibility",